[v7,1/7] driver core: Add support for linking devices during device addition
diff mbox series

Message ID 20190724001100.133423-2-saravanak@google.com
State New
Headers show
Series
  • Solve postboot supplier cleanup and optimize probe ordering
Related show

Commit Message

Saravana Kannan July 24, 2019, 12:10 a.m. UTC
When devices are added, the bus might want to create device links to track
functional dependencies between supplier and consumer devices. This
tracking of supplier-consumer relationship allows optimizing device probe
order and tracking whether all consumers of a supplier are active. The
add_links bus callback is added to support this.

However, when consumer devices are added, they might not have a supplier
device to link to despite needing mandatory resources/functionality from
one or more suppliers. A waiting_for_suppliers list is created to track
such consumers and retry linking them when new devices get added.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h | 14 +++++++
 2 files changed, 97 insertions(+)

Comments

Frank Rowand Aug. 8, 2019, 2:04 a.m. UTC | #1
> Date: Tue, 23 Jul 2019 17:10:54 -0700
> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>  device addition
> From: Saravana Kannan <saravanak@google.com>
> 
> When devices are added, the bus might want to create device links to track
> functional dependencies between supplier and consumer devices. This
> tracking of supplier-consumer relationship allows optimizing device probe
> order and tracking whether all consumers of a supplier are active. The
> add_links bus callback is added to support this.

Change above to:

When devices are added, the bus may create device links to track which
suppliers a consumer device depends upon.  This
tracking of supplier-consumer relationship may be used to defer probing
the driver of a consumer device before the driver(s) for its supplier device(s)
are probed.  It may also be used by a supplier driver to determine if
all of its consumers have been successfully probed.
The add_links bus callback is added to create the supplier device links

> 
> However, when consumer devices are added, they might not have a supplier
> device to link to despite needing mandatory resources/functionality from
> one or more suppliers. A waiting_for_suppliers list is created to track
> such consumers and retry linking them when new devices get added.

Change above to:

If a supplier device has not yet been created when the consumer device attempts
to link it, the consumer device is added to the wait_for_suppliers list.
When supplier devices are created, the supplier device link will be added to
the relevant consumer devices on the wait_for_suppliers list.

> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/device.h | 14 +++++++
>  2 files changed, 97 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index da84a73f2ba6..1b4eb221968f 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>  #endif
>  
>  /* Device links support. */
> +static LIST_HEAD(wait_for_suppliers);
> +static DEFINE_MUTEX(wfs_lock);
>  
>  #ifdef CONFIG_SRCU
>  static DEFINE_MUTEX(device_links_lock);
> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>  }
>  EXPORT_SYMBOL_GPL(device_link_add);
>  
> +/**

> + * device_link_wait_for_supplier - Mark device as waiting for supplier

    * device_link_wait_for_supplier - Add device to wait_for_suppliers list


> + * @consumer: Consumer device
> + *
> + * Marks the consumer device as waiting for suppliers to become available. The
> + * consumer device will never be probed until it's unmarked as waiting for
> + * suppliers. The caller is responsible for adding the link to the supplier
> + * once the supplier device is present.
> + *
> + * This function is NOT meant to be called from the probe function of the
> + * consumer but rather from code that creates/adds the consumer device.
> + */
> +static void device_link_wait_for_supplier(struct device *consumer)
> +{
> +	mutex_lock(&wfs_lock);
> +	list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> +	mutex_unlock(&wfs_lock);
> +}
> +
> +/**


> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> + *
> + * Loops through all consumers waiting on suppliers and tries to add all their
> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> + *
> + * The add_links bus callback is expected to return 0 if it has found and added
> + * all the supplier links for the consumer device. It should return an error if
> + * it isn't able to do so.
> + *
> + * The caller of device_link_wait_for_supplier() is expected to call this once
> + * it's aware of potential suppliers becoming available.

Change above comment to:

    * device_link_add_supplier_links - add links from consumer devices to
    *                                  supplier devices, leaving any consumer
    *                                  with inactive suppliers on the
    *                                  wait_for_suppliers list

    * Scan all consumer devices in the devicetree.  For any supplier device that
    * is not already linked to the consumer device, add the supplier to the
    * consumer device's device links.
    *
    * If all of a consumer device's suppliers are available then the consumer
    * is removed from the wait_for_suppliers list (if previously on the list).
    * Otherwise the consumer is added to the wait_for_suppliers list (if not
    * already on the list).


    * The add_links bus callback must return 0 if it has found and added all
    * the supplier links for the consumer device. It must return an error if
    * it is not able to do so.
    *
    * The caller of device_link_wait_for_supplier() is expected to call this once
    * it is aware of potential suppliers becoming available.



> + */
> +static void device_link_check_waiting_consumers(void)

Function name is misleading and hides side effects.

I have not come up with a name that does not hide side effects, but a better
name would be:

   device_link_add_supplier_links()


> +{
> +	struct device *dev, *tmp;
> +
> +	mutex_lock(&wfs_lock);
> +	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> +				 links.needs_suppliers)
> +		if (!dev->bus->add_links(dev))
> +			list_del_init(&dev->links.needs_suppliers);

Empties dev->links.needs_suppliers, but does not remove dev from
wait_for_suppliers list.  Where does that happen?

> +	mutex_unlock(&wfs_lock);
> +}
> +
>  static void device_link_free(struct device_link *link)
>  {
>  	while (refcount_dec_not_one(&link->rpm_active))
> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>  	struct device_link *link;
>  	int ret = 0;
>  
> +	/*
> +	 * If a device is waiting for one or more suppliers (in
> +	 * wait_for_suppliers list), it is not ready to probe yet. So just
> +	 * return -EPROBE_DEFER without having to check the links with existing
> +	 * suppliers.
> +	 */

Change comment to:

	/*
	 * Device waiting for supplier to become available is not allowed
	 * to probe
	 */

> +	mutex_lock(&wfs_lock);
> +	if (!list_empty(&dev->links.needs_suppliers)) {
> +		mutex_unlock(&wfs_lock);
> +		return -EPROBE_DEFER;
> +	}
> +	mutex_unlock(&wfs_lock);
> +
>  	device_links_write_lock();

Update Documentation/driver-api/device_link.rst to reflect the
check of &dev->links.needs_suppliers in device_links_check_suppliers().

>  
>  	list_for_each_entry(link, &dev->links.suppliers, c_node) {
> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>  {
>  	struct device_link *link, *ln;
>  
> +	mutex_lock(&wfs_lock);
> +	list_del(&dev->links.needs_suppliers);
> +	mutex_unlock(&wfs_lock);
> +
>  	/*
>  	 * Delete all of the remaining links from this device to any other
>  	 * devices (either consumers or suppliers).
> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>  #endif
>  	INIT_LIST_HEAD(&dev->links.consumers);
>  	INIT_LIST_HEAD(&dev->links.suppliers);
> +	INIT_LIST_HEAD(&dev->links.needs_suppliers);
>  	dev->links.status = DL_DEV_NO_DRIVER;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>  					     BUS_NOTIFY_ADD_DEVICE, dev);
>  
>  	kobject_uevent(&dev->kobj, KOBJ_ADD);

> +
> +	/*
> +	 * Check if any of the other devices (consumers) have been waiting for
> +	 * this device (supplier) to be added so that they can create a device
> +	 * link to it.
> +	 *
> +	 * This needs to happen after device_pm_add() because device_link_add()
> +	 * requires the supplier be registered before it's called.
> +	 *
> +	 * But this also needs to happe before bus_probe_device() to make sure
> +	 * waiting consumers can link to it before the driver is bound to the
> +	 * device and the driver sync_state callback is called for this device.
> +	 */

	/*
	 * Add links to dev from any dependent consumer that has dev on it's
	 * list of needed suppliers (links.needs_suppliers).  Device_pm_add()
	 * must have previously registered dev to allow the links to be added.
	 *
	 * The consumer links must be created before dev is probed because the
	 * sync_state callback for dev will use the consumer links.
	 */

> +	device_link_check_waiting_consumers();
> +
> +	if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> +		device_link_wait_for_supplier(dev);
> +
>  	bus_probe_device(dev);
>  	if (parent)
>  		klist_add_tail(&dev->p->knode_parent,
> diff --git a/include/linux/device.h b/include/linux/device.h
> index c330b75c6c57..5d70babb7462 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>   *		-EPROBE_DEFER it will queue the device for deferred probing.
>   * @uevent:	Called when a device is added, removed, or a few other things
>   *		that generate uevents to add the environment variables.

> + * @add_links:	Called, perhaps multiple times per device, after a device is
> + *		added to this bus.  The function is expected to create device
> + *		links to all the suppliers of the input device that are
> + *		available at the time this function is called.  As in, the
> + *		function should NOT stop at the first failed device link if
> + *		other unlinked supplier devices are present in the system.

* @add_links:	Called after a device is added to this bus.  The function is
*		expected to create device links to all the suppliers of the
*		device that are available at the time this function is called.
*		The function must NOT stop at the first failed device link if
*		other unlinked supplier devices are present in the system.
*		If some suppliers are not yet available, this function will be
*		called again when the suppliers become available.

but add_links() not needed, so moving this comment to of_link_to_suppliers()


> + *
> + *		Return 0 if device links have been successfully created to all
> + *		the suppliers of this device.  Return an error if some of the
> + *		suppliers are not yet available and this function needs to be
> + *		reattempted in the future.

*
*		Return 0 if device links have been successfully created to all
*		the suppliers of this device.  Return an error if some of the
*		suppliers are not yet available.


>   * @probe:	Called when a new device or driver add to this bus, and callback
>   *		the specific driver's probe to initial the matched device.
>   * @remove:	Called when a device removed from this bus.
> @@ -122,6 +133,7 @@ struct bus_type {
>  
>  	int (*match)(struct device *dev, struct device_driver *drv);
>  	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);


> +	int (*add_links)(struct device *dev);

              ^^^^^^^^^  add_supplier              ???
              ^^^^^^^^^  add_suppliers             ???

              ^^^^^^^^^  link_suppliers            ???

              ^^^^^^^^^  add_supplier_dependency   ???
              ^^^^^^^^^  add_supplier_dependencies ???
add_links() not needed


>  	int (*probe)(struct device *dev);
>  	int (*remove)(struct device *dev);
>  	void (*shutdown)(struct device *dev);




> @@ -893,11 +905,13 @@ enum dl_dev_state {
>   * struct dev_links_info - Device data related to device links.
>   * @suppliers: List of links to supplier devices.
>   * @consumers: List of links to consumer devices.

> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.

    * @needs_suppliers: List of devices deferring probe until supplier drivers
    *                   are successfully probed.

>   * @status: Driver status information.
>   */
>  struct dev_links_info {
>  	struct list_head suppliers;
>  	struct list_head consumers;
> +	struct list_head needs_suppliers;
>  	enum dl_dev_state status;
>  };
>  
> -- 
> 2.22.0.709.g102302147b-goog
> 
>
Saravana Kannan Aug. 16, 2019, 1:50 a.m. UTC | #2
On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> > Date: Tue, 23 Jul 2019 17:10:54 -0700
> > Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >  device addition
> > From: Saravana Kannan <saravanak@google.com>
> >
> > When devices are added, the bus might want to create device links to track
> > functional dependencies between supplier and consumer devices. This
> > tracking of supplier-consumer relationship allows optimizing device probe
> > order and tracking whether all consumers of a supplier are active. The
> > add_links bus callback is added to support this.
>
> Change above to:
>
> When devices are added, the bus may create device links to track which
> suppliers a consumer device depends upon.  This
> tracking of supplier-consumer relationship may be used to defer probing
> the driver of a consumer device before the driver(s) for its supplier device(s)
> are probed.  It may also be used by a supplier driver to determine if
> all of its consumers have been successfully probed.
> The add_links bus callback is added to create the supplier device links
>
> >
> > However, when consumer devices are added, they might not have a supplier
> > device to link to despite needing mandatory resources/functionality from
> > one or more suppliers. A waiting_for_suppliers list is created to track
> > such consumers and retry linking them when new devices get added.
>
> Change above to:
>
> If a supplier device has not yet been created when the consumer device attempts
> to link it, the consumer device is added to the wait_for_suppliers list.
> When supplier devices are created, the supplier device link will be added to
> the relevant consumer devices on the wait_for_suppliers list.
>

I'll take these commit text suggestions if we decide to revert the
entire series at the end of this review.

> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/device.h | 14 +++++++
> >  2 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index da84a73f2ba6..1b4eb221968f 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >  #endif
> >
> >  /* Device links support. */
> > +static LIST_HEAD(wait_for_suppliers);
> > +static DEFINE_MUTEX(wfs_lock);
> >
> >  #ifdef CONFIG_SRCU
> >  static DEFINE_MUTEX(device_links_lock);
> > @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >  }
> >  EXPORT_SYMBOL_GPL(device_link_add);
> >
> > +/**
>
> > + * device_link_wait_for_supplier - Mark device as waiting for supplier
>
>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list

I intentionally chose "Mark device..." because that's a better
description of the semantics of the function instead of trying to
describe the implementation. Whether I'm using a linked list or some
other data structure should not be the one line documentation of a
function. Unless the function is explicitly about operating on that
specific data structure.

>
>
> > + * @consumer: Consumer device
> > + *
> > + * Marks the consumer device as waiting for suppliers to become available. The
> > + * consumer device will never be probed until it's unmarked as waiting for
> > + * suppliers. The caller is responsible for adding the link to the supplier
> > + * once the supplier device is present.
> > + *
> > + * This function is NOT meant to be called from the probe function of the
> > + * consumer but rather from code that creates/adds the consumer device.
> > + */
> > +static void device_link_wait_for_supplier(struct device *consumer)
> > +{
> > +     mutex_lock(&wfs_lock);
> > +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> > +     mutex_unlock(&wfs_lock);
> > +}
> > +
> > +/**
>
>
> > + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> > + *
> > + * Loops through all consumers waiting on suppliers and tries to add all their
> > + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> > + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> > + *
> > + * The add_links bus callback is expected to return 0 if it has found and added
> > + * all the supplier links for the consumer device. It should return an error if
> > + * it isn't able to do so.
> > + *
> > + * The caller of device_link_wait_for_supplier() is expected to call this once
> > + * it's aware of potential suppliers becoming available.
>
> Change above comment to:
>
>     * device_link_add_supplier_links - add links from consumer devices to
>     *                                  supplier devices, leaving any consumer
>     *                                  with inactive suppliers on the
>     *                                  wait_for_suppliers list

I didn't know that the first one line comment could span multiple
lines. Good to know.


>     * Scan all consumer devices in the devicetree.

This function doesn't have anything to do with devicetree. I've
intentionally kept all OF related parts out of the driver/core because
I hope that other busses can start using this feature too. So I can't
take this bit.

>  For any supplier device that
>     * is not already linked to the consumer device, add the supplier to the
>     * consumer device's device links.
>     *
>     * If all of a consumer device's suppliers are available then the consumer
>     * is removed from the wait_for_suppliers list (if previously on the list).
>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>     * already on the list).

Honestly, I don't think this is any better than what I already have.

>     * The add_links bus callback must return 0 if it has found and added all
>     * the supplier links for the consumer device. It must return an error if
>     * it is not able to do so.
>     *
>     * The caller of device_link_wait_for_supplier() is expected to call this once
>     * it is aware of potential suppliers becoming available.
>
>
>
> > + */
> > +static void device_link_check_waiting_consumers(void)
>
> Function name is misleading and hides side effects.
>
> I have not come up with a name that does not hide side effects, but a better
> name would be:
>
>    device_link_add_supplier_links()

I kinda agree that it could afford a better name. The current name is
too similar to device_links_check_suppliers() and I never liked that.

Maybe device_link_add_missing_suppliers()?

I don't think we need "links" repeated twice in the function name.
With this suggestion, what side effect is hidden in your opinion? That
the fully linked consumer is removed from the "waiting for suppliers"
list?

Maybe device_link_try_removing_from_wfs()?

I'll wait for us to agree on a better name here before I change this.

> > +{
> > +     struct device *dev, *tmp;
> > +
> > +     mutex_lock(&wfs_lock);
> > +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> > +                              links.needs_suppliers)
> > +             if (!dev->bus->add_links(dev))
> > +                     list_del_init(&dev->links.needs_suppliers);
>
> Empties dev->links.needs_suppliers, but does not remove dev from
> wait_for_suppliers list.  Where does that happen?

I'll chalk this up to you having a long day or forgetting your coffee
:) list_del_init() does both of those things because needs_suppliers
is the node and wait_for_suppliers is the list.

>
> > +     mutex_unlock(&wfs_lock);
> > +}
> > +
> >  static void device_link_free(struct device_link *link)
> >  {
> >       while (refcount_dec_not_one(&link->rpm_active))
> > @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >       struct device_link *link;
> >       int ret = 0;
> >
> > +     /*
> > +      * If a device is waiting for one or more suppliers (in
> > +      * wait_for_suppliers list), it is not ready to probe yet. So just
> > +      * return -EPROBE_DEFER without having to check the links with existing
> > +      * suppliers.
> > +      */
>
> Change comment to:
>
>         /*
>          * Device waiting for supplier to become available is not allowed
>          * to probe
>          */

Po-tay-to. Po-tah-to? I think my comment is just as good.

> > +     mutex_lock(&wfs_lock);
> > +     if (!list_empty(&dev->links.needs_suppliers)) {
> > +             mutex_unlock(&wfs_lock);
> > +             return -EPROBE_DEFER;
> > +     }
> > +     mutex_unlock(&wfs_lock);
> > +
> >       device_links_write_lock();
>
> Update Documentation/driver-api/device_link.rst to reflect the
> check of &dev->links.needs_suppliers in device_links_check_suppliers().

Thanks! Will do.

>
> >
> >       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> > @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >  {
> >       struct device_link *link, *ln;
> >
> > +     mutex_lock(&wfs_lock);
> > +     list_del(&dev->links.needs_suppliers);
> > +     mutex_unlock(&wfs_lock);
> > +
> >       /*
> >        * Delete all of the remaining links from this device to any other
> >        * devices (either consumers or suppliers).
> > @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >  #endif
> >       INIT_LIST_HEAD(&dev->links.consumers);
> >       INIT_LIST_HEAD(&dev->links.suppliers);
> > +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >       dev->links.status = DL_DEV_NO_DRIVER;
> >  }
> >  EXPORT_SYMBOL_GPL(device_initialize);
> > @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >
> >       kobject_uevent(&dev->kobj, KOBJ_ADD);
>
> > +
> > +     /*
> > +      * Check if any of the other devices (consumers) have been waiting for
> > +      * this device (supplier) to be added so that they can create a device
> > +      * link to it.
> > +      *
> > +      * This needs to happen after device_pm_add() because device_link_add()
> > +      * requires the supplier be registered before it's called.
> > +      *
> > +      * But this also needs to happe before bus_probe_device() to make sure
> > +      * waiting consumers can link to it before the driver is bound to the
> > +      * device and the driver sync_state callback is called for this device.
> > +      */
>
>         /*
>          * Add links to dev from any dependent consumer that has dev on it's
>          * list of needed suppliers

There is no list of needed suppliers.

> (links.needs_suppliers).  Device_pm_add()
>          * must have previously registered dev to allow the links to be added.
>          *
>          * The consumer links must be created before dev is probed because the
>          * sync_state callback for dev will use the consumer links.
>          */

I think what I wrote is just as clear.

>
> > +     device_link_check_waiting_consumers();
> > +
> > +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> > +             device_link_wait_for_supplier(dev);
> > +
> >       bus_probe_device(dev);
> >       if (parent)
> >               klist_add_tail(&dev->p->knode_parent,
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index c330b75c6c57..5d70babb7462 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >   * @uevent:  Called when a device is added, removed, or a few other things
> >   *           that generate uevents to add the environment variables.
>
> > + * @add_links:       Called, perhaps multiple times per device, after a device is
> > + *           added to this bus.  The function is expected to create device
> > + *           links to all the suppliers of the input device that are
> > + *           available at the time this function is called.  As in, the
> > + *           function should NOT stop at the first failed device link if
> > + *           other unlinked supplier devices are present in the system.
>
> * @add_links:   Called after a device is added to this bus.

Why are you removing the "perhaps multiple times" part? that's true
and that's how some of the other ops are documented.

>  The function is
> *               expected to create device links to all the suppliers of the
> *               device that are available at the time this function is called.
> *               The function must NOT stop at the first failed device link if
> *               other unlinked supplier devices are present in the system.
> *               If some suppliers are not yet available, this function will be
> *               called again when the suppliers become available.
>
> but add_links() not needed, so moving this comment to of_link_to_suppliers()

Sorry, I'm not sure I understand. Can you please explain what you are
trying to say? of_link_to_suppliers() is just one implementation of
add_links(). The comment above is try for any bus trying to implement
add_links().

>
>
> > + *
> > + *           Return 0 if device links have been successfully created to all
> > + *           the suppliers of this device.  Return an error if some of the
> > + *           suppliers are not yet available and this function needs to be
> > + *           reattempted in the future.
>
> *
> *               Return 0 if device links have been successfully created to all
> *               the suppliers of this device.  Return an error if some of the
> *               suppliers are not yet available.
>
>
> >   * @probe:   Called when a new device or driver add to this bus, and callback
> >   *           the specific driver's probe to initial the matched device.
> >   * @remove:  Called when a device removed from this bus.
> > @@ -122,6 +133,7 @@ struct bus_type {
> >
> >       int (*match)(struct device *dev, struct device_driver *drv);
> >       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>
>
> > +     int (*add_links)(struct device *dev);
>
>               ^^^^^^^^^  add_supplier              ???
>               ^^^^^^^^^  add_suppliers             ???
>
>               ^^^^^^^^^  link_suppliers            ???
>
>               ^^^^^^^^^  add_supplier_dependency   ???
>               ^^^^^^^^^  add_supplier_dependencies ???
> add_links() not needed

add_links() was an intentional decision. There's no requirement that
the bus should only create links from this device to its suppliers. If
the bus also knows the consumers of this device (dev), then it
can/should add those too. So, it shouldn't have "suppliers" in the
name.

> >       int (*probe)(struct device *dev);
> >       int (*remove)(struct device *dev);
> >       void (*shutdown)(struct device *dev);
>
>
>
>
> > @@ -893,11 +905,13 @@ enum dl_dev_state {
> >   * struct dev_links_info - Device data related to device links.
> >   * @suppliers: List of links to supplier devices.
> >   * @consumers: List of links to consumer devices.
>
> > + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>
>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>     *                   are successfully probed.

It's "need suppliers". As in, this is a device that "needs suppliers".
So, no, this is not a list. This is a node in a list. And all "nodes
in a list" are documented as "Hook" in rest of places in this file. So
I think the documentation is correct as is.

Thanks for your review.



-Saravana
Frank Rowand Aug. 19, 2019, 3:38 a.m. UTC | #3
On 8/15/19 6:50 PM, Saravana Kannan wrote:
> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>  device addition
>>> From: Saravana Kannan <saravanak@google.com>
>>>
>>> When devices are added, the bus might want to create device links to track
>>> functional dependencies between supplier and consumer devices. This
>>> tracking of supplier-consumer relationship allows optimizing device probe
>>> order and tracking whether all consumers of a supplier are active. The
>>> add_links bus callback is added to support this.
>>
>> Change above to:
>>
>> When devices are added, the bus may create device links to track which
>> suppliers a consumer device depends upon.  This
>> tracking of supplier-consumer relationship may be used to defer probing
>> the driver of a consumer device before the driver(s) for its supplier device(s)
>> are probed.  It may also be used by a supplier driver to determine if
>> all of its consumers have been successfully probed.
>> The add_links bus callback is added to create the supplier device links
>>
>>>
>>> However, when consumer devices are added, they might not have a supplier
>>> device to link to despite needing mandatory resources/functionality from
>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>> such consumers and retry linking them when new devices get added.
>>
>> Change above to:
>>
>> If a supplier device has not yet been created when the consumer device attempts
>> to link it, the consumer device is added to the wait_for_suppliers list.
>> When supplier devices are created, the supplier device link will be added to
>> the relevant consumer devices on the wait_for_suppliers list.
>>
> 
> I'll take these commit text suggestions if we decide to revert the
> entire series at the end of this review.
> 
>>>
>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>> ---
>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/device.h | 14 +++++++
>>>  2 files changed, 97 insertions(+)
>>>
>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>> index da84a73f2ba6..1b4eb221968f 100644
>>> --- a/drivers/base/core.c
>>> +++ b/drivers/base/core.c
>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>  #endif
>>>
>>>  /* Device links support. */
>>> +static LIST_HEAD(wait_for_suppliers);
>>> +static DEFINE_MUTEX(wfs_lock);
>>>
>>>  #ifdef CONFIG_SRCU
>>>  static DEFINE_MUTEX(device_links_lock);
>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>  }
>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>
>>> +/**
>>
>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>
>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> 

As a meta-comment, I found this series very hard to understand in the context
of reading the new code for the first time.  When I read the code again in
six months or a year or two years it will not be in near term memory and it
will be as if I am reading it for the first time.  A lot of my suggestions
for changes of names are in that context -- the current names may be fine
when one has recently read the code, but not so much when trying to read
the whole thing again with a blank mind.

The code also inherits a good deal of complexity because it does not stand
alone in a nice discrete chunk, but instead delicately weaves into a more
complex body of code.

When I was trying to understand the code, I wrote a lot of additional
comments within my reply email to provide myself context, information
about various things, and questions that I needed to answer (or if I
could not answer to then ask you).  Then I ended up being able to remove
many of those notes before sending the reply.


> I intentionally chose "Mark device..." because that's a better
> description of the semantics of the function instead of trying to
> describe the implementation. Whether I'm using a linked list or some
> other data structure should not be the one line documentation of a
> function. Unless the function is explicitly about operating on that
> specific data structure.

I agree with the intent of trying to describe the semantics of a function,
especially at the API level where other systems (or drivers) would be using
the function.  But for this case the function is at the implementation level
and describing explicitly what it is doing makes this much more readable for
me.

I also find "Mark device" to be vague and not descriptive of what the
intent is.

> 
>>
>>
>>> + * @consumer: Consumer device
>>> + *
>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>> + * consumer device will never be probed until it's unmarked as waiting for
>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>> + * once the supplier device is present.
>>> + *
>>> + * This function is NOT meant to be called from the probe function of the
>>> + * consumer but rather from code that creates/adds the consumer device.
>>> + */
>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>> +{
>>> +     mutex_lock(&wfs_lock);
>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>> +     mutex_unlock(&wfs_lock);
>>> +}
>>> +
>>> +/**
>>
>>
>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>> + *
>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
>>> + *
>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>> + * all the supplier links for the consumer device. It should return an error if
>>> + * it isn't able to do so.
>>> + *
>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>> + * it's aware of potential suppliers becoming available.
>>
>> Change above comment to:
>>
>>     * device_link_add_supplier_links - add links from consumer devices to
>>     *                                  supplier devices, leaving any consumer
>>     *                                  with inactive suppliers on the
>>     *                                  wait_for_suppliers list
> 
> I didn't know that the first one line comment could span multiple
> lines. Good to know.
> 
> 
>>     * Scan all consumer devices in the devicetree.
> 
> This function doesn't have anything to do with devicetree. I've
> intentionally kept all OF related parts out of the driver/core because
> I hope that other busses can start using this feature too. So I can't
> take this bit.

My comment is left over from when I was taking notes, trying to understand the
code.

At the moment, only devicetree is used as a source of the dependency information.
The comment would better be re-phrased as:

        * Scan all consumer devices in the firmware description of the hardware topology

I did not ask why this feature is tied to _only_ the platform bus, but will now.

I do not know of any reason that a consumer / supplier relationship can not be
between devices on different bus types.  Do you know of such a reason?


> 
>>  For any supplier device that
>>     * is not already linked to the consumer device, add the supplier to the
>>     * consumer device's device links.
>>     *
>>     * If all of a consumer device's suppliers are available then the consumer
>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>     * already on the list).
> 
> Honestly, I don't think this is any better than what I already have.

Note that my version of these comments was written while I was reading the code,
and did not have any big picture understanding yet.  This will likely also be
the mind set of most everyone who reads this code in the future, once it is
woven into the kernel.

If you don't like the change, I can revisit it in a later version of the
patch set.


> 
>>     * The add_links bus callback must return 0 if it has found and added all
>>     * the supplier links for the consumer device. It must return an error if
>>     * it is not able to do so.
>>     *
>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>     * it is aware of potential suppliers becoming available.
>>
>>
>>
>>> + */
>>> +static void device_link_check_waiting_consumers(void)
>>
>> Function name is misleading and hides side effects.
>>
>> I have not come up with a name that does not hide side effects, but a better
>> name would be:
>>
>>    device_link_add_supplier_links()
> 
> I kinda agree that it could afford a better name. The current name is
> too similar to device_links_check_suppliers() and I never liked that.

Naming new fields or variables related to device links looks pretty
challenging to me, because of the desire to be part of device links
and not a wart pasted on the side.  So I share the pain in trying
to find good names.

> 
> Maybe device_link_add_missing_suppliers()?

My first reaction was "yes, that sounds good".  But then I stopped and
tried to read the name out of context.  The name is not adding the
missing suppliers, it is saving the information that a supplier is
not yet available (eg, is "missing").  I struggled in coming up with
the name that I suggested.  We can keep thinking.


> 
> I don't think we need "links" repeated twice in the function name.

Yeah, I didn't like that either.


> With this suggestion, what side effect is hidden in your opinion? That
> the fully linked consumer is removed from the "waiting for suppliers"
> list?

The side effect is that the function does not merely do a check.  It also
adds missing suppliers to a list.


> 
> Maybe device_link_try_removing_from_wfs()?

I like that, other than the fact that it still does not provide a clue
that the function is potentially adding suppliers to a list.  I think
part of the challenge is that the function does two things: (1) a check,
and (2) potentially adding missing suppliers to a list.  Maybe a simple
one line comment at the call site, something like:

   /* adds missing suppliers to wfs */


> 
> I'll wait for us to agree on a better name here before I change this.
> 
>>> +{
>>> +     struct device *dev, *tmp;
>>> +
>>> +     mutex_lock(&wfs_lock);
>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>> +                              links.needs_suppliers)
>>> +             if (!dev->bus->add_links(dev))
>>> +                     list_del_init(&dev->links.needs_suppliers);
>>
>> Empties dev->links.needs_suppliers, but does not remove dev from
>> wait_for_suppliers list.  Where does that happen?
> 
> I'll chalk this up to you having a long day or forgetting your coffee
> :) list_del_init() does both of those things because needs_suppliers
> is the node and wait_for_suppliers is the list.

Yes, brain mis-fire on my part.  I'll have to go back and look at the
list related code again.


> 
>>
>>> +     mutex_unlock(&wfs_lock);
>>> +}
>>> +
>>>  static void device_link_free(struct device_link *link)
>>>  {
>>>       while (refcount_dec_not_one(&link->rpm_active))
>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>       struct device_link *link;
>>>       int ret = 0;
>>>
>>> +     /*
>>> +      * If a device is waiting for one or more suppliers (in
>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>> +      * suppliers.
>>> +      */
>>
>> Change comment to:
>>
>>         /*
>>          * Device waiting for supplier to become available is not allowed
>>          * to probe
>>          */
> 
> Po-tay-to. Po-tah-to? I think my comment is just as good.

If just as good and shorter, then better.

Also the original says "it is not ready to probe".  That is not correct.  It
is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
Nit picky on my part, but tiny things like that mean I have to think harder.
I have to think "why is it not ready to probe?".  Maybe my version should have
instead been something like:

        * Device waiting for supplier to become available will return
        * -EPROBE_DEFER if probed.  Avoid the unneeded processing.

> 
>>> +     mutex_lock(&wfs_lock);
>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>> +             mutex_unlock(&wfs_lock);
>>> +             return -EPROBE_DEFER;
>>> +     }
>>> +     mutex_unlock(&wfs_lock);
>>> +
>>>       device_links_write_lock();
>>
>> Update Documentation/driver-api/device_link.rst to reflect the
>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> 
> Thanks! Will do.
> 
>>
>>>
>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>  {
>>>       struct device_link *link, *ln;
>>>
>>> +     mutex_lock(&wfs_lock);
>>> +     list_del(&dev->links.needs_suppliers);
>>> +     mutex_unlock(&wfs_lock);
>>> +
>>>       /*
>>>        * Delete all of the remaining links from this device to any other
>>>        * devices (either consumers or suppliers).
>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>  #endif
>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>  }
>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>
>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>
>>> +
>>> +     /*
>>> +      * Check if any of the other devices (consumers) have been waiting for
>>> +      * this device (supplier) to be added so that they can create a device
>>> +      * link to it.
>>> +      *
>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>> +      * requires the supplier be registered before it's called.
>>> +      *
>>> +      * But this also needs to happe before bus_probe_device() to make sure
>>> +      * waiting consumers can link to it before the driver is bound to the
>>> +      * device and the driver sync_state callback is called for this device.
>>> +      */
>>
>>         /*
>>          * Add links to dev from any dependent consumer that has dev on it's
>>          * list of needed suppliers
> 
> There is no list of needed suppliers.

"the other devices (consumers) have been waiting for this device (supplier)".
Isn't that a list of needed suppliers?


> 
>> (links.needs_suppliers).  Device_pm_add()
>>          * must have previously registered dev to allow the links to be added.
>>          *
>>          * The consumer links must be created before dev is probed because the
>>          * sync_state callback for dev will use the consumer links.
>>          */
> 
> I think what I wrote is just as clear.

The original comment is vague.  It does not explain why consumer links must be
created before the probe.  I had to go off and read other code to determine
why that is true.

And again, brevity is better if otherwise just as clear.


> 
>>
>>> +     device_link_check_waiting_consumers();
>>> +
>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>> +             device_link_wait_for_supplier(dev);
>>> +
>>>       bus_probe_device(dev);
>>>       if (parent)
>>>               klist_add_tail(&dev->p->knode_parent,
>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index c330b75c6c57..5d70babb7462 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>   *           that generate uevents to add the environment variables.
>>
>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>> + *           added to this bus.  The function is expected to create device
>>> + *           links to all the suppliers of the input device that are
>>> + *           available at the time this function is called.  As in, the
>>> + *           function should NOT stop at the first failed device link if
>>> + *           other unlinked supplier devices are present in the system.
>>
>> * @add_links:   Called after a device is added to this bus.
> 
> Why are you removing the "perhaps multiple times" part? that's true
> and that's how some of the other ops are documented.

I didn't remove it.  I rephrased it with a little bit more explanation as
"If some suppliers are not yet available, this function will be
called again when the suppliers become available." (below).


> 
>>  The function is
>> *               expected to create device links to all the suppliers of the
>> *               device that are available at the time this function is called.
>> *               The function must NOT stop at the first failed device link if
>> *               other unlinked supplier devices are present in the system.
>> *               If some suppliers are not yet available, this function will be
>> *               called again when the suppliers become available.
>>
>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> 
> Sorry, I'm not sure I understand. Can you please explain what you are
> trying to say? of_link_to_suppliers() is just one implementation of
> add_links(). The comment above is try for any bus trying to implement
> add_links().

This is conflating bus with the source of the firmware description of the
hardware topology.  For drivers that use various APIs to access firmware
description of topology that may be either devicetree or ACPI the access
is done via fwnode_operations, based on struct device.fwnode (if I recall
properly).

I failed to completely address why add_links() is not needed.  The answer
is that there should be a single function called for all buses.  Then
the proper firmware data source would be accessed via a struct fwnode_operations.

I think I left this out because I had not yet asked why this feature is
tied only to the platform bus.  Which I asked earlier in this reply.

> 
>>
>>
>>> + *
>>> + *           Return 0 if device links have been successfully created to all
>>> + *           the suppliers of this device.  Return an error if some of the
>>> + *           suppliers are not yet available and this function needs to be
>>> + *           reattempted in the future.
>>
>> *
>> *               Return 0 if device links have been successfully created to all
>> *               the suppliers of this device.  Return an error if some of the
>> *               suppliers are not yet available.
>>
>>
>>>   * @probe:   Called when a new device or driver add to this bus, and callback
>>>   *           the specific driver's probe to initial the matched device.
>>>   * @remove:  Called when a device removed from this bus.
>>> @@ -122,6 +133,7 @@ struct bus_type {
>>>
>>>       int (*match)(struct device *dev, struct device_driver *drv);
>>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>>
>>
>>> +     int (*add_links)(struct device *dev);
>>
>>               ^^^^^^^^^  add_supplier              ???
>>               ^^^^^^^^^  add_suppliers             ???
>>
>>               ^^^^^^^^^  link_suppliers            ???
>>
>>               ^^^^^^^^^  add_supplier_dependency   ???
>>               ^^^^^^^^^  add_supplier_dependencies ???
>> add_links() not needed
> 
> add_links() was an intentional decision. There's no requirement that
> the bus should only create links from this device to its suppliers. If
> the bus also knows the consumers of this device (dev), then it
> can/should add those too.

Is creating links to consumers of this device implemented in this patch
series?  If so, I overlooked it and will have to consider how that
fits in to the design.


> So, it shouldn't have "suppliers" in the
> name.
> 
>>>       int (*probe)(struct device *dev);
>>>       int (*remove)(struct device *dev);
>>>       void (*shutdown)(struct device *dev);
>>
>>
>>
>>
>>> @@ -893,11 +905,13 @@ enum dl_dev_state {
>>>   * struct dev_links_info - Device data related to device links.
>>>   * @suppliers: List of links to supplier devices.
>>>   * @consumers: List of links to consumer devices.
>>
>>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>>
>>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>>     *                   are successfully probed.
> 
> It's "need suppliers". As in, this is a device that "needs suppliers".
> So, no, this is not a list. This is a node in a list. And all "nodes
> in a list" are documented as "Hook" in rest of places in this file. So
> I think the documentation is correct as is.

Aha, I got confused about that while trying to keep everything straight.

Somehow I managed to conflate needs_suppliers with the links between
consumers and suppliers that are create via device_link_add().

So original comment is fine.

It is getting late, so I'll continue with patches 2 and 3 tomorrow.

-Frank

> 
> Thanks for your review.
> 
> 
> 
> -Saravana
>
Saravana Kannan Aug. 20, 2019, midnight UTC | #4
On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>  device addition
> >>> From: Saravana Kannan <saravanak@google.com>
> >>>
> >>> When devices are added, the bus might want to create device links to track
> >>> functional dependencies between supplier and consumer devices. This
> >>> tracking of supplier-consumer relationship allows optimizing device probe
> >>> order and tracking whether all consumers of a supplier are active. The
> >>> add_links bus callback is added to support this.
> >>
> >> Change above to:
> >>
> >> When devices are added, the bus may create device links to track which
> >> suppliers a consumer device depends upon.  This
> >> tracking of supplier-consumer relationship may be used to defer probing
> >> the driver of a consumer device before the driver(s) for its supplier device(s)
> >> are probed.  It may also be used by a supplier driver to determine if
> >> all of its consumers have been successfully probed.
> >> The add_links bus callback is added to create the supplier device links
> >>
> >>>
> >>> However, when consumer devices are added, they might not have a supplier
> >>> device to link to despite needing mandatory resources/functionality from
> >>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>> such consumers and retry linking them when new devices get added.
> >>
> >> Change above to:
> >>
> >> If a supplier device has not yet been created when the consumer device attempts
> >> to link it, the consumer device is added to the wait_for_suppliers list.
> >> When supplier devices are created, the supplier device link will be added to
> >> the relevant consumer devices on the wait_for_suppliers list.
> >>
> >
> > I'll take these commit text suggestions if we decide to revert the
> > entire series at the end of this review.
> >
> >>>
> >>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>> ---
> >>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>  include/linux/device.h | 14 +++++++
> >>>  2 files changed, 97 insertions(+)
> >>>
> >>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>> index da84a73f2ba6..1b4eb221968f 100644
> >>> --- a/drivers/base/core.c
> >>> +++ b/drivers/base/core.c
> >>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>  #endif
> >>>
> >>>  /* Device links support. */
> >>> +static LIST_HEAD(wait_for_suppliers);
> >>> +static DEFINE_MUTEX(wfs_lock);
> >>>
> >>>  #ifdef CONFIG_SRCU
> >>>  static DEFINE_MUTEX(device_links_lock);
> >>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>
> >>> +/**
> >>
> >>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>
> >>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >
>
> As a meta-comment, I found this series very hard to understand in the context
> of reading the new code for the first time.  When I read the code again in
> six months or a year or two years it will not be in near term memory and it
> will be as if I am reading it for the first time.  A lot of my suggestions
> for changes of names are in that context -- the current names may be fine
> when one has recently read the code, but not so much when trying to read
> the whole thing again with a blank mind.

Thanks for the context.

> The code also inherits a good deal of complexity because it does not stand
> alone in a nice discrete chunk, but instead delicately weaves into a more
> complex body of code.

I'll take this as a compliment :)

> When I was trying to understand the code, I wrote a lot of additional
> comments within my reply email to provide myself context, information
> about various things, and questions that I needed to answer (or if I
> could not answer to then ask you).  Then I ended up being able to remove
> many of those notes before sending the reply.
>
>
> > I intentionally chose "Mark device..." because that's a better
> > description of the semantics of the function instead of trying to
> > describe the implementation. Whether I'm using a linked list or some
> > other data structure should not be the one line documentation of a
> > function. Unless the function is explicitly about operating on that
> > specific data structure.
>
> I agree with the intent of trying to describe the semantics of a function,
> especially at the API level where other systems (or drivers) would be using
> the function.  But for this case the function is at the implementation level
> and describing explicitly what it is doing makes this much more readable for
> me.

Are you distinguishing between API level vs implementation level based
on the function being "static"/not exported? I believe the earlier
version of this series had this function as an exported API. So maybe
that's why I had it as "Mark device".

> I also find "Mark device" to be vague and not descriptive of what the
> intent is.
>
> >
> >>
> >>
> >>> + * @consumer: Consumer device
> >>> + *
> >>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>> + * consumer device will never be probed until it's unmarked as waiting for
> >>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>> + * once the supplier device is present.
> >>> + *
> >>> + * This function is NOT meant to be called from the probe function of the
> >>> + * consumer but rather from code that creates/adds the consumer device.
> >>> + */
> >>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>> +{
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>> +     mutex_unlock(&wfs_lock);
> >>> +}
> >>> +
> >>> +/**
> >>
> >>
> >>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>> + *
> >>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> >>> + *
> >>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>> + * all the supplier links for the consumer device. It should return an error if
> >>> + * it isn't able to do so.
> >>> + *
> >>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>> + * it's aware of potential suppliers becoming available.
> >>
> >> Change above comment to:
> >>
> >>     * device_link_add_supplier_links - add links from consumer devices to
> >>     *                                  supplier devices, leaving any consumer
> >>     *                                  with inactive suppliers on the
> >>     *                                  wait_for_suppliers list
> >
> > I didn't know that the first one line comment could span multiple
> > lines. Good to know.
> >
> >
> >>     * Scan all consumer devices in the devicetree.
> >
> > This function doesn't have anything to do with devicetree. I've
> > intentionally kept all OF related parts out of the driver/core because
> > I hope that other busses can start using this feature too. So I can't
> > take this bit.
>
> My comment is left over from when I was taking notes, trying to understand the
> code.
>
> At the moment, only devicetree is used as a source of the dependency information.
> The comment would better be re-phrased as:
>
>         * Scan all consumer devices in the firmware description of the hardware topology
>

Ok

> I did not ask why this feature is tied to _only_ the platform bus, but will now.

Because devicetree and platform bus the only ones I'm familiar with.
If other busses want to add this, I'd be happy to help with code
and/or direction/review. But I won't pretend to know anything about
ACPI.

> I do not know of any reason that a consumer / supplier relationship can not be
> between devices on different bus types.  Do you know of such a reason?

Yes, it's hypothetically possible. But I haven't seen such a
relationship being defined in DT. Nor somewhere else where this might
be captured. So, how common/realistic is it?

> >
> >>  For any supplier device that
> >>     * is not already linked to the consumer device, add the supplier to the
> >>     * consumer device's device links.
> >>     *
> >>     * If all of a consumer device's suppliers are available then the consumer
> >>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>     * already on the list).
> >
> > Honestly, I don't think this is any better than what I already have.
>
> Note that my version of these comments was written while I was reading the code,
> and did not have any big picture understanding yet.  This will likely also be
> the mind set of most everyone who reads this code in the future, once it is
> woven into the kernel.
>
> If you don't like the change, I can revisit it in a later version of the
> patch set.

I'll take in all the ones I feel are reasonable or don't feel strongly
about. We can revisit the rest later.

> >
> >>     * The add_links bus callback must return 0 if it has found and added all
> >>     * the supplier links for the consumer device. It must return an error if
> >>     * it is not able to do so.
> >>     *
> >>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>     * it is aware of potential suppliers becoming available.
> >>
> >>
> >>
> >>> + */
> >>> +static void device_link_check_waiting_consumers(void)
> >>
> >> Function name is misleading and hides side effects.
> >>
> >> I have not come up with a name that does not hide side effects, but a better
> >> name would be:
> >>
> >>    device_link_add_supplier_links()
> >
> > I kinda agree that it could afford a better name. The current name is
> > too similar to device_links_check_suppliers() and I never liked that.
>
> Naming new fields or variables related to device links looks pretty
> challenging to me, because of the desire to be part of device links
> and not a wart pasted on the side.  So I share the pain in trying
> to find good names.
>
> >
> > Maybe device_link_add_missing_suppliers()?
>
> My first reaction was "yes, that sounds good".  But then I stopped and
> tried to read the name out of context.  The name is not adding the
> missing suppliers, it is saving the information that a supplier is
> not yet available (eg, is "missing").  I struggled in coming up with
> the name that I suggested.  We can keep thinking.

No, this function _IS_ about adding links to suppliers. These
consumers were "saved" as "not yet having the supplier" earlier by
device_link_wait_for_supplier(). This function doesn't do that. This
function is just trying to see if those missing suppliers are present
now and if so adding a link to them from the "saved" consumers. I
think device_link_add_missing_suppliers() is actually a pretty good
name. Let me know what you think now.

>
>
> >
> > I don't think we need "links" repeated twice in the function name.
>
> Yeah, I didn't like that either.
>
>
> > With this suggestion, what side effect is hidden in your opinion? That
> > the fully linked consumer is removed from the "waiting for suppliers"
> > list?
>
> The side effect is that the function does not merely do a check.  It also
> adds missing suppliers to a list.

No, it doesn't do that. I can't keep a list of things that aren't
allocated yet :). In the whole patch series, we only keep a list of things
(consumers) that are waiting on other things (missing suppliers).

> >
> > Maybe device_link_try_removing_from_wfs()?
>
> I like that, other than the fact that it still does not provide a clue
> that the function is potentially adding suppliers to a list.

It doesn't. How would you add a supplier device to a list if the
device itself isn't there? :)

>  I think
> part of the challenge is that the function does two things: (1) a check,
> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> one line comment at the call site, something like:
>
>    /* adds missing suppliers to wfs */
>
>
> >
> > I'll wait for us to agree on a better name here before I change this.
> >
> >>> +{
> >>> +     struct device *dev, *tmp;
> >>> +
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>> +                              links.needs_suppliers)
> >>> +             if (!dev->bus->add_links(dev))
> >>> +                     list_del_init(&dev->links.needs_suppliers);
> >>
> >> Empties dev->links.needs_suppliers, but does not remove dev from
> >> wait_for_suppliers list.  Where does that happen?
> >
> > I'll chalk this up to you having a long day or forgetting your coffee
> > :) list_del_init() does both of those things because needs_suppliers
> > is the node and wait_for_suppliers is the list.
>
> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> list related code again.
>
>
> >
> >>
> >>> +     mutex_unlock(&wfs_lock);
> >>> +}
> >>> +
> >>>  static void device_link_free(struct device_link *link)
> >>>  {
> >>>       while (refcount_dec_not_one(&link->rpm_active))
> >>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>       struct device_link *link;
> >>>       int ret = 0;
> >>>
> >>> +     /*
> >>> +      * If a device is waiting for one or more suppliers (in
> >>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>> +      * suppliers.
> >>> +      */
> >>
> >> Change comment to:
> >>
> >>         /*
> >>          * Device waiting for supplier to become available is not allowed
> >>          * to probe
> >>          */
> >
> > Po-tay-to. Po-tah-to? I think my comment is just as good.
>
> If just as good and shorter, then better.
>
> Also the original says "it is not ready to probe".  That is not correct.  It
> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> Nit picky on my part, but tiny things like that mean I have to think harder.
> I have to think "why is it not ready to probe?".  Maybe my version should have
> instead been something like:
>
>         * Device waiting for supplier to become available will return
>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>
> >
> >>> +     mutex_lock(&wfs_lock);
> >>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>> +             mutex_unlock(&wfs_lock);
> >>> +             return -EPROBE_DEFER;
> >>> +     }
> >>> +     mutex_unlock(&wfs_lock);
> >>> +
> >>>       device_links_write_lock();
> >>
> >> Update Documentation/driver-api/device_link.rst to reflect the
> >> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >
> > Thanks! Will do.
> >
> >>
> >>>
> >>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>  {
> >>>       struct device_link *link, *ln;
> >>>
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_del(&dev->links.needs_suppliers);
> >>> +     mutex_unlock(&wfs_lock);
> >>> +
> >>>       /*
> >>>        * Delete all of the remaining links from this device to any other
> >>>        * devices (either consumers or suppliers).
> >>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>  #endif
> >>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>
> >>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>
> >>> +
> >>> +     /*
> >>> +      * Check if any of the other devices (consumers) have been waiting for
> >>> +      * this device (supplier) to be added so that they can create a device
> >>> +      * link to it.
> >>> +      *
> >>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>> +      * requires the supplier be registered before it's called.
> >>> +      *
> >>> +      * But this also needs to happe before bus_probe_device() to make sure
> >>> +      * waiting consumers can link to it before the driver is bound to the
> >>> +      * device and the driver sync_state callback is called for this device.
> >>> +      */
> >>
> >>         /*
> >>          * Add links to dev from any dependent consumer that has dev on it's
> >>          * list of needed suppliers
> >
> > There is no list of needed suppliers.
>
> "the other devices (consumers) have been waiting for this device (supplier)".
> Isn't that a list of needed suppliers?

No, that's a list of consumers that needs_suppliers.

> >
> >> (links.needs_suppliers).  Device_pm_add()
> >>          * must have previously registered dev to allow the links to be added.
> >>          *
> >>          * The consumer links must be created before dev is probed because the
> >>          * sync_state callback for dev will use the consumer links.
> >>          */
> >
> > I think what I wrote is just as clear.
>
> The original comment is vague.  It does not explain why consumer links must be
> created before the probe.  I had to go off and read other code to determine
> why that is true.
>
> And again, brevity is better if otherwise just as clear.
>
>
> >
> >>
> >>> +     device_link_check_waiting_consumers();
> >>> +
> >>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>> +             device_link_wait_for_supplier(dev);
> >>> +
> >>>       bus_probe_device(dev);
> >>>       if (parent)
> >>>               klist_add_tail(&dev->p->knode_parent,
> >>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>> index c330b75c6c57..5d70babb7462 100644
> >>> --- a/include/linux/device.h
> >>> +++ b/include/linux/device.h
> >>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>   *           that generate uevents to add the environment variables.
> >>
> >>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>> + *           added to this bus.  The function is expected to create device
> >>> + *           links to all the suppliers of the input device that are
> >>> + *           available at the time this function is called.  As in, the
> >>> + *           function should NOT stop at the first failed device link if
> >>> + *           other unlinked supplier devices are present in the system.
> >>
> >> * @add_links:   Called after a device is added to this bus.
> >
> > Why are you removing the "perhaps multiple times" part? that's true
> > and that's how some of the other ops are documented.
>
> I didn't remove it.  I rephrased it with a little bit more explanation as
> "If some suppliers are not yet available, this function will be
> called again when the suppliers become available." (below).
>
>
> >
> >>  The function is
> >> *               expected to create device links to all the suppliers of the
> >> *               device that are available at the time this function is called.
> >> *               The function must NOT stop at the first failed device link if
> >> *               other unlinked supplier devices are present in the system.
> >> *               If some suppliers are not yet available, this function will be
> >> *               called again when the suppliers become available.
> >>
> >> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >
> > Sorry, I'm not sure I understand. Can you please explain what you are
> > trying to say? of_link_to_suppliers() is just one implementation of
> > add_links(). The comment above is try for any bus trying to implement
> > add_links().
>
> This is conflating bus with the source of the firmware description of the
> hardware topology.  For drivers that use various APIs to access firmware
> description of topology that may be either devicetree or ACPI the access
> is done via fwnode_operations, based on struct device.fwnode (if I recall
> properly).
>
> I failed to completely address why add_links() is not needed.  The answer
> is that there should be a single function called for all buses.  Then
> the proper firmware data source would be accessed via a struct fwnode_operations.
>
> I think I left this out because I had not yet asked why this feature is
> tied only to the platform bus.  Which I asked earlier in this reply.

Thanks for the pointer about fwnode and fwnode_operations. I wasn't
aware of those. I see where you are going with this. I see a couple of
problems with this approach though:

1. How you interpret the properties of a fwnode is specific to the fw
type. The clocks DT property isn't going to have the same definition
in ACPI or some other firmware. Heck, I don't know if ACPI even has a
clocks like property. So have one function to parse all the FW types
doesn't make a lot of sense.

2. If this common code is implemented as part of driver/base/, then at
a minimum, I'll have to check if a fwnode is a DT node before I start
interpreting the properties of a device's fwnode. But that means I'll
have to include linux/of.h to use is_of_node(). I don't like having
driver/base code depend on OF or platform or ACPI headers.

3. The supplier info doesn't always need to come from a firmware. So I
don't want to limit it to that?

Also, I don't necessarily see this as conflating firmware (DT, ACPI,
etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
the device seems like the entity best suited to figure out the
suppliers of the device (apart from the driver, obviously). So the bus
deciding the suppliers doesn't seem wrong to me.

In this specific case, I'm trying to address DT for now and leaving
ACPI to whoever else wants to add device links based on ACPI data.
Most OF/DT based devices end up in platform bus. So I'm just handling
this in platform bus. If some other person wants this to work for ACPI
bus or PCI bus, they are welcome to implement add_links() for those
busses? I'm nowhere close to an expert on those.

> >
> >>
> >>
> >>> + *
> >>> + *           Return 0 if device links have been successfully created to all
> >>> + *           the suppliers of this device.  Return an error if some of the
> >>> + *           suppliers are not yet available and this function needs to be
> >>> + *           reattempted in the future.
> >>
> >> *
> >> *               Return 0 if device links have been successfully created to all
> >> *               the suppliers of this device.  Return an error if some of the
> >> *               suppliers are not yet available.
> >>
> >>
> >>>   * @probe:   Called when a new device or driver add to this bus, and callback
> >>>   *           the specific driver's probe to initial the matched device.
> >>>   * @remove:  Called when a device removed from this bus.
> >>> @@ -122,6 +133,7 @@ struct bus_type {
> >>>
> >>>       int (*match)(struct device *dev, struct device_driver *drv);
> >>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
> >>
> >>
> >>> +     int (*add_links)(struct device *dev);
> >>
> >>               ^^^^^^^^^  add_supplier              ???
> >>               ^^^^^^^^^  add_suppliers             ???
> >>
> >>               ^^^^^^^^^  link_suppliers            ???
> >>
> >>               ^^^^^^^^^  add_supplier_dependency   ???
> >>               ^^^^^^^^^  add_supplier_dependencies ???
> >> add_links() not needed
> >
> > add_links() was an intentional decision. There's no requirement that
> > the bus should only create links from this device to its suppliers. If
> > the bus also knows the consumers of this device (dev), then it
> > can/should add those too.
>
> Is creating links to consumers of this device implemented in this patch
> series?  If so, I overlooked it and will have to consider how that
> fits in to the design.
>
>
> > So, it shouldn't have "suppliers" in the
> > name.
> >
> >>>       int (*probe)(struct device *dev);
> >>>       int (*remove)(struct device *dev);
> >>>       void (*shutdown)(struct device *dev);
> >>
> >>
> >>
> >>
> >>> @@ -893,11 +905,13 @@ enum dl_dev_state {
> >>>   * struct dev_links_info - Device data related to device links.
> >>>   * @suppliers: List of links to supplier devices.
> >>>   * @consumers: List of links to consumer devices.
> >>
> >>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
> >>
> >>     * @needs_suppliers: List of devices deferring probe until supplier drivers
> >>     *                   are successfully probed.
> >
> > It's "need suppliers". As in, this is a device that "needs suppliers".
> > So, no, this is not a list. This is a node in a list. And all "nodes
> > in a list" are documented as "Hook" in rest of places in this file. So
> > I think the documentation is correct as is.
>
> Aha, I got confused about that while trying to keep everything straight.
>
> Somehow I managed to conflate needs_suppliers with the links between
> consumers and suppliers that are create via device_link_add().
>
> So original comment is fine.
>
> It is getting late, so I'll continue with patches 2 and 3 tomorrow.
>

Thanks for the review.

-Saravana
Frank Rowand Aug. 20, 2019, 4:25 a.m. UTC | #5
On 8/19/19 5:00 PM, Saravana Kannan wrote:
> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>>>  device addition
>>>>> From: Saravana Kannan <saravanak@google.com>
>>>>>
>>>>> When devices are added, the bus might want to create device links to track
>>>>> functional dependencies between supplier and consumer devices. This
>>>>> tracking of supplier-consumer relationship allows optimizing device probe
>>>>> order and tracking whether all consumers of a supplier are active. The
>>>>> add_links bus callback is added to support this.
>>>>
>>>> Change above to:
>>>>
>>>> When devices are added, the bus may create device links to track which
>>>> suppliers a consumer device depends upon.  This
>>>> tracking of supplier-consumer relationship may be used to defer probing
>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
>>>> are probed.  It may also be used by a supplier driver to determine if
>>>> all of its consumers have been successfully probed.
>>>> The add_links bus callback is added to create the supplier device links
>>>>
>>>>>
>>>>> However, when consumer devices are added, they might not have a supplier
>>>>> device to link to despite needing mandatory resources/functionality from
>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>>>> such consumers and retry linking them when new devices get added.
>>>>
>>>> Change above to:
>>>>
>>>> If a supplier device has not yet been created when the consumer device attempts
>>>> to link it, the consumer device is added to the wait_for_suppliers list.
>>>> When supplier devices are created, the supplier device link will be added to
>>>> the relevant consumer devices on the wait_for_suppliers list.
>>>>
>>>
>>> I'll take these commit text suggestions if we decide to revert the
>>> entire series at the end of this review.
>>>
>>>>>
>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>> ---
>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/linux/device.h | 14 +++++++
>>>>>  2 files changed, 97 insertions(+)
>>>>>
>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>> index da84a73f2ba6..1b4eb221968f 100644
>>>>> --- a/drivers/base/core.c
>>>>> +++ b/drivers/base/core.c
>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>>>  #endif
>>>>>
>>>>>  /* Device links support. */
>>>>> +static LIST_HEAD(wait_for_suppliers);
>>>>> +static DEFINE_MUTEX(wfs_lock);
>>>>>
>>>>>  #ifdef CONFIG_SRCU
>>>>>  static DEFINE_MUTEX(device_links_lock);
>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>>>
>>>>> +/**
>>>>
>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>>>
>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
>>>
>>
>> As a meta-comment, I found this series very hard to understand in the context
>> of reading the new code for the first time.  When I read the code again in
>> six months or a year or two years it will not be in near term memory and it
>> will be as if I am reading it for the first time.  A lot of my suggestions
>> for changes of names are in that context -- the current names may be fine
>> when one has recently read the code, but not so much when trying to read
>> the whole thing again with a blank mind.
> 
> Thanks for the context.
> 
>> The code also inherits a good deal of complexity because it does not stand
>> alone in a nice discrete chunk, but instead delicately weaves into a more
>> complex body of code.
> 
> I'll take this as a compliment :)

Please do!


> 
>> When I was trying to understand the code, I wrote a lot of additional
>> comments within my reply email to provide myself context, information
>> about various things, and questions that I needed to answer (or if I
>> could not answer to then ask you).  Then I ended up being able to remove
>> many of those notes before sending the reply.
>>
>>
>>> I intentionally chose "Mark device..." because that's a better
>>> description of the semantics of the function instead of trying to
>>> describe the implementation. Whether I'm using a linked list or some
>>> other data structure should not be the one line documentation of a
>>> function. Unless the function is explicitly about operating on that
>>> specific data structure.
>>
>> I agree with the intent of trying to describe the semantics of a function,
>> especially at the API level where other systems (or drivers) would be using
>> the function.  But for this case the function is at the implementation level
>> and describing explicitly what it is doing makes this much more readable for
>> me.
> 
> Are you distinguishing between API level vs implementation level based
> on the function being "static"/not exported? I believe the earlier

No, being static helps say a function is not API, but an function that is
not static may be intended to be used in a limited and constrained manner.
I distinguished based on the usage of the function.


> version of this series had this function as an exported API. So maybe
> that's why I had it as "Mark device".
> 
>> I also find "Mark device" to be vague and not descriptive of what the
>> intent is.
>>
>>>
>>>>
>>>>
>>>>> + * @consumer: Consumer device
>>>>> + *
>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>>>> + * consumer device will never be probed until it's unmarked as waiting for
>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>>>> + * once the supplier device is present.
>>>>> + *
>>>>> + * This function is NOT meant to be called from the probe function of the
>>>>> + * consumer but rather from code that creates/adds the consumer device.
>>>>> + */
>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>>>> +{
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>
>>>>
>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>>>> + *
>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
>>>>> + *
>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>>>> + * all the supplier links for the consumer device. It should return an error if
>>>>> + * it isn't able to do so.
>>>>> + *
>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>> + * it's aware of potential suppliers becoming available.
>>>>
>>>> Change above comment to:
>>>>
>>>>     * device_link_add_supplier_links - add links from consumer devices to
>>>>     *                                  supplier devices, leaving any consumer
>>>>     *                                  with inactive suppliers on the
>>>>     *                                  wait_for_suppliers list
>>>
>>> I didn't know that the first one line comment could span multiple
>>> lines. Good to know.
>>>
>>>
>>>>     * Scan all consumer devices in the devicetree.
>>>
>>> This function doesn't have anything to do with devicetree. I've
>>> intentionally kept all OF related parts out of the driver/core because
>>> I hope that other busses can start using this feature too. So I can't
>>> take this bit.
>>
>> My comment is left over from when I was taking notes, trying to understand the
>> code.
>>
>> At the moment, only devicetree is used as a source of the dependency information.
>> The comment would better be re-phrased as:
>>
>>         * Scan all consumer devices in the firmware description of the hardware topology
>>
> 
> Ok
> 
>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> 
> Because devicetree and platform bus the only ones I'm familiar with.
> If other busses want to add this, I'd be happy to help with code
> and/or direction/review. But I won't pretend to know anything about
> ACPI.

Sorry, you don't get to ignore other buses because you are not familiar
with them.

I am not aware of any reason to exclude devices that on other buses and your
answer below does not provide a valid technical reason why the new feature is
correct when it excludes all other buses.

> 
>> I do not know of any reason that a consumer / supplier relationship can not be
>> between devices on different bus types.  Do you know of such a reason?
> 
> Yes, it's hypothetically possible. But I haven't seen such a
> relationship being defined in DT. Nor somewhere else where this might
> be captured. So, how common/realistic is it?

It is entirely legal.  I have no idea how common it is but that is not a valid
reason to exclude other buses from the feature.

> 
>>>
>>>>  For any supplier device that
>>>>     * is not already linked to the consumer device, add the supplier to the
>>>>     * consumer device's device links.
>>>>     *
>>>>     * If all of a consumer device's suppliers are available then the consumer
>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>>>     * already on the list).
>>>
>>> Honestly, I don't think this is any better than what I already have.
>>
>> Note that my version of these comments was written while I was reading the code,
>> and did not have any big picture understanding yet.  This will likely also be
>> the mind set of most everyone who reads this code in the future, once it is
>> woven into the kernel.
>>
>> If you don't like the change, I can revisit it in a later version of the
>> patch set.
> 
> I'll take in all the ones I feel are reasonable or don't feel strongly
> about. We can revisit the rest later.
> 
>>>
>>>>     * The add_links bus callback must return 0 if it has found and added all
>>>>     * the supplier links for the consumer device. It must return an error if
>>>>     * it is not able to do so.
>>>>     *
>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>     * it is aware of potential suppliers becoming available.
>>>>
>>>>
>>>>
>>>>> + */
>>>>> +static void device_link_check_waiting_consumers(void)
>>>>
>>>> Function name is misleading and hides side effects.
>>>>
>>>> I have not come up with a name that does not hide side effects, but a better
>>>> name would be:
>>>>
>>>>    device_link_add_supplier_links()
>>>
>>> I kinda agree that it could afford a better name. The current name is
>>> too similar to device_links_check_suppliers() and I never liked that.
>>
>> Naming new fields or variables related to device links looks pretty
>> challenging to me, because of the desire to be part of device links
>> and not a wart pasted on the side.  So I share the pain in trying
>> to find good names.
>>
>>>
>>> Maybe device_link_add_missing_suppliers()?
>>
>> My first reaction was "yes, that sounds good".  But then I stopped and
>> tried to read the name out of context.  The name is not adding the
>> missing suppliers, it is saving the information that a supplier is
>> not yet available (eg, is "missing").  I struggled in coming up with

Reading what you say below, and looking at the code again, what I say
in that sentence is backwards.  It is not adding the missing supplier
device links, it is instead adding existing supplier device inks.


>> the name that I suggested.  We can keep thinking.
> 
> No, this function _IS_ about adding links to suppliers. These

You are mis-reading what I wrote.  I said the function "is not adding
the missing suppliers".  You are converting that to "is not adding
links to the missing suppliers".

My suggested name was hinting "add_supplier_links", which is what you
say it does below.  The name you suggest is hinting "add_missing_suppliers".
Do you see the difference?


> consumers were "saved" as "not yet having the supplier" earlier by
> device_link_wait_for_supplier(). This function doesn't do that. This
> function is just trying to see if those missing suppliers are present
> now and if so adding a link to them from the "saved" consumers. I
> think device_link_add_missing_suppliers() is actually a pretty good
> name. Let me know what you think now.
> 
>>
>>
>>>
>>> I don't think we need "links" repeated twice in the function name.
>>
>> Yeah, I didn't like that either.
>>
>>
>>> With this suggestion, what side effect is hidden in your opinion? That
>>> the fully linked consumer is removed from the "waiting for suppliers"
>>> list?
>>
>> The side effect is that the function does not merely do a check.  It also
>> adds missing suppliers to a list.
> 
> No, it doesn't do that. I can't keep a list of things that aren't
> allocated yet :). In the whole patch series, we only keep a list of things
> (consumers) that are waiting on other things (missing suppliers).

OK, as I noted above, I stated that backwards.  It is adding links for
existing suppliers, not for the missing suppliers.

> 
>>>
>>> Maybe device_link_try_removing_from_wfs()?
>>
>> I like that, other than the fact that it still does not provide a clue
>> that the function is potentially adding suppliers to a list.
> 
> It doesn't. How would you add a supplier device to a list if the
> device itself isn't there? :)

Again, that should be existing suppliers, as you noted.  But the point stands
that the function is potentially adding links.


> 
>>  I think
>> part of the challenge is that the function does two things: (1) a check,
>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
>> one line comment at the call site, something like:
>>
>>    /* adds missing suppliers to wfs */
>>
>>
>>>
>>> I'll wait for us to agree on a better name here before I change this.
>>>
>>>>> +{
>>>>> +     struct device *dev, *tmp;
>>>>> +
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>>>> +                              links.needs_suppliers)
>>>>> +             if (!dev->bus->add_links(dev))
>>>>> +                     list_del_init(&dev->links.needs_suppliers);
>>>>
>>>> Empties dev->links.needs_suppliers, but does not remove dev from
>>>> wait_for_suppliers list.  Where does that happen?
>>>
>>> I'll chalk this up to you having a long day or forgetting your coffee
>>> :) list_del_init() does both of those things because needs_suppliers
>>> is the node and wait_for_suppliers is the list.
>>
>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
>> list related code again.
>>
>>
>>>
>>>>
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +}
>>>>> +
>>>>>  static void device_link_free(struct device_link *link)
>>>>>  {
>>>>>       while (refcount_dec_not_one(&link->rpm_active))
>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>>>       struct device_link *link;
>>>>>       int ret = 0;
>>>>>
>>>>> +     /*
>>>>> +      * If a device is waiting for one or more suppliers (in
>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>>>> +      * suppliers.
>>>>> +      */
>>>>
>>>> Change comment to:
>>>>
>>>>         /*
>>>>          * Device waiting for supplier to become available is not allowed
>>>>          * to probe
>>>>          */
>>>
>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
>>
>> If just as good and shorter, then better.
>>
>> Also the original says "it is not ready to probe".  That is not correct.  It
>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
>> Nit picky on my part, but tiny things like that mean I have to think harder.
>> I have to think "why is it not ready to probe?".  Maybe my version should have
>> instead been something like:
>>
>>         * Device waiting for supplier to become available will return
>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>>
>>>
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>>>> +             mutex_unlock(&wfs_lock);
>>>>> +             return -EPROBE_DEFER;
>>>>> +     }
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +
>>>>>       device_links_write_lock();
>>>>
>>>> Update Documentation/driver-api/device_link.rst to reflect the
>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
>>>
>>> Thanks! Will do.
>>>
>>>>
>>>>>
>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>>>  {
>>>>>       struct device_link *link, *ln;
>>>>>
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_del(&dev->links.needs_suppliers);
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +
>>>>>       /*
>>>>>        * Delete all of the remaining links from this device to any other
>>>>>        * devices (either consumers or suppliers).
>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>>>  #endif
>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>>>
>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>>>
>>>>> +
>>>>> +     /*
>>>>> +      * Check if any of the other devices (consumers) have been waiting for
>>>>> +      * this device (supplier) to be added so that they can create a device
>>>>> +      * link to it.
>>>>> +      *
>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>>>> +      * requires the supplier be registered before it's called.
>>>>> +      *
>>>>> +      * But this also needs to happe before bus_probe_device() to make sure
>>>>> +      * waiting consumers can link to it before the driver is bound to the
>>>>> +      * device and the driver sync_state callback is called for this device.
>>>>> +      */
>>>>
>>>>         /*
>>>>          * Add links to dev from any dependent consumer that has dev on it's
>>>>          * list of needed suppliers
>>>
>>> There is no list of needed suppliers.
>>
>> "the other devices (consumers) have been waiting for this device (supplier)".
>> Isn't that a list of needed suppliers?
> 
> No, that's a list of consumers that needs_suppliers.
> 
>>>
>>>> (links.needs_suppliers).  Device_pm_add()
>>>>          * must have previously registered dev to allow the links to be added.
>>>>          *
>>>>          * The consumer links must be created before dev is probed because the
>>>>          * sync_state callback for dev will use the consumer links.
>>>>          */
>>>
>>> I think what I wrote is just as clear.
>>
>> The original comment is vague.  It does not explain why consumer links must be
>> created before the probe.  I had to go off and read other code to determine
>> why that is true.
>>
>> And again, brevity is better if otherwise just as clear.
>>
>>
>>>
>>>>
>>>>> +     device_link_check_waiting_consumers();
>>>>> +
>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>>>> +             device_link_wait_for_supplier(dev);
>>>>> +
>>>>>       bus_probe_device(dev);
>>>>>       if (parent)
>>>>>               klist_add_tail(&dev->p->knode_parent,
>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>> index c330b75c6c57..5d70babb7462 100644
>>>>> --- a/include/linux/device.h
>>>>> +++ b/include/linux/device.h
>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>>>   *           that generate uevents to add the environment variables.
>>>>
>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>>>> + *           added to this bus.  The function is expected to create device
>>>>> + *           links to all the suppliers of the input device that are
>>>>> + *           available at the time this function is called.  As in, the
>>>>> + *           function should NOT stop at the first failed device link if
>>>>> + *           other unlinked supplier devices are present in the system.
>>>>
>>>> * @add_links:   Called after a device is added to this bus.
>>>
>>> Why are you removing the "perhaps multiple times" part? that's true
>>> and that's how some of the other ops are documented.
>>
>> I didn't remove it.  I rephrased it with a little bit more explanation as
>> "If some suppliers are not yet available, this function will be
>> called again when the suppliers become available." (below).
>>
>>
>>>
>>>>  The function is
>>>> *               expected to create device links to all the suppliers of the
>>>> *               device that are available at the time this function is called.
>>>> *               The function must NOT stop at the first failed device link if
>>>> *               other unlinked supplier devices are present in the system.
>>>> *               If some suppliers are not yet available, this function will be
>>>> *               called again when the suppliers become available.
>>>>
>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
>>>
>>> Sorry, I'm not sure I understand. Can you please explain what you are
>>> trying to say? of_link_to_suppliers() is just one implementation of
>>> add_links(). The comment above is try for any bus trying to implement
>>> add_links().
>>
>> This is conflating bus with the source of the firmware description of the
>> hardware topology.  For drivers that use various APIs to access firmware
>> description of topology that may be either devicetree or ACPI the access
>> is done via fwnode_operations, based on struct device.fwnode (if I recall
>> properly).
>>
>> I failed to completely address why add_links() is not needed.  The answer
>> is that there should be a single function called for all buses.  Then
>> the proper firmware data source would be accessed via a struct fwnode_operations.
>>
>> I think I left this out because I had not yet asked why this feature is
>> tied only to the platform bus.  Which I asked earlier in this reply.
> 
> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> aware of those. I see where you are going with this. I see a couple of
> problems with this approach though:
> 
> 1. How you interpret the properties of a fwnode is specific to the fw
> type. The clocks DT property isn't going to have the same definition
> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> clocks like property. So have one function to parse all the FW types
> doesn't make a lot of sense.

The functions in fwnode_operations are specific to the proper firmware.
So there is a set of functions in a struct fwnode_operations for
devicetree that only know about devicetree.  And there is a different
variable of type fwnode_operations that is initialized with ACPI
specific functions.

> 
> 2. If this common code is implemented as part of driver/base/, then at
> a minimum, I'll have to check if a fwnode is a DT node before I start
> interpreting the properties of a device's fwnode. But that means I'll
> have to include linux/of.h to use is_of_node(). I don't like having
> driver/base code depend on OF or platform or ACPI headers.

You just use the function in the device's fwnode_operations (I think,
I would have to go look at the precise way the code works because it
has been quite a while since I've looked at it).

> 
> 3. The supplier info doesn't always need to come from a firmware. So I
> don't want to limit it to that?

If you can find another source of topology info, then I would expect
that another set of fwnode_operations functions would be created
for the info source.


> 
> Also, I don't necessarily see this as conflating firmware (DT, ACPI,
> etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
> the device seems like the entity best suited to figure out the
> suppliers of the device (apart from the driver, obviously). So the bus
> deciding the suppliers doesn't seem wrong to me.

Patch 3 assigns the devicetree add_links function to the platform bus.
It seems incorrect to me for of_platform_default_populate_init() to be
changing a field in platform_bus_type.


   of_platform_default_populate_init()
           ...
           platform_bus_type.add_links = of_link_to_suppliers;


> 
> In this specific case, I'm trying to address DT for now and leaving
> ACPI to whoever else wants to add device links based on ACPI data.
> Most OF/DT based devices end up in platform bus. So I'm just handling
> this in platform bus. If some other person wants this to work for ACPI
> bus or PCI bus, they are welcome to implement add_links() for those
> busses? I'm nowhere close to an expert on those.

Devicetree is not limited to the platform bus.


> 
>>>
>>>>
>>>>
>>>>> + *
>>>>> + *           Return 0 if device links have been successfully created to all
>>>>> + *           the suppliers of this device.  Return an error if some of the
>>>>> + *           suppliers are not yet available and this function needs to be
>>>>> + *           reattempted in the future.
>>>>
>>>> *
>>>> *               Return 0 if device links have been successfully created to all
>>>> *               the suppliers of this device.  Return an error if some of the
>>>> *               suppliers are not yet available.
>>>>
>>>>
>>>>>   * @probe:   Called when a new device or driver add to this bus, and callback
>>>>>   *           the specific driver's probe to initial the matched device.
>>>>>   * @remove:  Called when a device removed from this bus.
>>>>> @@ -122,6 +133,7 @@ struct bus_type {
>>>>>
>>>>>       int (*match)(struct device *dev, struct device_driver *drv);
>>>>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>>>>
>>>>
>>>>> +     int (*add_links)(struct device *dev);
>>>>
>>>>               ^^^^^^^^^  add_supplier              ???
>>>>               ^^^^^^^^^  add_suppliers             ???
>>>>
>>>>               ^^^^^^^^^  link_suppliers            ???
>>>>
>>>>               ^^^^^^^^^  add_supplier_dependency   ???
>>>>               ^^^^^^^^^  add_supplier_dependencies ???
>>>> add_links() not needed
>>>
>>> add_links() was an intentional decision. There's no requirement that
>>> the bus should only create links from this device to its suppliers. If
>>> the bus also knows the consumers of this device (dev), then it
>>> can/should add those too.
>>
>> Is creating links to consumers of this device implemented in this patch
>> series?  If so, I overlooked it and will have to consider how that
>> fits in to the design.
>>
>>
>>> So, it shouldn't have "suppliers" in the
>>> name.
>>>
>>>>>       int (*probe)(struct device *dev);
>>>>>       int (*remove)(struct device *dev);
>>>>>       void (*shutdown)(struct device *dev);
>>>>
>>>>
>>>>
>>>>
>>>>> @@ -893,11 +905,13 @@ enum dl_dev_state {
>>>>>   * struct dev_links_info - Device data related to device links.
>>>>>   * @suppliers: List of links to supplier devices.
>>>>>   * @consumers: List of links to consumer devices.
>>>>
>>>>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>>>>
>>>>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>>>>     *                   are successfully probed.
>>>
>>> It's "need suppliers". As in, this is a device that "needs suppliers".
>>> So, no, this is not a list. This is a node in a list. And all "nodes
>>> in a list" are documented as "Hook" in rest of places in this file. So
>>> I think the documentation is correct as is.
>>
>> Aha, I got confused about that while trying to keep everything straight.
>>
>> Somehow I managed to conflate needs_suppliers with the links between
>> consumers and suppliers that are create via device_link_add().
>>
>> So original comment is fine.
>>
>> It is getting late, so I'll continue with patches 2 and 3 tomorrow.
>>
> 
> Thanks for the review.
> 
> -Saravana
>
Saravana Kannan Aug. 20, 2019, 10:10 p.m. UTC | #6
On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> > On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>  device addition
> >>>>> From: Saravana Kannan <saravanak@google.com>
> >>>>>
> >>>>> When devices are added, the bus might want to create device links to track
> >>>>> functional dependencies between supplier and consumer devices. This
> >>>>> tracking of supplier-consumer relationship allows optimizing device probe
> >>>>> order and tracking whether all consumers of a supplier are active. The
> >>>>> add_links bus callback is added to support this.
> >>>>
> >>>> Change above to:
> >>>>
> >>>> When devices are added, the bus may create device links to track which
> >>>> suppliers a consumer device depends upon.  This
> >>>> tracking of supplier-consumer relationship may be used to defer probing
> >>>> the driver of a consumer device before the driver(s) for its supplier device(s)
> >>>> are probed.  It may also be used by a supplier driver to determine if
> >>>> all of its consumers have been successfully probed.
> >>>> The add_links bus callback is added to create the supplier device links
> >>>>
> >>>>>
> >>>>> However, when consumer devices are added, they might not have a supplier
> >>>>> device to link to despite needing mandatory resources/functionality from
> >>>>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>>>> such consumers and retry linking them when new devices get added.
> >>>>
> >>>> Change above to:
> >>>>
> >>>> If a supplier device has not yet been created when the consumer device attempts
> >>>> to link it, the consumer device is added to the wait_for_suppliers list.
> >>>> When supplier devices are created, the supplier device link will be added to
> >>>> the relevant consumer devices on the wait_for_suppliers list.
> >>>>
> >>>
> >>> I'll take these commit text suggestions if we decide to revert the
> >>> entire series at the end of this review.
> >>>
> >>>>>
> >>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>> ---
> >>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>  include/linux/device.h | 14 +++++++
> >>>>>  2 files changed, 97 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>> index da84a73f2ba6..1b4eb221968f 100644
> >>>>> --- a/drivers/base/core.c
> >>>>> +++ b/drivers/base/core.c
> >>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>>>  #endif
> >>>>>
> >>>>>  /* Device links support. */
> >>>>> +static LIST_HEAD(wait_for_suppliers);
> >>>>> +static DEFINE_MUTEX(wfs_lock);
> >>>>>
> >>>>>  #ifdef CONFIG_SRCU
> >>>>>  static DEFINE_MUTEX(device_links_lock);
> >>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>>>  }
> >>>>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>>>
> >>>>> +/**
> >>>>
> >>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>>>
> >>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >>>
> >>
> >> As a meta-comment, I found this series very hard to understand in the context
> >> of reading the new code for the first time.  When I read the code again in
> >> six months or a year or two years it will not be in near term memory and it
> >> will be as if I am reading it for the first time.  A lot of my suggestions
> >> for changes of names are in that context -- the current names may be fine
> >> when one has recently read the code, but not so much when trying to read
> >> the whole thing again with a blank mind.
> >
> > Thanks for the context.
> >
> >> The code also inherits a good deal of complexity because it does not stand
> >> alone in a nice discrete chunk, but instead delicately weaves into a more
> >> complex body of code.
> >
> > I'll take this as a compliment :)
>
> Please do!
>
>
> >
> >> When I was trying to understand the code, I wrote a lot of additional
> >> comments within my reply email to provide myself context, information
> >> about various things, and questions that I needed to answer (or if I
> >> could not answer to then ask you).  Then I ended up being able to remove
> >> many of those notes before sending the reply.
> >>
> >>
> >>> I intentionally chose "Mark device..." because that's a better
> >>> description of the semantics of the function instead of trying to
> >>> describe the implementation. Whether I'm using a linked list or some
> >>> other data structure should not be the one line documentation of a
> >>> function. Unless the function is explicitly about operating on that
> >>> specific data structure.
> >>
> >> I agree with the intent of trying to describe the semantics of a function,
> >> especially at the API level where other systems (or drivers) would be using
> >> the function.  But for this case the function is at the implementation level
> >> and describing explicitly what it is doing makes this much more readable for
> >> me.
> >
> > Are you distinguishing between API level vs implementation level based
> > on the function being "static"/not exported? I believe the earlier
>
> No, being static helps say a function is not API, but an function that is
> not static may be intended to be used in a limited and constrained manner.
> I distinguished based on the usage of the function.
>
>
> > version of this series had this function as an exported API. So maybe
> > that's why I had it as "Mark device".
> >
> >> I also find "Mark device" to be vague and not descriptive of what the
> >> intent is.
> >>
> >>>
> >>>>
> >>>>
> >>>>> + * @consumer: Consumer device
> >>>>> + *
> >>>>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>>>> + * consumer device will never be probed until it's unmarked as waiting for
> >>>>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>>>> + * once the supplier device is present.
> >>>>> + *
> >>>>> + * This function is NOT meant to be called from the probe function of the
> >>>>> + * consumer but rather from code that creates/adds the consumer device.
> >>>>> + */
> >>>>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>>>> +{
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +}
> >>>>> +
> >>>>> +/**
> >>>>
> >>>>
> >>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>>>> + *
> >>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> >>>>> + *
> >>>>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>>>> + * all the supplier links for the consumer device. It should return an error if
> >>>>> + * it isn't able to do so.
> >>>>> + *
> >>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>> + * it's aware of potential suppliers becoming available.
> >>>>
> >>>> Change above comment to:
> >>>>
> >>>>     * device_link_add_supplier_links - add links from consumer devices to
> >>>>     *                                  supplier devices, leaving any consumer
> >>>>     *                                  with inactive suppliers on the
> >>>>     *                                  wait_for_suppliers list
> >>>
> >>> I didn't know that the first one line comment could span multiple
> >>> lines. Good to know.
> >>>
> >>>
> >>>>     * Scan all consumer devices in the devicetree.
> >>>
> >>> This function doesn't have anything to do with devicetree. I've
> >>> intentionally kept all OF related parts out of the driver/core because
> >>> I hope that other busses can start using this feature too. So I can't
> >>> take this bit.
> >>
> >> My comment is left over from when I was taking notes, trying to understand the
> >> code.
> >>
> >> At the moment, only devicetree is used as a source of the dependency information.
> >> The comment would better be re-phrased as:
> >>
> >>         * Scan all consumer devices in the firmware description of the hardware topology
> >>
> >
> > Ok
> >
> >> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> >
> > Because devicetree and platform bus the only ones I'm familiar with.
> > If other busses want to add this, I'd be happy to help with code
> > and/or direction/review. But I won't pretend to know anything about
> > ACPI.
>
> Sorry, you don't get to ignore other buses because you are not familiar
> with them.

It's important that I don't design out other buses -- which I don't.
But why would you want someone who has no idea of ACPI to write code
for it? It's a futile effort that's going to be rejected by people who
know ACPI anyway.

> I am not aware of any reason to exclude devices that on other buses and your
> answer below does not provide a valid technical reason why the new feature is
> correct when it excludes all other buses.
> >
> >> I do not know of any reason that a consumer / supplier relationship can not be
> >> between devices on different bus types.  Do you know of such a reason?
> >
> > Yes, it's hypothetically possible. But I haven't seen such a
> > relationship being defined in DT. Nor somewhere else where this might
> > be captured. So, how common/realistic is it?
>
> It is entirely legal.  I have no idea how common it is but that is not a valid
> reason to exclude other buses from the feature.

I'm not going to write code for a hypothetical hardware scenario. Find
one supported in upstream, show me that it'll benefit from this series
and tell me how to interpret the dependency graph and then we'll talk
about writing code for that.

> >>>
> >>>>  For any supplier device that
> >>>>     * is not already linked to the consumer device, add the supplier to the
> >>>>     * consumer device's device links.
> >>>>     *
> >>>>     * If all of a consumer device's suppliers are available then the consumer
> >>>>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>>>     * already on the list).
> >>>
> >>> Honestly, I don't think this is any better than what I already have.
> >>
> >> Note that my version of these comments was written while I was reading the code,
> >> and did not have any big picture understanding yet.  This will likely also be
> >> the mind set of most everyone who reads this code in the future, once it is
> >> woven into the kernel.
> >>
> >> If you don't like the change, I can revisit it in a later version of the
> >> patch set.
> >
> > I'll take in all the ones I feel are reasonable or don't feel strongly
> > about. We can revisit the rest later.
> >
> >>>
> >>>>     * The add_links bus callback must return 0 if it has found and added all
> >>>>     * the supplier links for the consumer device. It must return an error if
> >>>>     * it is not able to do so.
> >>>>     *
> >>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>     * it is aware of potential suppliers becoming available.
> >>>>
> >>>>
> >>>>
> >>>>> + */
> >>>>> +static void device_link_check_waiting_consumers(void)
> >>>>
> >>>> Function name is misleading and hides side effects.
> >>>>
> >>>> I have not come up with a name that does not hide side effects, but a better
> >>>> name would be:
> >>>>
> >>>>    device_link_add_supplier_links()
> >>>
> >>> I kinda agree that it could afford a better name. The current name is
> >>> too similar to device_links_check_suppliers() and I never liked that.
> >>
> >> Naming new fields or variables related to device links looks pretty
> >> challenging to me, because of the desire to be part of device links
> >> and not a wart pasted on the side.  So I share the pain in trying
> >> to find good names.
> >>
> >>>
> >>> Maybe device_link_add_missing_suppliers()?
> >>
> >> My first reaction was "yes, that sounds good".  But then I stopped and
> >> tried to read the name out of context.  The name is not adding the
> >> missing suppliers, it is saving the information that a supplier is
> >> not yet available (eg, is "missing").  I struggled in coming up with
>
> Reading what you say below, and looking at the code again, what I say
> in that sentence is backwards.  It is not adding the missing supplier
> device links, it is instead adding existing supplier device inks.
>
>
> >> the name that I suggested.  We can keep thinking.
> >
> > No, this function _IS_ about adding links to suppliers. These
>
> You are mis-reading what I wrote.  I said the function "is not adding
> the missing suppliers".  You are converting that to "is not adding
> links to the missing suppliers".
>
> My suggested name was hinting "add_supplier_links", which is what you
> say it does below.  The name you suggest is hinting "add_missing_suppliers".
> Do you see the difference?

Yeah, which is why I said earlier that I didn't want to repeat "links"
twice in a function name. As in
device_links_add_missing_supplier_links() has too many "links". In the
context of device_links_, "add missing suppliers" means "add missing
supplier links". Anyway, I think we can come back to figuring out a
good name once we agree on the more important discussions further
below.

> > consumers were "saved" as "not yet having the supplier" earlier by
> > device_link_wait_for_supplier(). This function doesn't do that. This
> > function is just trying to see if those missing suppliers are present
> > now and if so adding a link to them from the "saved" consumers. I
> > think device_link_add_missing_suppliers() is actually a pretty good
> > name. Let me know what you think now.
> >
> >>
> >>
> >>>
> >>> I don't think we need "links" repeated twice in the function name.
> >>
> >> Yeah, I didn't like that either.
> >>
> >>
> >>> With this suggestion, what side effect is hidden in your opinion? That
> >>> the fully linked consumer is removed from the "waiting for suppliers"
> >>> list?
> >>
> >> The side effect is that the function does not merely do a check.  It also
> >> adds missing suppliers to a list.
> >
> > No, it doesn't do that. I can't keep a list of things that aren't
> > allocated yet :). In the whole patch series, we only keep a list of things
> > (consumers) that are waiting on other things (missing suppliers).
>
> OK, as I noted above, I stated that backwards.  It is adding links for
> existing suppliers, not for the missing suppliers.
>
> >
> >>>
> >>> Maybe device_link_try_removing_from_wfs()?
> >>
> >> I like that, other than the fact that it still does not provide a clue
> >> that the function is potentially adding suppliers to a list.
> >
> > It doesn't. How would you add a supplier device to a list if the
> > device itself isn't there? :)
>
> Again, that should be existing suppliers, as you noted.  But the point stands
> that the function is potentially adding links.
>
>
> >
> >>  I think
> >> part of the challenge is that the function does two things: (1) a check,
> >> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> >> one line comment at the call site, something like:
> >>
> >>    /* adds missing suppliers to wfs */
> >>
> >>
> >>>
> >>> I'll wait for us to agree on a better name here before I change this.
> >>>
> >>>>> +{
> >>>>> +     struct device *dev, *tmp;
> >>>>> +
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>>>> +                              links.needs_suppliers)
> >>>>> +             if (!dev->bus->add_links(dev))
> >>>>> +                     list_del_init(&dev->links.needs_suppliers);
> >>>>
> >>>> Empties dev->links.needs_suppliers, but does not remove dev from
> >>>> wait_for_suppliers list.  Where does that happen?
> >>>
> >>> I'll chalk this up to you having a long day or forgetting your coffee
> >>> :) list_del_init() does both of those things because needs_suppliers
> >>> is the node and wait_for_suppliers is the list.
> >>
> >> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> >> list related code again.
> >>
> >>
> >>>
> >>>>
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +}
> >>>>> +
> >>>>>  static void device_link_free(struct device_link *link)
> >>>>>  {
> >>>>>       while (refcount_dec_not_one(&link->rpm_active))
> >>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>>>       struct device_link *link;
> >>>>>       int ret = 0;
> >>>>>
> >>>>> +     /*
> >>>>> +      * If a device is waiting for one or more suppliers (in
> >>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>>>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>>>> +      * suppliers.
> >>>>> +      */
> >>>>
> >>>> Change comment to:
> >>>>
> >>>>         /*
> >>>>          * Device waiting for supplier to become available is not allowed
> >>>>          * to probe
> >>>>          */
> >>>
> >>> Po-tay-to. Po-tah-to? I think my comment is just as good.
> >>
> >> If just as good and shorter, then better.
> >>
> >> Also the original says "it is not ready to probe".  That is not correct.  It
> >> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> >> Nit picky on my part, but tiny things like that mean I have to think harder.
> >> I have to think "why is it not ready to probe?".  Maybe my version should have
> >> instead been something like:
> >>
> >>         * Device waiting for supplier to become available will return
> >>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
> >>
> >>>
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>>>> +             mutex_unlock(&wfs_lock);
> >>>>> +             return -EPROBE_DEFER;
> >>>>> +     }
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +
> >>>>>       device_links_write_lock();
> >>>>
> >>>> Update Documentation/driver-api/device_link.rst to reflect the
> >>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >>>
> >>> Thanks! Will do.
> >>>
> >>>>
> >>>>>
> >>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>>>  {
> >>>>>       struct device_link *link, *ln;
> >>>>>
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_del(&dev->links.needs_suppliers);
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +
> >>>>>       /*
> >>>>>        * Delete all of the remaining links from this device to any other
> >>>>>        * devices (either consumers or suppliers).
> >>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>>>  #endif
> >>>>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>>>  }
> >>>>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>>>
> >>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>>>
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Check if any of the other devices (consumers) have been waiting for
> >>>>> +      * this device (supplier) to be added so that they can create a device
> >>>>> +      * link to it.
> >>>>> +      *
> >>>>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>>>> +      * requires the supplier be registered before it's called.
> >>>>> +      *
> >>>>> +      * But this also needs to happe before bus_probe_device() to make sure
> >>>>> +      * waiting consumers can link to it before the driver is bound to the
> >>>>> +      * device and the driver sync_state callback is called for this device.
> >>>>> +      */
> >>>>
> >>>>         /*
> >>>>          * Add links to dev from any dependent consumer that has dev on it's
> >>>>          * list of needed suppliers
> >>>
> >>> There is no list of needed suppliers.
> >>
> >> "the other devices (consumers) have been waiting for this device (supplier)".
> >> Isn't that a list of needed suppliers?
> >
> > No, that's a list of consumers that needs_suppliers.
> >
> >>>
> >>>> (links.needs_suppliers).  Device_pm_add()
> >>>>          * must have previously registered dev to allow the links to be added.
> >>>>          *
> >>>>          * The consumer links must be created before dev is probed because the
> >>>>          * sync_state callback for dev will use the consumer links.
> >>>>          */
> >>>
> >>> I think what I wrote is just as clear.
> >>
> >> The original comment is vague.  It does not explain why consumer links must be
> >> created before the probe.  I had to go off and read other code to determine
> >> why that is true.
> >>
> >> And again, brevity is better if otherwise just as clear.
> >>
> >>
> >>>
> >>>>
> >>>>> +     device_link_check_waiting_consumers();
> >>>>> +
> >>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>>>> +             device_link_wait_for_supplier(dev);
> >>>>> +
> >>>>>       bus_probe_device(dev);
> >>>>>       if (parent)
> >>>>>               klist_add_tail(&dev->p->knode_parent,
> >>>>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>>>> index c330b75c6c57..5d70babb7462 100644
> >>>>> --- a/include/linux/device.h
> >>>>> +++ b/include/linux/device.h
> >>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>>>   *           that generate uevents to add the environment variables.
> >>>>
> >>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>>>> + *           added to this bus.  The function is expected to create device
> >>>>> + *           links to all the suppliers of the input device that are
> >>>>> + *           available at the time this function is called.  As in, the
> >>>>> + *           function should NOT stop at the first failed device link if
> >>>>> + *           other unlinked supplier devices are present in the system.
> >>>>
> >>>> * @add_links:   Called after a device is added to this bus.
> >>>
> >>> Why are you removing the "perhaps multiple times" part? that's true
> >>> and that's how some of the other ops are documented.
> >>
> >> I didn't remove it.  I rephrased it with a little bit more explanation as
> >> "If some suppliers are not yet available, this function will be
> >> called again when the suppliers become available." (below).
> >>
> >>
> >>>
> >>>>  The function is
> >>>> *               expected to create device links to all the suppliers of the
> >>>> *               device that are available at the time this function is called.
> >>>> *               The function must NOT stop at the first failed device link if
> >>>> *               other unlinked supplier devices are present in the system.
> >>>> *               If some suppliers are not yet available, this function will be
> >>>> *               called again when the suppliers become available.
> >>>>
> >>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >>>
> >>> Sorry, I'm not sure I understand. Can you please explain what you are
> >>> trying to say? of_link_to_suppliers() is just one implementation of
> >>> add_links(). The comment above is try for any bus trying to implement
> >>> add_links().
> >>
> >> This is conflating bus with the source of the firmware description of the
> >> hardware topology.  For drivers that use various APIs to access firmware
> >> description of topology that may be either devicetree or ACPI the access
> >> is done via fwnode_operations, based on struct device.fwnode (if I recall
> >> properly).
> >>
> >> I failed to completely address why add_links() is not needed.  The answer
> >> is that there should be a single function called for all buses.  Then
> >> the proper firmware data source would be accessed via a struct fwnode_operations.
> >>
> >> I think I left this out because I had not yet asked why this feature is
> >> tied only to the platform bus.  Which I asked earlier in this reply.
> >
> > Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> > aware of those. I see where you are going with this. I see a couple of
> > problems with this approach though:
> >
> > 1. How you interpret the properties of a fwnode is specific to the fw
> > type. The clocks DT property isn't going to have the same definition
> > in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> > clocks like property. So have one function to parse all the FW types
> > doesn't make a lot of sense.
>
> The functions in fwnode_operations are specific to the proper firmware.
> So there is a set of functions in a struct fwnode_operations for
> devicetree that only know about devicetree.  And there is a different
> variable of type fwnode_operations that is initialized with ACPI
> specific functions.

Yes, I understand how ops work :) So I have one ops (fwnode ops) to
call that will read a property from DT or ACPI depending on where that
specific device's firmware is from. But that's not my point here.

My point is that clock bindings in DT are under a "clocks" property
that lists references (phandles) to the supplier. But in ACPI, the
property might be called "clk" and could list references to actual
clock IDs. So, you can't have one piece of code that works for all
firmware even if I have one ops that can read properties from any
firmware.

I'll still have to know what type the underlying firmware is before I
try to interpret the properties. So having one function that parses DT
and ACPI and whatever else would be a terrible and unnecessary design.

> > 2. If this common code is implemented as part of driver/base/, then at
> > a minimum, I'll have to check if a fwnode is a DT node before I start
> > interpreting the properties of a device's fwnode. But that means I'll
> > have to include linux/of.h to use is_of_node(). I don't like having
> > driver/base code depend on OF or platform or ACPI headers.
>
> You just use the function in the device's fwnode_operations (I think,
> I would have to go look at the precise way the code works because it
> has been quite a while since I've looked at it).

Because you missed my point in (1) you are missing my point in (2).
I'll wait for your updated reply.

> >
> > 3. The supplier info doesn't always need to come from a firmware. So I
> > don't want to limit it to that?
>
> If you can find another source of topology info, then I would expect
> that another set of fwnode_operations functions would be created
> for the info source.

The other source could just be C files in the kernel. Using fwnodes
for that would be hacky. But let's sort (1) and (2) out first.

> >
> > Also, I don't necessarily see this as conflating firmware (DT, ACPI,
> > etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
> > the device seems like the entity best suited to figure out the
> > suppliers of the device (apart from the driver, obviously). So the bus
> > deciding the suppliers doesn't seem wrong to me.
>
> Patch 3 assigns the devicetree add_links function to the platform bus.
> It seems incorrect to me for of_platform_default_populate_init() to be
> changing a field in platform_bus_type.
>
>
>    of_platform_default_populate_init()
>            ...
>            platform_bus_type.add_links = of_link_to_suppliers;
>

I didn't want to have platform bus include OF header files.

> >
> > In this specific case, I'm trying to address DT for now and leaving
> > ACPI to whoever else wants to add device links based on ACPI data.
> > Most OF/DT based devices end up in platform bus. So I'm just handling
> > this in platform bus. If some other person wants this to work for ACPI
> > bus or PCI bus, they are welcome to implement add_links() for those
> > busses? I'm nowhere close to an expert on those.
>
> Devicetree is not limited to the platform bus.

I know. That's why I said "most". PCI seems to have some DT support too.

Thanks,
Saravana
Frank Rowand Aug. 21, 2019, 1:06 a.m. UTC | #7
On 8/20/19 3:10 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>>
>>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>>>>>  device addition
>>>>>>> From: Saravana Kannan <saravanak@google.com>
>>>>>>>
>>>>>>> When devices are added, the bus might want to create device links to track
>>>>>>> functional dependencies between supplier and consumer devices. This
>>>>>>> tracking of supplier-consumer relationship allows optimizing device probe
>>>>>>> order and tracking whether all consumers of a supplier are active. The
>>>>>>> add_links bus callback is added to support this.
>>>>>>
>>>>>> Change above to:
>>>>>>
>>>>>> When devices are added, the bus may create device links to track which
>>>>>> suppliers a consumer device depends upon.  This
>>>>>> tracking of supplier-consumer relationship may be used to defer probing
>>>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
>>>>>> are probed.  It may also be used by a supplier driver to determine if
>>>>>> all of its consumers have been successfully probed.
>>>>>> The add_links bus callback is added to create the supplier device links
>>>>>>
>>>>>>>
>>>>>>> However, when consumer devices are added, they might not have a supplier
>>>>>>> device to link to despite needing mandatory resources/functionality from
>>>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>>>>>> such consumers and retry linking them when new devices get added.
>>>>>>
>>>>>> Change above to:
>>>>>>
>>>>>> If a supplier device has not yet been created when the consumer device attempts
>>>>>> to link it, the consumer device is added to the wait_for_suppliers list.
>>>>>> When supplier devices are created, the supplier device link will be added to
>>>>>> the relevant consumer devices on the wait_for_suppliers list.
>>>>>>
>>>>>
>>>>> I'll take these commit text suggestions if we decide to revert the
>>>>> entire series at the end of this review.
>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>>>> ---
>>>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/linux/device.h | 14 +++++++
>>>>>>>  2 files changed, 97 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>>> index da84a73f2ba6..1b4eb221968f 100644
>>>>>>> --- a/drivers/base/core.c
>>>>>>> +++ b/drivers/base/core.c
>>>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>>>>>  #endif
>>>>>>>
>>>>>>>  /* Device links support. */
>>>>>>> +static LIST_HEAD(wait_for_suppliers);
>>>>>>> +static DEFINE_MUTEX(wfs_lock);
>>>>>>>
>>>>>>>  #ifdef CONFIG_SRCU
>>>>>>>  static DEFINE_MUTEX(device_links_lock);
>>>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>>>>>
>>>>>>> +/**
>>>>>>
>>>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>>>>>
>>>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
>>>>>
>>>>
>>>> As a meta-comment, I found this series very hard to understand in the context
>>>> of reading the new code for the first time.  When I read the code again in
>>>> six months or a year or two years it will not be in near term memory and it
>>>> will be as if I am reading it for the first time.  A lot of my suggestions
>>>> for changes of names are in that context -- the current names may be fine
>>>> when one has recently read the code, but not so much when trying to read
>>>> the whole thing again with a blank mind.
>>>
>>> Thanks for the context.
>>>
>>>> The code also inherits a good deal of complexity because it does not stand
>>>> alone in a nice discrete chunk, but instead delicately weaves into a more
>>>> complex body of code.
>>>
>>> I'll take this as a compliment :)
>>
>> Please do!
>>
>>
>>>
>>>> When I was trying to understand the code, I wrote a lot of additional
>>>> comments within my reply email to provide myself context, information
>>>> about various things, and questions that I needed to answer (or if I
>>>> could not answer to then ask you).  Then I ended up being able to remove
>>>> many of those notes before sending the reply.
>>>>
>>>>
>>>>> I intentionally chose "Mark device..." because that's a better
>>>>> description of the semantics of the function instead of trying to
>>>>> describe the implementation. Whether I'm using a linked list or some
>>>>> other data structure should not be the one line documentation of a
>>>>> function. Unless the function is explicitly about operating on that
>>>>> specific data structure.
>>>>
>>>> I agree with the intent of trying to describe the semantics of a function,
>>>> especially at the API level where other systems (or drivers) would be using
>>>> the function.  But for this case the function is at the implementation level
>>>> and describing explicitly what it is doing makes this much more readable for
>>>> me.
>>>
>>> Are you distinguishing between API level vs implementation level based
>>> on the function being "static"/not exported? I believe the earlier
>>
>> No, being static helps say a function is not API, but an function that is
>> not static may be intended to be used in a limited and constrained manner.
>> I distinguished based on the usage of the function.
>>
>>
>>> version of this series had this function as an exported API. So maybe
>>> that's why I had it as "Mark device".
>>>
>>>> I also find "Mark device" to be vague and not descriptive of what the
>>>> intent is.
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> + * @consumer: Consumer device
>>>>>>> + *
>>>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>>>>>> + * consumer device will never be probed until it's unmarked as waiting for
>>>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>>>>>> + * once the supplier device is present.
>>>>>>> + *
>>>>>>> + * This function is NOT meant to be called from the probe function of the
>>>>>>> + * consumer but rather from code that creates/adds the consumer device.
>>>>>>> + */
>>>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>>>>>> +{
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>
>>>>>>
>>>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>>>>>> + *
>>>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
>>>>>>> + *
>>>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>>>>>> + * all the supplier links for the consumer device. It should return an error if
>>>>>>> + * it isn't able to do so.
>>>>>>> + *
>>>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>>>> + * it's aware of potential suppliers becoming available.
>>>>>>
>>>>>> Change above comment to:
>>>>>>
>>>>>>     * device_link_add_supplier_links - add links from consumer devices to
>>>>>>     *                                  supplier devices, leaving any consumer
>>>>>>     *                                  with inactive suppliers on the
>>>>>>     *                                  wait_for_suppliers list
>>>>>
>>>>> I didn't know that the first one line comment could span multiple
>>>>> lines. Good to know.
>>>>>
>>>>>
>>>>>>     * Scan all consumer devices in the devicetree.
>>>>>
>>>>> This function doesn't have anything to do with devicetree. I've
>>>>> intentionally kept all OF related parts out of the driver/core because
>>>>> I hope that other busses can start using this feature too. So I can't
>>>>> take this bit.
>>>>
>>>> My comment is left over from when I was taking notes, trying to understand the
>>>> code.
>>>>
>>>> At the moment, only devicetree is used as a source of the dependency information.
>>>> The comment would better be re-phrased as:
>>>>
>>>>         * Scan all consumer devices in the firmware description of the hardware topology
>>>>
>>>
>>> Ok
>>>
>>>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
>>>
>>> Because devicetree and platform bus the only ones I'm familiar with.
>>> If other busses want to add this, I'd be happy to help with code
>>> and/or direction/review. But I won't pretend to know anything about
>>> ACPI.
>>
>> Sorry, you don't get to ignore other buses because you are not familiar
>> with them.
> 
> It's important that I don't design out other buses -- which I don't.
> But why would you want someone who has no idea of ACPI to write code
> for it? It's a futile effort that's going to be rejected by people who
> know ACPI anyway.

ACPI is not a bus.

Devicetree is not a bus.

A devicetree can contain multiple buses in the topology that is described.


> 
>> I am not aware of any reason to exclude devices that on other buses and your
>> answer below does not provide a valid technical reason why the new feature is
>> correct when it excludes all other buses.
>>>
>>>> I do not know of any reason that a consumer / supplier relationship can not be
>>>> between devices on different bus types.  Do you know of such a reason?
>>>
>>> Yes, it's hypothetically possible. But I haven't seen such a
>>> relationship being defined in DT. Nor somewhere else where this might
>>> be captured. So, how common/realistic is it?
>>
>> It is entirely legal.  I have no idea how common it is but that is not a valid
>> reason to exclude other buses from the feature.
> 
> I'm not going to write code for a hypothetical hardware scenario. Find
> one supported in upstream, show me that it'll benefit from this series
> and tell me how to interpret the dependency graph and then we'll talk
> about writing code for that.

You don't get to implement a general feature in a way that only supports
a subset of potential devicetree users.  Note the word "general".  This is
not a small isolated feature.

Now, am I being inconsistent if I say that it is ok for the feature to
only support devicetree systems, or only support ACPI systems?  I'll
have to ponder that.

But I don't think the question of only platform buses or all buses needs
to be resolved because I don't think that the add_links function is a bus
specific function.  The add_links function is specific to devicetree or
ACPI.

We seem to be talking past each other on this point right now.  I don't now
how to get our minds to the same place, but let's keep trying.


> 
>>>>>
>>>>>>  For any supplier device that
>>>>>>     * is not already linked to the consumer device, add the supplier to the
>>>>>>     * consumer device's device links.
>>>>>>     *
>>>>>>     * If all of a consumer device's suppliers are available then the consumer
>>>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>>>>>     * already on the list).
>>>>>
>>>>> Honestly, I don't think this is any better than what I already have.
>>>>
>>>> Note that my version of these comments was written while I was reading the code,
>>>> and did not have any big picture understanding yet.  This will likely also be
>>>> the mind set of most everyone who reads this code in the future, once it is
>>>> woven into the kernel.
>>>>
>>>> If you don't like the change, I can revisit it in a later version of the
>>>> patch set.
>>>
>>> I'll take in all the ones I feel are reasonable or don't feel strongly
>>> about. We can revisit the rest later.
>>>
>>>>>
>>>>>>     * The add_links bus callback must return 0 if it has found and added all
>>>>>>     * the supplier links for the consumer device. It must return an error if
>>>>>>     * it is not able to do so.
>>>>>>     *
>>>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>>>     * it is aware of potential suppliers becoming available.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> + */
>>>>>>> +static void device_link_check_waiting_consumers(void)
>>>>>>
>>>>>> Function name is misleading and hides side effects.
>>>>>>
>>>>>> I have not come up with a name that does not hide side effects, but a better
>>>>>> name would be:
>>>>>>
>>>>>>    device_link_add_supplier_links()
>>>>>
>>>>> I kinda agree that it could afford a better name. The current name is
>>>>> too similar to device_links_check_suppliers() and I never liked that.
>>>>
>>>> Naming new fields or variables related to device links looks pretty
>>>> challenging to me, because of the desire to be part of device links
>>>> and not a wart pasted on the side.  So I share the pain in trying
>>>> to find good names.
>>>>
>>>>>
>>>>> Maybe device_link_add_missing_suppliers()?
>>>>
>>>> My first reaction was "yes, that sounds good".  But then I stopped and
>>>> tried to read the name out of context.  The name is not adding the
>>>> missing suppliers, it is saving the information that a supplier is
>>>> not yet available (eg, is "missing").  I struggled in coming up with
>>
>> Reading what you say below, and looking at the code again, what I say
>> in that sentence is backwards.  It is not adding the missing supplier
>> device links, it is instead adding existing supplier device inks.
>>
>>
>>>> the name that I suggested.  We can keep thinking.
>>>
>>> No, this function _IS_ about adding links to suppliers. These
>>
>> You are mis-reading what I wrote.  I said the function "is not adding
>> the missing suppliers".  You are converting that to "is not adding
>> links to the missing suppliers".
>>
>> My suggested name was hinting "add_supplier_links", which is what you
>> say it does below.  The name you suggest is hinting "add_missing_suppliers".
>> Do you see the difference?
> 
> Yeah, which is why I said earlier that I didn't want to repeat "links"
> twice in a function name. As in
> device_links_add_missing_supplier_links() has too many "links". In the
> context of device_links_, "add missing suppliers" means "add missing
> supplier links". Anyway, I think we can come back to figuring out a
> good name once we agree on the more important discussions further
> below.

Yes, later is fine.  This is a detail.

> 
>>> consumers were "saved" as "not yet having the supplier" earlier by
>>> device_link_wait_for_supplier(). This function doesn't do that. This
>>> function is just trying to see if those missing suppliers are present
>>> now and if so adding a link to them from the "saved" consumers. I
>>> think device_link_add_missing_suppliers() is actually a pretty good
>>> name. Let me know what you think now.
>>>
>>>>
>>>>
>>>>>
>>>>> I don't think we need "links" repeated twice in the function name.
>>>>
>>>> Yeah, I didn't like that either.
>>>>
>>>>
>>>>> With this suggestion, what side effect is hidden in your opinion? That
>>>>> the fully linked consumer is removed from the "waiting for suppliers"
>>>>> list?
>>>>
>>>> The side effect is that the function does not merely do a check.  It also
>>>> adds missing suppliers to a list.
>>>
>>> No, it doesn't do that. I can't keep a list of things that aren't
>>> allocated yet :). In the whole patch series, we only keep a list of things
>>> (consumers) that are waiting on other things (missing suppliers).
>>
>> OK, as I noted above, I stated that backwards.  It is adding links for
>> existing suppliers, not for the missing suppliers.
>>
>>>
>>>>>
>>>>> Maybe device_link_try_removing_from_wfs()?
>>>>
>>>> I like that, other than the fact that it still does not provide a clue
>>>> that the function is potentially adding suppliers to a list.
>>>
>>> It doesn't. How would you add a supplier device to a list if the
>>> device itself isn't there? :)
>>
>> Again, that should be existing suppliers, as you noted.  But the point stands
>> that the function is potentially adding links.
>>
>>
>>>
>>>>  I think
>>>> part of the challenge is that the function does two things: (1) a check,
>>>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
>>>> one line comment at the call site, something like:
>>>>
>>>>    /* adds missing suppliers to wfs */
>>>>
>>>>
>>>>>
>>>>> I'll wait for us to agree on a better name here before I change this.
>>>>>
>>>>>>> +{
>>>>>>> +     struct device *dev, *tmp;
>>>>>>> +
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>>>>>> +                              links.needs_suppliers)
>>>>>>> +             if (!dev->bus->add_links(dev))
>>>>>>> +                     list_del_init(&dev->links.needs_suppliers);
>>>>>>
>>>>>> Empties dev->links.needs_suppliers, but does not remove dev from
>>>>>> wait_for_suppliers list.  Where does that happen?
>>>>>
>>>>> I'll chalk this up to you having a long day or forgetting your coffee
>>>>> :) list_del_init() does both of those things because needs_suppliers
>>>>> is the node and wait_for_suppliers is the list.
>>>>
>>>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
>>>> list related code again.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void device_link_free(struct device_link *link)
>>>>>>>  {
>>>>>>>       while (refcount_dec_not_one(&link->rpm_active))
>>>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>>>>>       struct device_link *link;
>>>>>>>       int ret = 0;
>>>>>>>
>>>>>>> +     /*
>>>>>>> +      * If a device is waiting for one or more suppliers (in
>>>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>>>>>> +      * suppliers.
>>>>>>> +      */
>>>>>>
>>>>>> Change comment to:
>>>>>>
>>>>>>         /*
>>>>>>          * Device waiting for supplier to become available is not allowed
>>>>>>          * to probe
>>>>>>          */
>>>>>
>>>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
>>>>
>>>> If just as good and shorter, then better.
>>>>
>>>> Also the original says "it is not ready to probe".  That is not correct.  It
>>>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
>>>> Nit picky on my part, but tiny things like that mean I have to think harder.
>>>> I have to think "why is it not ready to probe?".  Maybe my version should have
>>>> instead been something like:
>>>>
>>>>         * Device waiting for supplier to become available will return
>>>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>>>>
>>>>>
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>>>>>> +             mutex_unlock(&wfs_lock);
>>>>>>> +             return -EPROBE_DEFER;
>>>>>>> +     }
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +
>>>>>>>       device_links_write_lock();
>>>>>>
>>>>>> Update Documentation/driver-api/device_link.rst to reflect the
>>>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
>>>>>
>>>>> Thanks! Will do.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>>>>>  {
>>>>>>>       struct device_link *link, *ln;
>>>>>>>
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_del(&dev->links.needs_suppliers);
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +
>>>>>>>       /*
>>>>>>>        * Delete all of the remaining links from this device to any other
>>>>>>>        * devices (either consumers or suppliers).
>>>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>>>>>  #endif
>>>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>>>>>
>>>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>>>>>
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Check if any of the other devices (consumers) have been waiting for
>>>>>>> +      * this device (supplier) to be added so that they can create a device
>>>>>>> +      * link to it.
>>>>>>> +      *
>>>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>>>>>> +      * requires the supplier be registered before it's called.
>>>>>>> +      *
>>>>>>> +      * But this also needs to happe before bus_probe_device() to make sure
>>>>>>> +      * waiting consumers can link to it before the driver is bound to the
>>>>>>> +      * device and the driver sync_state callback is called for this device.
>>>>>>> +      */
>>>>>>
>>>>>>         /*
>>>>>>          * Add links to dev from any dependent consumer that has dev on it's
>>>>>>          * list of needed suppliers
>>>>>
>>>>> There is no list of needed suppliers.
>>>>
>>>> "the other devices (consumers) have been waiting for this device (supplier)".
>>>> Isn't that a list of needed suppliers?
>>>
>>> No, that's a list of consumers that needs_suppliers.
>>>
>>>>>
>>>>>> (links.needs_suppliers).  Device_pm_add()
>>>>>>          * must have previously registered dev to allow the links to be added.
>>>>>>          *
>>>>>>          * The consumer links must be created before dev is probed because the
>>>>>>          * sync_state callback for dev will use the consumer links.
>>>>>>          */
>>>>>
>>>>> I think what I wrote is just as clear.
>>>>
>>>> The original comment is vague.  It does not explain why consumer links must be
>>>> created before the probe.  I had to go off and read other code to determine
>>>> why that is true.
>>>>
>>>> And again, brevity is better if otherwise just as clear.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +     device_link_check_waiting_consumers();
>>>>>>> +
>>>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>>>>>> +             device_link_wait_for_supplier(dev);
>>>>>>> +
>>>>>>>       bus_probe_device(dev);
>>>>>>>       if (parent)
>>>>>>>               klist_add_tail(&dev->p->knode_parent,
>>>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>>>> index c330b75c6c57..5d70babb7462 100644
>>>>>>> --- a/include/linux/device.h
>>>>>>> +++ b/include/linux/device.h
>>>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>>>>>   *           that generate uevents to add the environment variables.
>>>>>>
>>>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>>>>>> + *           added to this bus.  The function is expected to create device
>>>>>>> + *           links to all the suppliers of the input device that are
>>>>>>> + *           available at the time this function is called.  As in, the
>>>>>>> + *           function should NOT stop at the first failed device link if
>>>>>>> + *           other unlinked supplier devices are present in the system.
>>>>>>
>>>>>> * @add_links:   Called after a device is added to this bus.
>>>>>
>>>>> Why are you removing the "perhaps multiple times" part? that's true
>>>>> and that's how some of the other ops are documented.
>>>>
>>>> I didn't remove it.  I rephrased it with a little bit more explanation as
>>>> "If some suppliers are not yet available, this function will be
>>>> called again when the suppliers become available." (below).
>>>>
>>>>
>>>>>
>>>>>>  The function is
>>>>>> *               expected to create device links to all the suppliers of the
>>>>>> *               device that are available at the time this function is called.
>>>>>> *               The function must NOT stop at the first failed device link if
>>>>>> *               other unlinked supplier devices are present in the system.
>>>>>> *               If some suppliers are not yet available, this function will be
>>>>>> *               called again when the suppliers become available.
>>>>>>
>>>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
>>>>>
>>>>> Sorry, I'm not sure I understand. Can you please explain what you are
>>>>> trying to say? of_link_to_suppliers() is just one implementation of
>>>>> add_links(). The comment above is try for any bus trying to implement
>>>>> add_links().
>>>>
>>>> This is conflating bus with the source of the firmware description of the
>>>> hardware topology.  For drivers that use various APIs to access firmware
>>>> description of topology that may be either devicetree or ACPI the access
>>>> is done via fwnode_operations, based on struct device.fwnode (if I recall
>>>> properly).
>>>>
>>>> I failed to completely address why add_links() is not needed.  The answer
>>>> is that there should be a single function called for all buses.  Then
>>>> the proper firmware data source would be accessed via a struct fwnode_operations.
>>>>
>>>> I think I left this out because I had not yet asked why this feature is
>>>> tied only to the platform bus.  Which I asked earlier in this reply.
>>>
>>> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
>>> aware of those. I see where you are going with this. I see a couple of
>>> problems with this approach though:
>>>
>>> 1. How you interpret the properties of a fwnode is specific to the fw
>>> type. The clocks DT property isn't going to have the same definition
>>> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
>>> clocks like property. So have one function to parse all the FW types
>>> doesn't make a lot of sense.
>>
>> The functions in fwnode_operations are specific to the proper firmware.
>> So there is a set of functions in a struct fwnode_operations for
>> devicetree that only know about devicetree.  And there is a different
>> variable of type fwnode_operations that is initialized with ACPI
>> specific functions.
> 
> Yes, I understand how ops work :) So I have one ops (fwnode ops) to
> call that will read a property from DT or ACPI depending on where that
> specific device's firmware is from. But that's not my point here.
> 
> My point is that clock bindings in DT are under a "clocks" property
> that lists references (phandles) to the supplier. But in ACPI, the
> property might be called "clk" and could list references to actual
> clock IDs. So, you can't have one piece of code that works for all
> firmware even if I have one ops that can read properties from any
> firmware.
> 
> I'll still have to know what type the underlying firmware is before I
> try to interpret the properties. So having one function that parses DT
> and ACPI and whatever else would be a terrible and unnecessary design.

You have already implemented the devicetree function, which is
of_link_to_suppliers().  The devicetree fwnode_operations would have
a pointer to of_link_to_suppliers().

If ACPI support is added, there would be an analogous ACPI aware function
that would essentially do the same thing that of_link_to_suppliers()
does.  This would be in the ACPI version of fwnode_operations.

There would not be a single function that is both devicetree aware and
ACPI aware.


> 
>>> 2. If this common code is implemented as part of driver/base/, then at
>>> a minimum, I'll have to check if a fwnode is a DT node before I start
>>> interpreting the properties of a device's fwnode. But that means I'll
>>> have to include linux/of.h to use is_of_node(). I don't like having
>>> driver/base code depend on OF or platform or ACPI headers.
>>
>> You just use the function in the device's fwnode_operations (I think,
>> I would have to go look at the precise way the code works because it
>> has been quite a while since I've looked at it).
> 
> Because you missed my point in (1) you are missing my point in (2).
> I'll wait for your updated reply.

We are still talking at cross purposes.  If my reply to (1) does not
change that, I'll have to go dig into how the fwnode framework figures
out which set of fwnode_operations to use for each device.


> 
>>>
>>> 3. The supplier info doesn't always need to come from a firmware. So I
>>> don't want to limit it to that?
>>
>> If you can find another source of topology info, then I would expect
>> that another set of fwnode_operations functions would be created
>> for the info source.
> 
> The other source could just be C files in the kernel. Using fwnodes
> for that would be hacky. But let's sort (1) and (2) out first.

I suspected that might be the other source.  But we have been trying
to deprecate this type of data.

> 
>>>
>>> Also, I don't necessarily see this as conflating firmware (DT, ACPI,
>>> etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
>>> the device seems like the entity best suited to figure out the
>>> suppliers of the device (apart from the driver, obviously). So the bus
>>> deciding the suppliers doesn't seem wrong to me.
>>
>> Patch 3 assigns the devicetree add_links function to the platform bus.
>> It seems incorrect to me for of_platform_default_populate_init() to be
>> changing a field in platform_bus_type.
>>
>>
>>    of_platform_default_populate_init()
>>            ...
>>            platform_bus_type.add_links = of_link_to_suppliers;
>>
> 
> I didn't want to have platform bus include OF header files.
Yet another clue that the function pointer should not belong to the bus.


> 
>>>
>>> In this specific case, I'm trying to address DT for now and leaving
>>> ACPI to whoever else wants to add device links based on ACPI data.
>>> Most OF/DT based devices end up in platform bus. So I'm just handling
>>> this in platform bus. If some other person wants this to work for ACPI
>>> bus or PCI bus, they are welcome to implement add_links() for those
>>> busses? I'm nowhere close to an expert on those.
>>
>> Devicetree is not limited to the platform bus.
> 
> I know. That's why I said "most". PCI seems to have some DT support too.
> 
> Thanks,
> Saravana
> 

-Frank
Greg KH Aug. 21, 2019, 1:56 a.m. UTC | #8
On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
> On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>>>
> >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>>>  device addition
> >>>>>>> From: Saravana Kannan <saravanak@google.com>

This is a "fun" thread :(

You two should get together in person this week and talk.  I think you
both will be at ELC, can we do this tomorrow or Thursday so we can hash
it out in a way that doesn't end up talking past each other, like I feel
is happening here right now?

thanks,

greg k-h
Saravana Kannan Aug. 21, 2019, 2:04 a.m. UTC | #9
On Tue, Aug 20, 2019 at 6:56 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
> > On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>
> > >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> > >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>>>
> > >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> > >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> > >>>>>>>  device addition
> > >>>>>>> From: Saravana Kannan <saravanak@google.com>
>
> This is a "fun" thread :(
>
> You two should get together in person this week and talk.  I think you
> both will be at ELC, can we do this tomorrow or Thursday so we can hash
> it out in a way that doesn't end up talking past each other, like I feel
> is happening here right now?
>

Resending again due to HTML (Sorry, was a mobile reply)

That would be great. Wednesday would be better for me. I might not
make it to ELC on Thursday. Let us know Frank.

Thanks,
Saravana
Saravana Kannan Aug. 21, 2019, 2:22 a.m. UTC | #10
On Tue, Aug 20, 2019 at 6:07 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>>>
> >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>>>  device addition
> >>>>>>> From: Saravana Kannan <saravanak@google.com>
> >>>>>>>
> >>>>>>> When devices are added, the bus might want to create device links to track
> >>>>>>> functional dependencies between supplier and consumer devices. This
> >>>>>>> tracking of supplier-consumer relationship allows optimizing device probe
> >>>>>>> order and tracking whether all consumers of a supplier are active. The
> >>>>>>> add_links bus callback is added to support this.
> >>>>>>
> >>>>>> Change above to:
> >>>>>>
> >>>>>> When devices are added, the bus may create device links to track which
> >>>>>> suppliers a consumer device depends upon.  This
> >>>>>> tracking of supplier-consumer relationship may be used to defer probing
> >>>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
> >>>>>> are probed.  It may also be used by a supplier driver to determine if
> >>>>>> all of its consumers have been successfully probed.
> >>>>>> The add_links bus callback is added to create the supplier device links
> >>>>>>
> >>>>>>>
> >>>>>>> However, when consumer devices are added, they might not have a supplier
> >>>>>>> device to link to despite needing mandatory resources/functionality from
> >>>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>>>>>> such consumers and retry linking them when new devices get added.
> >>>>>>
> >>>>>> Change above to:
> >>>>>>
> >>>>>> If a supplier device has not yet been created when the consumer device attempts
> >>>>>> to link it, the consumer device is added to the wait_for_suppliers list.
> >>>>>> When supplier devices are created, the supplier device link will be added to
> >>>>>> the relevant consumer devices on the wait_for_suppliers list.
> >>>>>>
> >>>>>
> >>>>> I'll take these commit text suggestions if we decide to revert the
> >>>>> entire series at the end of this review.
> >>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>>>> ---
> >>>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>  include/linux/device.h | 14 +++++++
> >>>>>>>  2 files changed, 97 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>>>> index da84a73f2ba6..1b4eb221968f 100644
> >>>>>>> --- a/drivers/base/core.c
> >>>>>>> +++ b/drivers/base/core.c
> >>>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>>>>>  #endif
> >>>>>>>
> >>>>>>>  /* Device links support. */
> >>>>>>> +static LIST_HEAD(wait_for_suppliers);
> >>>>>>> +static DEFINE_MUTEX(wfs_lock);
> >>>>>>>
> >>>>>>>  #ifdef CONFIG_SRCU
> >>>>>>>  static DEFINE_MUTEX(device_links_lock);
> >>>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>>>>>
> >>>>>>> +/**
> >>>>>>
> >>>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>>>>>
> >>>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >>>>>
> >>>>
> >>>> As a meta-comment, I found this series very hard to understand in the context
> >>>> of reading the new code for the first time.  When I read the code again in
> >>>> six months or a year or two years it will not be in near term memory and it
> >>>> will be as if I am reading it for the first time.  A lot of my suggestions
> >>>> for changes of names are in that context -- the current names may be fine
> >>>> when one has recently read the code, but not so much when trying to read
> >>>> the whole thing again with a blank mind.
> >>>
> >>> Thanks for the context.
> >>>
> >>>> The code also inherits a good deal of complexity because it does not stand
> >>>> alone in a nice discrete chunk, but instead delicately weaves into a more
> >>>> complex body of code.
> >>>
> >>> I'll take this as a compliment :)
> >>
> >> Please do!
> >>
> >>
> >>>
> >>>> When I was trying to understand the code, I wrote a lot of additional
> >>>> comments within my reply email to provide myself context, information
> >>>> about various things, and questions that I needed to answer (or if I
> >>>> could not answer to then ask you).  Then I ended up being able to remove
> >>>> many of those notes before sending the reply.
> >>>>
> >>>>
> >>>>> I intentionally chose "Mark device..." because that's a better
> >>>>> description of the semantics of the function instead of trying to
> >>>>> describe the implementation. Whether I'm using a linked list or some
> >>>>> other data structure should not be the one line documentation of a
> >>>>> function. Unless the function is explicitly about operating on that
> >>>>> specific data structure.
> >>>>
> >>>> I agree with the intent of trying to describe the semantics of a function,
> >>>> especially at the API level where other systems (or drivers) would be using
> >>>> the function.  But for this case the function is at the implementation level
> >>>> and describing explicitly what it is doing makes this much more readable for
> >>>> me.
> >>>
> >>> Are you distinguishing between API level vs implementation level based
> >>> on the function being "static"/not exported? I believe the earlier
> >>
> >> No, being static helps say a function is not API, but an function that is
> >> not static may be intended to be used in a limited and constrained manner.
> >> I distinguished based on the usage of the function.
> >>
> >>
> >>> version of this series had this function as an exported API. So maybe
> >>> that's why I had it as "Mark device".
> >>>
> >>>> I also find "Mark device" to be vague and not descriptive of what the
> >>>> intent is.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>> + * @consumer: Consumer device
> >>>>>>> + *
> >>>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>>>>>> + * consumer device will never be probed until it's unmarked as waiting for
> >>>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>>>>>> + * once the supplier device is present.
> >>>>>>> + *
> >>>>>>> + * This function is NOT meant to be called from the probe function of the
> >>>>>>> + * consumer but rather from code that creates/adds the consumer device.
> >>>>>>> + */
> >>>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>>>>>> +{
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>
> >>>>>>
> >>>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>>>>>> + *
> >>>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> >>>>>>> + *
> >>>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>>>>>> + * all the supplier links for the consumer device. It should return an error if
> >>>>>>> + * it isn't able to do so.
> >>>>>>> + *
> >>>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>>>> + * it's aware of potential suppliers becoming available.
> >>>>>>
> >>>>>> Change above comment to:
> >>>>>>
> >>>>>>     * device_link_add_supplier_links - add links from consumer devices to
> >>>>>>     *                                  supplier devices, leaving any consumer
> >>>>>>     *                                  with inactive suppliers on the
> >>>>>>     *                                  wait_for_suppliers list
> >>>>>
> >>>>> I didn't know that the first one line comment could span multiple
> >>>>> lines. Good to know.
> >>>>>
> >>>>>
> >>>>>>     * Scan all consumer devices in the devicetree.
> >>>>>
> >>>>> This function doesn't have anything to do with devicetree. I've
> >>>>> intentionally kept all OF related parts out of the driver/core because
> >>>>> I hope that other busses can start using this feature too. So I can't
> >>>>> take this bit.
> >>>>
> >>>> My comment is left over from when I was taking notes, trying to understand the
> >>>> code.
> >>>>
> >>>> At the moment, only devicetree is used as a source of the dependency information.
> >>>> The comment would better be re-phrased as:
> >>>>
> >>>>         * Scan all consumer devices in the firmware description of the hardware topology
> >>>>
> >>>
> >>> Ok
> >>>
> >>>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> >>>
> >>> Because devicetree and platform bus the only ones I'm familiar with.
> >>> If other busses want to add this, I'd be happy to help with code
> >>> and/or direction/review. But I won't pretend to know anything about
> >>> ACPI.
> >>
> >> Sorry, you don't get to ignore other buses because you are not familiar
> >> with them.
> >
> > It's important that I don't design out other buses -- which I don't.
> > But why would you want someone who has no idea of ACPI to write code
> > for it? It's a futile effort that's going to be rejected by people who
> > know ACPI anyway.
>
> ACPI is not a bus.
>
> Devicetree is not a bus.
>
> A devicetree can contain multiple buses in the topology that is described.

I understand these aren't busses. But most devices from ACPI and DT
get put on acpi bus or platform bus? So I was trying to just handle
platform bus since that's the majority of the use cases I run into as
part of Android. Anyway, see comments further below. I think we are
lining up now.

> >
> >> I am not aware of any reason to exclude devices that on other buses and your
> >> answer below does not provide a valid technical reason why the new feature is
> >> correct when it excludes all other buses.
> >>>
> >>>> I do not know of any reason that a consumer / supplier relationship can not be
> >>>> between devices on different bus types.  Do you know of such a reason?
> >>>
> >>> Yes, it's hypothetically possible. But I haven't seen such a
> >>> relationship being defined in DT. Nor somewhere else where this might
> >>> be captured. So, how common/realistic is it?
> >>
> >> It is entirely legal.  I have no idea how common it is but that is not a valid
> >> reason to exclude other buses from the feature.
> >
> > I'm not going to write code for a hypothetical hardware scenario. Find
> > one supported in upstream, show me that it'll benefit from this series
> > and tell me how to interpret the dependency graph and then we'll talk
> > about writing code for that.
>
> You don't get to implement a general feature in a way that only supports
> a subset of potential devicetree users.  Note the word "general".  This is
> not a small isolated feature.
>
> Now, am I being inconsistent if I say that it is ok for the feature to
> only support devicetree systems, or only support ACPI systems?  I'll
> have to ponder that.
>
> But I don't think the question of only platform buses or all buses needs
> to be resolved because I don't think that the add_links function is a bus
> specific function.  The add_links function is specific to devicetree or
> ACPI.
>
> We seem to be talking past each other on this point right now.  I don't now
> how to get our minds to the same place, but let's keep trying.

See my reply further below. That should address some of the concerns
as "buses" won't be a concern anymore. Having said that, the main
point I was making here is that I can't design for a hypothetical case
that has no example with a proper definition of what's the expected
behavior.

> >
> >>>>>
> >>>>>>  For any supplier device that
> >>>>>>     * is not already linked to the consumer device, add the supplier to the
> >>>>>>     * consumer device's device links.
> >>>>>>     *
> >>>>>>     * If all of a consumer device's suppliers are available then the consumer
> >>>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>>>>>     * already on the list).
> >>>>>
> >>>>> Honestly, I don't think this is any better than what I already have.
> >>>>
> >>>> Note that my version of these comments was written while I was reading the code,
> >>>> and did not have any big picture understanding yet.  This will likely also be
> >>>> the mind set of most everyone who reads this code in the future, once it is
> >>>> woven into the kernel.
> >>>>
> >>>> If you don't like the change, I can revisit it in a later version of the
> >>>> patch set.
> >>>
> >>> I'll take in all the ones I feel are reasonable or don't feel strongly
> >>> about. We can revisit the rest later.
> >>>
> >>>>>
> >>>>>>     * The add_links bus callback must return 0 if it has found and added all
> >>>>>>     * the supplier links for the consumer device. It must return an error if
> >>>>>>     * it is not able to do so.
> >>>>>>     *
> >>>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>>>     * it is aware of potential suppliers becoming available.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> + */
> >>>>>>> +static void device_link_check_waiting_consumers(void)
> >>>>>>
> >>>>>> Function name is misleading and hides side effects.
> >>>>>>
> >>>>>> I have not come up with a name that does not hide side effects, but a better
> >>>>>> name would be:
> >>>>>>
> >>>>>>    device_link_add_supplier_links()
> >>>>>
> >>>>> I kinda agree that it could afford a better name. The current name is
> >>>>> too similar to device_links_check_suppliers() and I never liked that.
> >>>>
> >>>> Naming new fields or variables related to device links looks pretty
> >>>> challenging to me, because of the desire to be part of device links
> >>>> and not a wart pasted on the side.  So I share the pain in trying
> >>>> to find good names.
> >>>>
> >>>>>
> >>>>> Maybe device_link_add_missing_suppliers()?
> >>>>
> >>>> My first reaction was "yes, that sounds good".  But then I stopped and
> >>>> tried to read the name out of context.  The name is not adding the
> >>>> missing suppliers, it is saving the information that a supplier is
> >>>> not yet available (eg, is "missing").  I struggled in coming up with
> >>
> >> Reading what you say below, and looking at the code again, what I say
> >> in that sentence is backwards.  It is not adding the missing supplier
> >> device links, it is instead adding existing supplier device inks.
> >>
> >>
> >>>> the name that I suggested.  We can keep thinking.
> >>>
> >>> No, this function _IS_ about adding links to suppliers. These
> >>
> >> You are mis-reading what I wrote.  I said the function "is not adding
> >> the missing suppliers".  You are converting that to "is not adding
> >> links to the missing suppliers".
> >>
> >> My suggested name was hinting "add_supplier_links", which is what you
> >> say it does below.  The name you suggest is hinting "add_missing_suppliers".
> >> Do you see the difference?
> >
> > Yeah, which is why I said earlier that I didn't want to repeat "links"
> > twice in a function name. As in
> > device_links_add_missing_supplier_links() has too many "links". In the
> > context of device_links_, "add missing suppliers" means "add missing
> > supplier links". Anyway, I think we can come back to figuring out a
> > good name once we agree on the more important discussions further
> > below.
>
> Yes, later is fine.  This is a detail.
>
> >
> >>> consumers were "saved" as "not yet having the supplier" earlier by
> >>> device_link_wait_for_supplier(). This function doesn't do that. This
> >>> function is just trying to see if those missing suppliers are present
> >>> now and if so adding a link to them from the "saved" consumers. I
> >>> think device_link_add_missing_suppliers() is actually a pretty good
> >>> name. Let me know what you think now.
> >>>
> >>>>
> >>>>
> >>>>>
> >>>>> I don't think we need "links" repeated twice in the function name.
> >>>>
> >>>> Yeah, I didn't like that either.
> >>>>
> >>>>
> >>>>> With this suggestion, what side effect is hidden in your opinion? That
> >>>>> the fully linked consumer is removed from the "waiting for suppliers"
> >>>>> list?
> >>>>
> >>>> The side effect is that the function does not merely do a check.  It also
> >>>> adds missing suppliers to a list.
> >>>
> >>> No, it doesn't do that. I can't keep a list of things that aren't
> >>> allocated yet :). In the whole patch series, we only keep a list of things
> >>> (consumers) that are waiting on other things (missing suppliers).
> >>
> >> OK, as I noted above, I stated that backwards.  It is adding links for
> >> existing suppliers, not for the missing suppliers.
> >>
> >>>
> >>>>>
> >>>>> Maybe device_link_try_removing_from_wfs()?
> >>>>
> >>>> I like that, other than the fact that it still does not provide a clue
> >>>> that the function is potentially adding suppliers to a list.
> >>>
> >>> It doesn't. How would you add a supplier device to a list if the
> >>> device itself isn't there? :)
> >>
> >> Again, that should be existing suppliers, as you noted.  But the point stands
> >> that the function is potentially adding links.
> >>
> >>
> >>>
> >>>>  I think
> >>>> part of the challenge is that the function does two things: (1) a check,
> >>>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> >>>> one line comment at the call site, something like:
> >>>>
> >>>>    /* adds missing suppliers to wfs */
> >>>>
> >>>>
> >>>>>
> >>>>> I'll wait for us to agree on a better name here before I change this.
> >>>>>
> >>>>>>> +{
> >>>>>>> +     struct device *dev, *tmp;
> >>>>>>> +
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>>>>>> +                              links.needs_suppliers)
> >>>>>>> +             if (!dev->bus->add_links(dev))
> >>>>>>> +                     list_del_init(&dev->links.needs_suppliers);
> >>>>>>
> >>>>>> Empties dev->links.needs_suppliers, but does not remove dev from
> >>>>>> wait_for_suppliers list.  Where does that happen?
> >>>>>
> >>>>> I'll chalk this up to you having a long day or forgetting your coffee
> >>>>> :) list_del_init() does both of those things because needs_suppliers
> >>>>> is the node and wait_for_suppliers is the list.
> >>>>
> >>>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> >>>> list related code again.
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static void device_link_free(struct device_link *link)
> >>>>>>>  {
> >>>>>>>       while (refcount_dec_not_one(&link->rpm_active))
> >>>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>>>>>       struct device_link *link;
> >>>>>>>       int ret = 0;
> >>>>>>>
> >>>>>>> +     /*
> >>>>>>> +      * If a device is waiting for one or more suppliers (in
> >>>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>>>>>> +      * suppliers.
> >>>>>>> +      */
> >>>>>>
> >>>>>> Change comment to:
> >>>>>>
> >>>>>>         /*
> >>>>>>          * Device waiting for supplier to become available is not allowed
> >>>>>>          * to probe
> >>>>>>          */
> >>>>>
> >>>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
> >>>>
> >>>> If just as good and shorter, then better.
> >>>>
> >>>> Also the original says "it is not ready to probe".  That is not correct.  It
> >>>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> >>>> Nit picky on my part, but tiny things like that mean I have to think harder.
> >>>> I have to think "why is it not ready to probe?".  Maybe my version should have
> >>>> instead been something like:
> >>>>
> >>>>         * Device waiting for supplier to become available will return
> >>>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
> >>>>
> >>>>>
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>>>>>> +             mutex_unlock(&wfs_lock);
> >>>>>>> +             return -EPROBE_DEFER;
> >>>>>>> +     }
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +
> >>>>>>>       device_links_write_lock();
> >>>>>>
> >>>>>> Update Documentation/driver-api/device_link.rst to reflect the
> >>>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >>>>>
> >>>>> Thanks! Will do.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>>>>>  {
> >>>>>>>       struct device_link *link, *ln;
> >>>>>>>
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_del(&dev->links.needs_suppliers);
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +
> >>>>>>>       /*
> >>>>>>>        * Delete all of the remaining links from this device to any other
> >>>>>>>        * devices (either consumers or suppliers).
> >>>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>>>>>  #endif
> >>>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>>>>>
> >>>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>>>>>
> >>>>>>> +
> >>>>>>> +     /*
> >>>>>>> +      * Check if any of the other devices (consumers) have been waiting for
> >>>>>>> +      * this device (supplier) to be added so that they can create a device
> >>>>>>> +      * link to it.
> >>>>>>> +      *
> >>>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>>>>>> +      * requires the supplier be registered before it's called.
> >>>>>>> +      *
> >>>>>>> +      * But this also needs to happe before bus_probe_device() to make sure
> >>>>>>> +      * waiting consumers can link to it before the driver is bound to the
> >>>>>>> +      * device and the driver sync_state callback is called for this device.
> >>>>>>> +      */
> >>>>>>
> >>>>>>         /*
> >>>>>>          * Add links to dev from any dependent consumer that has dev on it's
> >>>>>>          * list of needed suppliers
> >>>>>
> >>>>> There is no list of needed suppliers.
> >>>>
> >>>> "the other devices (consumers) have been waiting for this device (supplier)".
> >>>> Isn't that a list of needed suppliers?
> >>>
> >>> No, that's a list of consumers that needs_suppliers.
> >>>
> >>>>>
> >>>>>> (links.needs_suppliers).  Device_pm_add()
> >>>>>>          * must have previously registered dev to allow the links to be added.
> >>>>>>          *
> >>>>>>          * The consumer links must be created before dev is probed because the
> >>>>>>          * sync_state callback for dev will use the consumer links.
> >>>>>>          */
> >>>>>
> >>>>> I think what I wrote is just as clear.
> >>>>
> >>>> The original comment is vague.  It does not explain why consumer links must be
> >>>> created before the probe.  I had to go off and read other code to determine
> >>>> why that is true.
> >>>>
> >>>> And again, brevity is better if otherwise just as clear.
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> +     device_link_check_waiting_consumers();
> >>>>>>> +
> >>>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>>>>>> +             device_link_wait_for_supplier(dev);
> >>>>>>> +
> >>>>>>>       bus_probe_device(dev);
> >>>>>>>       if (parent)
> >>>>>>>               klist_add_tail(&dev->p->knode_parent,
> >>>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>>>>>> index c330b75c6c57..5d70babb7462 100644
> >>>>>>> --- a/include/linux/device.h
> >>>>>>> +++ b/include/linux/device.h
> >>>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>>>>>   *           that generate uevents to add the environment variables.
> >>>>>>
> >>>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>>>>>> + *           added to this bus.  The function is expected to create device
> >>>>>>> + *           links to all the suppliers of the input device that are
> >>>>>>> + *           available at the time this function is called.  As in, the
> >>>>>>> + *           function should NOT stop at the first failed device link if
> >>>>>>> + *           other unlinked supplier devices are present in the system.
> >>>>>>
> >>>>>> * @add_links:   Called after a device is added to this bus.
> >>>>>
> >>>>> Why are you removing the "perhaps multiple times" part? that's true
> >>>>> and that's how some of the other ops are documented.
> >>>>
> >>>> I didn't remove it.  I rephrased it with a little bit more explanation as
> >>>> "If some suppliers are not yet available, this function will be
> >>>> called again when the suppliers become available." (below).
> >>>>
> >>>>
> >>>>>
> >>>>>>  The function is
> >>>>>> *               expected to create device links to all the suppliers of the
> >>>>>> *               device that are available at the time this function is called.
> >>>>>> *               The function must NOT stop at the first failed device link if
> >>>>>> *               other unlinked supplier devices are present in the system.
> >>>>>> *               If some suppliers are not yet available, this function will be
> >>>>>> *               called again when the suppliers become available.
> >>>>>>
> >>>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >>>>>
> >>>>> Sorry, I'm not sure I understand. Can you please explain what you are
> >>>>> trying to say? of_link_to_suppliers() is just one implementation of
> >>>>> add_links(). The comment above is try for any bus trying to implement
> >>>>> add_links().
> >>>>
> >>>> This is conflating bus with the source of the firmware description of the
> >>>> hardware topology.  For drivers that use various APIs to access firmware
> >>>> description of topology that may be either devicetree or ACPI the access
> >>>> is done via fwnode_operations, based on struct device.fwnode (if I recall
> >>>> properly).
> >>>>
> >>>> I failed to completely address why add_links() is not needed.  The answer
> >>>> is that there should be a single function called for all buses.  Then
> >>>> the proper firmware data source would be accessed via a struct fwnode_operations.
> >>>>
> >>>> I think I left this out because I had not yet asked why this feature is
> >>>> tied only to the platform bus.  Which I asked earlier in this reply.
> >>>
> >>> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> >>> aware of those. I see where you are going with this. I see a couple of
> >>> problems with this approach though:
> >>>
> >>> 1. How you interpret the properties of a fwnode is specific to the fw
> >>> type. The clocks DT property isn't going to have the same definition
> >>> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> >>> clocks like property. So have one function to parse all the FW types
> >>> doesn't make a lot of sense.
> >>
> >> The functions in fwnode_operations are specific to the proper firmware.
> >> So there is a set of functions in a struct fwnode_operations for
> >> devicetree that only know about devicetree.  And there is a different
> >> variable of type fwnode_operations that is initialized with ACPI
> >> specific functions.
> >
> > Yes, I understand how ops work :) So I have one ops (fwnode ops) to
> > call that will read a property from DT or ACPI depending on where that
> > specific device's firmware is from. But that's not my point here.
> >
> > My point is that clock bindings in DT are under a "clocks" property
> > that lists references (phandles) to the supplier. But in ACPI, the
> > property might be called "clk" and could list references to actual
> > clock IDs. So, you can't have one piece of code that works for all
> > firmware even if I have one ops that can read properties from any
> > firmware.
> >
> > I'll still have to know what type the underlying firmware is before I
> > try to interpret the properties. So having one function that parses DT
> > and ACPI and whatever else would be a terrible and unnecessary design.
>
> You have already implemented the devicetree function, which is
> of_link_to_suppliers().  The devicetree fwnode_operations would have
> a pointer to of_link_to_suppliers().

Ah, I didn't realize you were asking me to add to the fwnode ops. I
thought you wanted me to handle this at the driver core level by
moving of_link_to_suppliers to driver core and replacing of_* APIs
with fwnode ops. And that seemed like a terrible idea. Glad you
weren't suggesting that.

I'm definitely open to adding a add_links to fwnode ops, but I'm not
sure if fwnode ops changes are frowned upon or not. I'll still need
the device specific edit_links() but that's a separate issue.

> If ACPI support is added, there would be an analogous ACPI aware function
> that would essentially do the same thing that of_link_to_suppliers()
> does.  This would be in the ACPI version of fwnode_operations.

Agreed. As long as you don't ask me to implement the ACPI ops :)

> There would not be a single function that is both devicetree aware and
> ACPI aware.

Great.

> >>> 2. If this common code is implemented as part of driver/base/, then at
> >>> a minimum, I'll have to check if a fwnode is a DT node before I start
> >>> interpreting the properties of a device's fwnode. But that means I'll
> >>> have to include linux/of.h to use is_of_node(). I don't like having
> >>> driver/base code depend on OF or platform or ACPI headers.
> >>
> >> You just use the function in the device's fwnode_operations (I think,
> >> I would have to go look at the precise way the code works because it
> >> has been quite a while since I've looked at it).
> >
> > Because you missed my point in (1) you are missing my point in (2).
> > I'll wait for your updated reply.
>
> We are still talking at cross purposes.  If my reply to (1) does not
> change that, I'll have to go dig into how the fwnode framework figures
> out which set of fwnode_operations to use for each device.

I think we are lining up better now. But still have some more to go :)

Having said that, I'd still like to meet your tomorrow if that's
possible (see Greg's email).

-Saravana
Frank Rowand Aug. 21, 2019, 4:24 a.m. UTC | #11
On 8/20/19 7:01 PM, Saravana Kannan wrote:
> 
> 
> On Tue, Aug 20, 2019, 6:56 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org <mailto:gregkh@linuxfoundation.org>> wrote:
> 
>     On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
>     > On 8/20/19 3:10 PM, Saravana Kannan wrote:
>     > > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com <mailto:frowand.list@gmail.com>> wrote:
>     > >>
>     > >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>     > >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com <mailto:frowand.list@gmail.com>> wrote:
>     > >>>>
>     > >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>     > >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com <mailto:frowand.list@gmail.com>> wrote:
>     > >>>>>>
>     > >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>     > >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>     > >>>>>>>  device addition
>     > >>>>>>> From: Saravana Kannan <saravanak@google.com <mailto:saravanak@google.com>>
> 
>     This is a "fun" thread :(
> 
>     You two should get together in person this week and talk.  I think you
>     both will be at ELC, can we do this tomorrow or Thursday so we can hash
>     it out in a way that doesn't end up talking past each other, like I feel
>     is happening here right now?
> 
> 
> That would be great. Wednesday would be better for me. I might not make it to ELC on Thursday. Let us know Frank.
> 
> Thanks,
> Saravana

I am really glad that you are here at ELC.  It should be very productive to
sit down together and figure some things out.

I'll send a separate reply with my phone number off list.

-Frank
Frank Rowand Aug. 21, 2019, 3:36 p.m. UTC | #12
On 8/20/19 3:10 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>

< snip >

>>>
>>> 3. The supplier info doesn't always need to come from a firmware. So I
>>> don't want to limit it to that?
>>
>> If you can find another source of topology info, then I would expect
>> that another set of fwnode_operations functions would be created
>> for the info source.
> 
> The other source could just be C files in the kernel. Using fwnodes
> for that would be hacky. But let's sort (1) and (2) out first.
> 

< snip >

Just a piece of trivia.  I got curious enough about this to search.
There is a third type of fwnode, software nodes.

See commit 59abd83672f70cac4b6bf9b237506c5bc6837606 for a description.

-Frank

Patch
diff mbox series

diff --git a/drivers/base/core.c b/drivers/base/core.c
index da84a73f2ba6..1b4eb221968f 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,8 @@  early_param("sysfs.deprecated", sysfs_deprecated_setup);
 #endif
 
 /* Device links support. */
+static LIST_HEAD(wait_for_suppliers);
+static DEFINE_MUTEX(wfs_lock);
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -401,6 +403,51 @@  struct device_link *device_link_add(struct device *consumer,
 }
 EXPORT_SYMBOL_GPL(device_link_add);
 
+/**
+ * device_link_wait_for_supplier - Mark device as waiting for supplier
+ * @consumer: Consumer device
+ *
+ * Marks the consumer device as waiting for suppliers to become available. The
+ * consumer device will never be probed until it's unmarked as waiting for
+ * suppliers. The caller is responsible for adding the link to the supplier
+ * once the supplier device is present.
+ *
+ * This function is NOT meant to be called from the probe function of the
+ * consumer but rather from code that creates/adds the consumer device.
+ */
+static void device_link_wait_for_supplier(struct device *consumer)
+{
+	mutex_lock(&wfs_lock);
+	list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
+/**
+ * device_link_check_waiting_consumers - Try to unmark waiting consumers
+ *
+ * Loops through all consumers waiting on suppliers and tries to add all their
+ * supplier links. If that succeeds, the consumer device is unmarked as waiting
+ * for suppliers. Otherwise, they are left marked as waiting on suppliers,
+ *
+ * The add_links bus callback is expected to return 0 if it has found and added
+ * all the supplier links for the consumer device. It should return an error if
+ * it isn't able to do so.
+ *
+ * The caller of device_link_wait_for_supplier() is expected to call this once
+ * it's aware of potential suppliers becoming available.
+ */
+static void device_link_check_waiting_consumers(void)
+{
+	struct device *dev, *tmp;
+
+	mutex_lock(&wfs_lock);
+	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
+				 links.needs_suppliers)
+		if (!dev->bus->add_links(dev))
+			list_del_init(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
 static void device_link_free(struct device_link *link)
 {
 	while (refcount_dec_not_one(&link->rpm_active))
@@ -535,6 +582,19 @@  int device_links_check_suppliers(struct device *dev)
 	struct device_link *link;
 	int ret = 0;
 
+	/*
+	 * If a device is waiting for one or more suppliers (in
+	 * wait_for_suppliers list), it is not ready to probe yet. So just
+	 * return -EPROBE_DEFER without having to check the links with existing
+	 * suppliers.
+	 */
+	mutex_lock(&wfs_lock);
+	if (!list_empty(&dev->links.needs_suppliers)) {
+		mutex_unlock(&wfs_lock);
+		return -EPROBE_DEFER;
+	}
+	mutex_unlock(&wfs_lock);
+
 	device_links_write_lock();
 
 	list_for_each_entry(link, &dev->links.suppliers, c_node) {
@@ -812,6 +872,10 @@  static void device_links_purge(struct device *dev)
 {
 	struct device_link *link, *ln;
 
+	mutex_lock(&wfs_lock);
+	list_del(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+
 	/*
 	 * Delete all of the remaining links from this device to any other
 	 * devices (either consumers or suppliers).
@@ -1673,6 +1737,7 @@  void device_initialize(struct device *dev)
 #endif
 	INIT_LIST_HEAD(&dev->links.consumers);
 	INIT_LIST_HEAD(&dev->links.suppliers);
+	INIT_LIST_HEAD(&dev->links.needs_suppliers);
 	dev->links.status = DL_DEV_NO_DRIVER;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
@@ -2108,6 +2173,24 @@  int device_add(struct device *dev)
 					     BUS_NOTIFY_ADD_DEVICE, dev);
 
 	kobject_uevent(&dev->kobj, KOBJ_ADD);
+
+	/*
+	 * Check if any of the other devices (consumers) have been waiting for
+	 * this device (supplier) to be added so that they can create a device
+	 * link to it.
+	 *
+	 * This needs to happen after device_pm_add() because device_link_add()
+	 * requires the supplier be registered before it's called.
+	 *
+	 * But this also needs to happe before bus_probe_device() to make sure
+	 * waiting consumers can link to it before the driver is bound to the
+	 * device and the driver sync_state callback is called for this device.
+	 */
+	device_link_check_waiting_consumers();
+
+	if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
+		device_link_wait_for_supplier(dev);
+
 	bus_probe_device(dev);
 	if (parent)
 		klist_add_tail(&dev->p->knode_parent,
diff --git a/include/linux/device.h b/include/linux/device.h
index c330b75c6c57..5d70babb7462 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -78,6 +78,17 @@  extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
  *		-EPROBE_DEFER it will queue the device for deferred probing.
  * @uevent:	Called when a device is added, removed, or a few other things
  *		that generate uevents to add the environment variables.
+ * @add_links:	Called, perhaps multiple times per device, after a device is
+ *		added to this bus.  The function is expected to create device
+ *		links to all the suppliers of the input device that are
+ *		available at the time this function is called.  As in, the
+ *		function should NOT stop at the first failed device link if
+ *		other unlinked supplier devices are present in the system.
+ *
+ *		Return 0 if device links have been successfully created to all
+ *		the suppliers of this device.  Return an error if some of the
+ *		suppliers are not yet available and this function needs to be
+ *		reattempted in the future.
  * @probe:	Called when a new device or driver add to this bus, and callback
  *		the specific driver's probe to initial the matched device.
  * @remove:	Called when a device removed from this bus.
@@ -122,6 +133,7 @@  struct bus_type {
 
 	int (*match)(struct device *dev, struct device_driver *drv);
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
+	int (*add_links)(struct device *dev);
 	int (*probe)(struct device *dev);
 	int (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
@@ -893,11 +905,13 @@  enum dl_dev_state {
  * struct dev_links_info - Device data related to device links.
  * @suppliers: List of links to supplier devices.
  * @consumers: List of links to consumer devices.
+ * @needs_suppliers: Hook to global list of devices waiting for suppliers.
  * @status: Driver status information.
  */
 struct dev_links_info {
 	struct list_head suppliers;
 	struct list_head consumers;
+	struct list_head needs_suppliers;
 	enum dl_dev_state status;
 };