[v1,0/5] Solve postboot supplier cleanup and optimize probe ordering
mbox series

Message ID 20190524010117.225219-1-saravanak@google.com
Headers show
Series
  • Solve postboot supplier cleanup and optimize probe ordering
Related show

Message

Saravana Kannan May 24, 2019, 1:01 a.m. UTC
Add a generic "depends-on" property that allows specifying mandatory
functional dependencies between devices. Add device-links after the
devices are created (but before they are probed) by looking at this
"depends-on" property.

This property is used instead of existing DT properties that specify
phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
is because not all resources referred to by existing DT properties are
mandatory functional dependencies. Some devices/drivers might be able
to operate with reduced functionality when some of the resources
aren't available. For example, a device could operate in polling mode
if no IRQ is available, a device could skip doing power management if
clock or voltage control isn't available and they are left on, etc.

So, adding mandatory functional dependency links between devices by
looking at referred phandles in DT properties won't work as it would
prevent probing devices that could be probed. By having an explicit
depends-on property, we can handle these cases correctly.

Having functional dependencies explicitly called out in DT and
automatically added before the devices are probed, provides the
following benefits:

- Optimizes device probe order and avoids the useless work of
  attempting probes of devices that will not probe successfully
  (because their suppliers aren't present or haven't probed yet).

  For example, in a commonly available mobile SoC, registering just
  one consumer device's driver at an initcall level earlier than the
  supplier device's driver causes 11 failed probe attempts before the
  consumer device probes successfully. This was with a kernel with all
  the drivers statically compiled in. This problem gets a lot worse if
  all the drivers are loaded as modules without direct symbol
  dependencies.

- Supplier devices like clock providers, regulators providers, etc
  need to keep the resources they provide active and at a particular
  state(s) during boot up even if their current set of consumers don't
  request the resource to be active. This is because the rest of the
  consumers might not have probed yet and turning off the resource
  before all the consumers have probed could lead to a hang or
  undesired user experience.

  Some frameworks (Eg: regulator) handle this today by turning off
  "unused" resources at late_initcall_sync and hoping all the devices
  have probed by then. This is not a valid assumption for systems with
  loadable modules. Other frameworks (Eg: clock) just don't handle
  this due to the lack of a clear signal for when they can turn off
  resources. This leads to downstream hacks to handle cases like this
  that can easily be solved in the upstream kernel.

  By linking devices before they are probed, we give suppliers a clear
  count of the number of dependent consumers. Once all of the
  consumers are active, the suppliers can turn off the unused
  resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links are needed, it is left to the consumer/supplier
devices to change the link when they probe.
 

Saravana Kannan (5):
  of/platform: Speed up of_find_device_by_node()
  driver core: Add device links support for pending links to suppliers
  dt-bindings: Add depends-on property
  of/platform: Add functional dependency link from "depends-on" property
  driver core: Add sync_state driver/bus callback

 .../devicetree/bindings/depends-on.txt        |  26 +++++
 drivers/base/core.c                           | 106 ++++++++++++++++++
 drivers/of/platform.c                         |  75 ++++++++++++-
 include/linux/device.h                        |  24 ++++
 include/linux/of.h                            |   3 +
 5 files changed, 233 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/depends-on.txt

Comments

Greg Kroah-Hartman May 24, 2019, 5:52 a.m. UTC | #1
On Thu, May 23, 2019 at 06:01:11PM -0700, Saravana Kannan wrote:
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.
> 
> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able
> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.
> 
> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.
> 
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.

Somewhere in this wall of text you need to say:
	MAKES DEVICES BOOT FASTER!
right?  :)

So in short, this solves the issue of deferred probing with systems with
loads of modules for platform devices and device tree, in that now you
have a chance to probe devices in the correct order saving loads of busy
loops.

A good thing, I like this, very nice work, all of these are:
	Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
but odds are I'll take this through my tree, so I'll add my s-o-b then.
But only after the DT people agree on the new entry.

thanks,

greg k-h
Rob Herring May 24, 2019, 1:04 p.m. UTC | #2
On Thu, May 23, 2019 at 8:01 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.

The DT already has dependency information. A node with 'clocks'
property has its dependency right there. We should use that. We don't
need to duplicate the information.

> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able
> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.

Yeah, but none of these examples are typically what you'd want to
happen. These cases are a property of the OS, not the DT. For example,
until recently, If you added pinctrl bindings to your DT, the kernel
would no longer boot because it would be looking for pinctrl driver.
That's wrong because the DT should not be coupled to the OS like that.
Adding this property will cause the same problem.

> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.
>
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
>
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
>
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.

Do you have data on how much time is spent. Past 'smarter probing'
attempts have not shown a significant difference.

> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.

We already know generally what devices are dependencies because you
just listed them. Why don't we make the kernel smarter by
instantiating these core devices/drivers first instead of relying on
initcall and link order.

>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.

IMO, we should get rid of this auto disabling.

Rob
Frank Rowand May 24, 2019, 5:49 p.m. UTC | #3
On 5/23/19 6:01 PM, Saravana Kannan wrote:
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.
> 
> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.
> 
> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.

Trying to wrap my brain around the concept, this series seems to be
adding the ability to declare that an apparent dependency (eg an IRQ
specified by a phandle) is _not_ actually a dependency.

The phandle already implies the dependency.  Creating a separate
depends-on property provides a method of ignoring the implied
dependencies.

This is not just hardware description.  It is instead a combination
of hardware functionality and driver functionality.  An example
provided in the second paragraph of the email I am replying to
suggests a device could operate in polling mode if no IRQ is
available.  Using this example, the devicetree does not know
whether the driver requires the IRQ (currently an implied
dependency since the IRQ phandle exists).  My understanding
of this example is that the device node would _not_ have a
depends-on property for the IRQ phandle so the IRQ would be
optional.  But this is an attribute of the driver, not the
hardware.  This is also configuration, declaring whether the
system is willing to accept polling mode instead of interrupt
mode.

Devicetree is not the proper place for driver description or
for configuration.

Another flaw with this method is that existing device trees
will be broken after the kernel is modified, because existing
device trees do not have the depends-on property.  This breaks
the devicetree compatibility rules.


> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear

By linking devices to suppliers before they are probed, we give suppliers a clear

>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>  
> 
> Saravana Kannan (5):
>   of/platform: Speed up of_find_device_by_node()
>   driver core: Add device links support for pending links to suppliers
>   dt-bindings: Add depends-on property
>   of/platform: Add functional dependency link from "depends-on" property
>   driver core: Add sync_state driver/bus callback
> 
>  .../devicetree/bindings/depends-on.txt        |  26 +++++
>  drivers/base/core.c                           | 106 ++++++++++++++++++
>  drivers/of/platform.c                         |  75 ++++++++++++-
>  include/linux/device.h                        |  24 ++++
>  include/linux/of.h                            |   3 +
>  5 files changed, 233 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> 

The above issues make this specific implementation not acceptable.

I like the analysis of the problem areas, and I like the concepts of
trying to solve not only probe ordering, but also the problem of
when to turn off resources that will not be needed.  But at the
moment, I don't have a suggestion of a way to implement a solution.

-Frank
Saravana Kannan May 24, 2019, 9:51 p.m. UTC | #4
Before I address all the comments, a friendly reminder: Whatever
solution we come up with needs to work on a system with loadable
modules and shouldn't depend on userspace for correctness.

On Fri, May 24, 2019 at 6:04 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Thu, May 23, 2019 at 8:01 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Add a generic "depends-on" property that allows specifying mandatory
> > functional dependencies between devices. Add device-links after the
> > devices are created (but before they are probed) by looking at this
> > "depends-on" property.
>
> The DT already has dependency information. A node with 'clocks'
> property has its dependency right there. We should use that. We don't
> need to duplicate the information.

So, are you saying all clock bindings are mandatory for a device for
all possible users of DT + Linux? Do you think if we have a patch that
makes all clock bindings mandatory dependencies, no one would object
to that?

> > This property is used instead of existing DT properties that specify
> > phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> > is because not all resources referred to by existing DT properties are
> > mandatory functional dependencies. Some devices/drivers might be able
> > to operate with reduced functionality when some of the resources
> > aren't available. For example, a device could operate in polling mode
> > if no IRQ is available, a device could skip doing power management if
> > clock or voltage control isn't available and they are left on, etc.
>
> Yeah, but none of these examples are typically what you'd want to
> happen. These cases are a property of the OS, not the DT. For example,
> until recently, If you added pinctrl bindings to your DT, the kernel
> would no longer boot because it would be looking for pinctrl driver.
> That's wrong because the DT should not be coupled to the OS like that.
> Adding this property will cause the same problem.

Isn't the a perfect example of the pinctrl being an optional
dependency in that specific case? The kernel still booted if pinctrl
wasn't available?

I don't agree that the dependency is purely a property of the OS. If
there's no clock to clock the hardware core, then the hardware just
can't work. There's no question about that. However, there can be
clock bindings that aren't mandatory for functionality but are needed
just for performance/power control.

Another perfect example are clock providers. Clock providers often get
input clocks from multiple other clock providers and even have cyclic
clock bindings. But only some of them are mandatory for the clock
provider to work. For example, clock provider A has input clocks from
clock providers B and C, but it only needs B to function (provides
root clock to all clocks). Not having C would only affect 4 (out of
100s of clocks) from clock provider A and those 4 are clocks depend on
an input clock from C (basically clock from C going to A to have some
clock gates and dividers added and sent back to C). This isn't even a
made up scenario -- there are SoCs that actually have this.

The OS could still choose to not probe the device unless full
functionality is available or it could assume all clocks are left on
by the bootloader and provide basic functionality. THAT would be the
property of the OS. But that doesn't remove the fact that some of the
resources are absolutely mandatory for the hardware to function. I'm
proposing the depends-on to capture the true hardware dependency --
not what the SW chooses to do with it.

> > So, adding mandatory functional dependency links between devices by
> > looking at referred phandles in DT properties won't work as it would
> > prevent probing devices that could be probed. By having an explicit
> > depends-on property, we can handle these cases correctly.
> >
> > Having functional dependencies explicitly called out in DT and
> > automatically added before the devices are probed, provides the
> > following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
>
> Do you have data on how much time is spent. Past 'smarter probing'
> attempts have not shown a significant difference.

"avoids the useless work attempting probes of devices that will not
probe successfully" -- I never claimed to save boot up time. Your
argument about having to save wall clock time is a moot point as a ton
of kernel features that optimize code won't save wall clock time (the
CPU would just run faster to make up for the inefficiency). Those
features just make the kernel less resource hungry and more efficient.
I'd understand your argument if this patch series is insanely complex
-- but that's not the case here.

> > - Supplier devices like clock providers, regulators providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
>
> We already know generally what devices are dependencies because you
> just listed them. Why don't we make the kernel smarter by
> instantiating these core devices/drivers first instead of relying on
> initcall and link order.

That's what this patch series is -- it makes the kernel smarter by
just using the data from DT instead of relying on manual tweaking of
initcall and link order.

> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
>
> IMO, we should get rid of this auto disabling.

Well, you need to back that opinion with reasoning. IMO we should
disable unused resources so that we don't waste power -- especially on
devices operating on batteries.

Also, I explicitly said "need to keep the resources they provide
active and at a particular state(s) during boot up". So it's not even
about auto disabling. For example, in the case of a voltage regulator
supplying multiple devices, if the first device probes and says it
only need the lowest voltage level, you can't just drop the voltage.
Because the other devices in the same voltage rail haven't probed yet
and you can crash the system if you just drop the voltage. You need to
wait for all the devices to be probed and then you can let the voltage
regulator operate normally. And you can't depend on late_initcall
because it falls apart on systems with modules.


-Saravana
Saravana Kannan May 24, 2019, 9:53 p.m. UTC | #5
On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> > Add a generic "depends-on" property that allows specifying mandatory
> > functional dependencies between devices. Add device-links after the
> > devices are created (but before they are probed) by looking at this
> > "depends-on" property.
> >
> > This property is used instead of existing DT properties that specify
> > phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> > is because not all resources referred to by existing DT properties are
> > mandatory functional dependencies. Some devices/drivers might be able> to operate with reduced functionality when some of the resources
> > aren't available. For example, a device could operate in polling mode
> > if no IRQ is available, a device could skip doing power management if
> > clock or voltage control isn't available and they are left on, etc.
> >
> > So, adding mandatory functional dependency links between devices by
> > looking at referred phandles in DT properties won't work as it would
> > prevent probing devices that could be probed. By having an explicit
> > depends-on property, we can handle these cases correctly.
>
> Trying to wrap my brain around the concept, this series seems to be
> adding the ability to declare that an apparent dependency (eg an IRQ
> specified by a phandle) is _not_ actually a dependency.

The current implementation completely ignores existing bindings for
dependencies and so does the current tip of the kernel. So it's not
really overriding anything. However, if I change the implementation so
that depends-on becomes the source of truth if it exists and falls
back to existing common bindings if "depends-on" isn't present -- then
depends-on would truly be overriding existing bindings for
dependencies. It depends on how we want to define the DT property.

> The phandle already implies the dependency.

Sure, it might imply, but it's not always true.

> Creating a separate
> depends-on property provides a method of ignoring the implied
> dependencies.

implied != true

> This is not just hardware description.  It is instead a combination
> of hardware functionality and driver functionality.  An example
> provided in the second paragraph of the email I am replying to
> suggests a device could operate in polling mode if no IRQ is
> available.  Using this example, the devicetree does not know
> whether the driver requires the IRQ (currently an implied
> dependency since the IRQ phandle exists).  My understanding
> of this example is that the device node would _not_ have a
> depends-on property for the IRQ phandle so the IRQ would be
> optional.  But this is an attribute of the driver, not the
> hardware.

Not really. The interrupt could be for "SD card plugged in". That's
never a mandatory dependency for the SD card controller to work. So
the IRQ provider won't be a "depends-on" in this case. But if there is
no power supply or clock for the SD card controller, it isn't going to
work -- so they'd be listed in the "depends-on". So, this is still
defining the hardware and not the OS.

> This is also configuration, declaring whether the
> system is willing to accept polling mode instead of interrupt
> mode.

Whether the driver will choose to operate without the IRQ is up to it.
The OS could also assume the power supply is never turned off and
still try to use the device. Depending on the hardware configuration,
that might or might not work.

> Devicetree is not the proper place for driver description or
> for configuration.

But depends-on isn't describing the driver configuration though.

Overall, the clock provider example I gave in another reply is a much
better example. If you just assume implied dependencies are mandatory
dependencies, some devices will never be probe because the kernel is
using them incorrectly (they aren't meant to list mandatory
dependencies).

> Another flaw with this method is that existing device trees
> will be broken after the kernel is modified, because existing
> device trees do not have the depends-on property.  This breaks
> the devicetree compatibility rules.

This is 100% not true with the current implementation. I actually
tested this. This is fully backwards compatible. That's another reason
for adding depends-on and going by just what it says. The existing
bindings were never meant to describe only mandatory dependencies. So
using them as such is what would break backwards compatibility.

> > Having functional dependencies explicitly called out in DT and
> > automatically added before the devices are probed, provides the
> > following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
> >
> > - Supplier devices like clock providers, regulators providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
> >
> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
> >
> >   By linking devices before they are probed, we give suppliers a clear
>
> By linking devices to suppliers before they are probed, we give suppliers a clear

Ack

> >   count of the number of dependent consumers. Once all of the
> >   consumers are active, the suppliers can turn off the unused
> >   resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
> >
> >
> > Saravana Kannan (5):
> >   of/platform: Speed up of_find_device_by_node()
> >   driver core: Add device links support for pending links to suppliers
> >   dt-bindings: Add depends-on property
> >   of/platform: Add functional dependency link from "depends-on" property
> >   driver core: Add sync_state driver/bus callback
> >
> >  .../devicetree/bindings/depends-on.txt        |  26 +++++
> >  drivers/base/core.c                           | 106 ++++++++++++++++++
> >  drivers/of/platform.c                         |  75 ++++++++++++-
> >  include/linux/device.h                        |  24 ++++
> >  include/linux/of.h                            |   3 +
> >  5 files changed, 233 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> >
>
> The above issues make this specific implementation not acceptable.
> I like the analysis of the problem areas, and I like the concepts of
> trying to solve not only probe ordering, but also the problem of
> when to turn off resources that will not be needed.

Beating a dead horse here, but I want to make sure I get this into as
many minds as possible:
It is NOT just about turning off resources. It's about the kernel
taking full control of resources (allowing the full range of voltages,
clock frequencies, bus configurations, etc) and syncing the HW state
to the SW state as determined by the consumers.

> But at the
> moment, I don't have a suggestion of a way to implement a solution.

The problem of syncing resources to SW state after boot up completed
has been broken for a long time in the kernel. It's high time we fix
it. I'm open to other suggestions, but we can't just say "we don't
have a solution".

For example, we can have a kernel command line argument that'll use
all common implicit bindings as mandatory dependencies and allow
"depends-on" to override them for the few cases where the implicit
dependencies don't match mandatory dependencies. Then:
- The kernel will be 100% backwards compatible with existing DT if the
command line arg isn't provided.
- New DT + old kernel will be no worse than today because old kernel
doesn't do any dependency tracking.
- Command line arg + new kernel + hardware where all implicit
dependencies are actually mandatory dependencies == things work
better.
- Command line arg + new kernel + hardware where all implicit
dependencies don't match mandatory dependencies + depends-on for those
exception case == things work better.

Would that be an acceptable use of "depends-on" to track mandatory dependencies?



-Saravana
Frank Rowand May 25, 2019, 12:16 a.m. UTC | #6
Hi Saravana,

On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:

< snip >

> 
> -Saravana
> 

There were several different topics in your email.  I am going to do
separate replies for different topics so that each topic is contained
in a single sub-thread instead of possibly having a many topic
sub-thread where any of the topics might get lost.

If I drop any key context out of any of the replies, please feel
free to add it back in.

-Frank
Frank Rowand May 25, 2019, 12:22 a.m. UTC | #7
On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:

< snip >

>> Another flaw with this method is that existing device trees
>> will be broken after the kernel is modified, because existing
>> device trees do not have the depends-on property.  This breaks
>> the devicetree compatibility rules.
> 
> This is 100% not true with the current implementation. I actually
> tested this. This is fully backwards compatible. That's another reason
> for adding depends-on and going by just what it says. The existing
> bindings were never meant to describe only mandatory dependencies. So
> using them as such is what would break backwards compatibility.

Are you saying that an existing, already compiled, devicetree (an FDT)
can be used to boot a new kernel that has implemented this patch set?

The new kernel will boot with the existing FDT that does not have
any depends-on properties?

-Frank
Frank Rowand May 25, 2019, 12:25 a.m. UTC | #8
On 5/24/19 5:22 PM, Frank Rowand wrote:
> On 5/24/19 2:53 PM, Saravana Kannan wrote:
>> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>
>>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> 
> < snip >
> 
>>> Another flaw with this method is that existing device trees
>>> will be broken after the kernel is modified, because existing
>>> device trees do not have the depends-on property.  This breaks
>>> the devicetree compatibility rules.
>>
>> This is 100% not true with the current implementation. I actually
>> tested this. This is fully backwards compatible. That's another reason
>> for adding depends-on and going by just what it says. The existing
>> bindings were never meant to describe only mandatory dependencies. So
>> using them as such is what would break backwards compatibility.
> 
> Are you saying that an existing, already compiled, devicetree (an FDT)
> can be used to boot a new kernel that has implemented this patch set?
> 
> The new kernel will boot with the existing FDT that does not have
> any depends-on properties?

I overlooked something you said in the email I replied to.  You said:

   "that depends-on becomes the source of truth if it exists and falls
   back to existing common bindings if "depends-on" isn't present"

Let me go back to look at the patch series to see how it falls back
to the existing bindings.

> 
> -Frank
>
Saravana Kannan May 25, 2019, 2:08 a.m. UTC | #9
Ugh... mobile app is sending HTML emails. Replying again.

On Fri, May 24, 2019 at 5:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 5/24/19 5:22 PM, Frank Rowand wrote:
> > On 5/24/19 2:53 PM, Saravana Kannan wrote:
> >> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>
> >>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> >
> > < snip >
> >
> >>> Another flaw with this method is that existing device trees
> >>> will be broken after the kernel is modified, because existing
> >>> device trees do not have the depends-on property.  This breaks
> >>> the devicetree compatibility rules.
> >>
> >> This is 100% not true with the current implementation. I actually
> >> tested this. This is fully backwards compatible. That's another reason
> >> for adding depends-on and going by just what it says. The existing
> >> bindings were never meant to describe only mandatory dependencies. So
> >> using them as such is what would break backwards compatibility.
> >
> > Are you saying that an existing, already compiled, devicetree (an FDT)
> > can be used to boot a new kernel that has implemented this patch set?
> >
> > The new kernel will boot with the existing FDT that does not have
> > any depends-on properties?

You sent out a lot of emails on this topic. But to answer them all.
The existing implementation is 100% backwards compatible.

> I overlooked something you said in the email I replied to.  You said:
>
>    "that depends-on becomes the source of truth if it exists and falls
>    back to existing common bindings if "depends-on" isn't present"

This is referring to an alternate implementation where implicit
dependencies are used by the kernel but new "depends-on" property
would allow overriding in cases where the implicit dependencies are
wrong. But this will need a kernel command line flag to enable this
feature so that we can be backwards compatible. Otherwise it won't be.

> Let me go back to look at the patch series to see how it falls back
> to the existing bindings.

Current patch series doesn't really "fallback" but rather it only acts
on this new property. Existing FDT binaries simply don't have this. So
it won't have any impact on the kernel behavior. But yes, looking at
the patches again will help :)

-Saravana
Saravana Kannan May 25, 2019, 2:17 a.m. UTC | #10
On Thu, May 23, 2019 at 10:52 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, May 23, 2019 at 06:01:11PM -0700, Saravana Kannan wrote:
> > Add a generic "depends-on" property that allows specifying mandatory
> > functional dependencies between devices. Add device-links after the
> > devices are created (but before they are probed) by looking at this
> > "depends-on" property.
> >
> > This property is used instead of existing DT properties that specify
> > phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> > is because not all resources referred to by existing DT properties are
> > mandatory functional dependencies. Some devices/drivers might be able
> > to operate with reduced functionality when some of the resources
> > aren't available. For example, a device could operate in polling mode
> > if no IRQ is available, a device could skip doing power management if
> > clock or voltage control isn't available and they are left on, etc.
> >
> > So, adding mandatory functional dependency links between devices by
> > looking at referred phandles in DT properties won't work as it would
> > prevent probing devices that could be probed. By having an explicit
> > depends-on property, we can handle these cases correctly.
> >
> > Having functional dependencies explicitly called out in DT and
> > automatically added before the devices are probed, provides the
> > following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
> >
> > - Supplier devices like clock providers, regulators providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
> >
> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
> >
> >   By linking devices before they are probed, we give suppliers a clear
> >   count of the number of dependent consumers. Once all of the
> >   consumers are active, the suppliers can turn off the unused
> >   resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
>
> Somewhere in this wall of text you need to say:
>         MAKES DEVICES BOOT FASTER!
> right?  :)

I'm sure it will, but I can't easily test and measure this number
because I don't have a device with 100s of devices (common in mobile
SoCs) where I can load all the drivers as modules and are supported
upstream. And the current ones I have mostly workaround this in their
downstream tree by manually ordering with initcalls and link order.
But I see the avoidance of useless probes that'll fail as more of a
free side benefit and not the main goal of this patch series. Getting
modules to actually work and crash the system while booting is the
main goal.

> So in short, this solves the issue of deferred probing with systems with
> loads of modules for platform devices and device tree, in that now you
> have a chance to probe devices in the correct order saving loads of busy
> loops.

Yes, definitely saves loads of busy work.

> A good thing, I like this, very nice work, all of these are:
>         Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Thanks!

> but odds are I'll take this through my tree, so I'll add my s-o-b then.
> But only after the DT people agree on the new entry.

Yup! Trying to do that. :)

-Saravana
Frank Rowand May 25, 2019, 2:40 a.m. UTC | #11
Hi Saranova,

I'll try to address the other portions of this email that I <snipped>
in my previous replies.


On 5/24/19 2:53 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
>>> Add a generic "depends-on" property that allows specifying mandatory
>>> functional dependencies between devices. Add device-links after the
>>> devices are created (but before they are probed) by looking at this
>>> "depends-on" property.
>>>
>>> This property is used instead of existing DT properties that specify
>>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
>>> is because not all resources referred to by existing DT properties are
>>> mandatory functional dependencies. Some devices/drivers might be able> to operate with reduced functionality when some of the resources
>>> aren't available. For example, a device could operate in polling mode
>>> if no IRQ is available, a device could skip doing power management if
>>> clock or voltage control isn't available and they are left on, etc.
>>>
>>> So, adding mandatory functional dependency links between devices by
>>> looking at referred phandles in DT properties won't work as it would
>>> prevent probing devices that could be probed. By having an explicit
>>> depends-on property, we can handle these cases correctly.
>>
>> Trying to wrap my brain around the concept, this series seems to be
>> adding the ability to declare that an apparent dependency (eg an IRQ
>> specified by a phandle) is _not_ actually a dependency.
> 
> The current implementation completely ignores existing bindings for
> dependencies and so does the current tip of the kernel. So it's not
> really overriding anything. However, if I change the implementation so
> that depends-on becomes the source of truth if it exists and falls
> back to existing common bindings if "depends-on" isn't present -- then
> depends-on would truly be overriding existing bindings for
> dependencies. It depends on how we want to define the DT property.
> 
>> The phandle already implies the dependency.
> 
> Sure, it might imply, but it's not always true.
> 
>> Creating a separate
>> depends-on property provides a method of ignoring the implied
>> dependencies.
> 
> implied != true
> 
>> This is not just hardware description.  It is instead a combination
>> of hardware functionality and driver functionality.  An example
>> provided in the second paragraph of the email I am replying to
>> suggests a device could operate in polling mode if no IRQ is
>> available.  Using this example, the devicetree does not know
>> whether the driver requires the IRQ (currently an implied
>> dependency since the IRQ phandle exists).  My understanding
>> of this example is that the device node would _not_ have a
>> depends-on property for the IRQ phandle so the IRQ would be
>> optional.  But this is an attribute of the driver, not the
>> hardware.
> 

> Not really. The interrupt could be for "SD card plugged in". That's
> never a mandatory dependency for the SD card controller to work. So
> the IRQ provider won't be a "depends-on" in this case. But if there is
> no power supply or clock for the SD card controller, it isn't going to
> work -- so they'd be listed in the "depends-on". So, this is still
> defining the hardware and not the OS.

Please comment on my observation that was based on an IRQ for a device
will polling mode vs interrupt driven mode.  You described a different
case and did not address my comment.


>> This is also configuration, declaring whether the
>> system is willing to accept polling mode instead of interrupt
>> mode.
> 
> Whether the driver will choose to operate without the IRQ is up to it.
> The OS could also assume the power supply is never turned off and
> still try to use the device. Depending on the hardware configuration,
> that might or might not work.
> 
>> Devicetree is not the proper place for driver description or
>> for configuration.
> 
> But depends-on isn't describing the driver configuration though.
> 
> Overall, the clock provider example I gave in another reply is a much
> better example. If you just assume implied dependencies are mandatory
> dependencies, some devices will never be probe because the kernel is
> using them incorrectly (they aren't meant to list mandatory
> dependencies).
> 
>> Another flaw with this method is that existing device trees
>> will be broken after the kernel is modified, because existing
>> device trees do not have the depends-on property.  This breaks
>> the devicetree compatibility rules.
> 
> This is 100% not true with the current implementation. I actually
> tested this. This is fully backwards compatible. That's another reason
> for adding depends-on and going by just what it says. The existing
> bindings were never meant to describe only mandatory dependencies. So
> using them as such is what would break backwards compatibility.
> 
>>> Having functional dependencies explicitly called out in DT and
>>> automatically added before the devices are probed, provides the
>>> following benefits:
>>>
>>> - Optimizes device probe order and avoids the useless work of
>>>   attempting probes of devices that will not probe successfully
>>>   (because their suppliers aren't present or haven't probed yet).
>>>
>>>   For example, in a commonly available mobile SoC, registering just
>>>   one consumer device's driver at an initcall level earlier than the
>>>   supplier device's driver causes 11 failed probe attempts before the
>>>   consumer device probes successfully. This was with a kernel with all
>>>   the drivers statically compiled in. This problem gets a lot worse if
>>>   all the drivers are loaded as modules without direct symbol
>>>   dependencies.
>>>
>>> - Supplier devices like clock providers, regulators providers, etc
>>>   need to keep the resources they provide active and at a particular
>>>   state(s) during boot up even if their current set of consumers don't
>>>   request the resource to be active. This is because the rest of the
>>>   consumers might not have probed yet and turning off the resource
>>>   before all the consumers have probed could lead to a hang or
>>>   undesired user experience.
>>>
>>>   Some frameworks (Eg: regulator) handle this today by turning off
>>>   "unused" resources at late_initcall_sync and hoping all the devices
>>>   have probed by then. This is not a valid assumption for systems with
>>>   loadable modules. Other frameworks (Eg: clock) just don't handle
>>>   this due to the lack of a clear signal for when they can turn off
>>>   resources. This leads to downstream hacks to handle cases like this
>>>   that can easily be solved in the upstream kernel.
>>>
>>>   By linking devices before they are probed, we give suppliers a clear
>>
>> By linking devices to suppliers before they are probed, we give suppliers a clear
> 
> Ack
> 
>>>   count of the number of dependent consumers. Once all of the
>>>   consumers are active, the suppliers can turn off the unused
>>>   resources without making assumptions about the number of consumers.
>>>
>>> By default we just add device-links to track "driver presence" (probe
>>> succeeded) of the supplier device. If any other functionality provided
>>> by device-links are needed, it is left to the consumer/supplier
>>> devices to change the link when they probe.
>>>
>>>
>>> Saravana Kannan (5):
>>>   of/platform: Speed up of_find_device_by_node()
>>>   driver core: Add device links support for pending links to suppliers
>>>   dt-bindings: Add depends-on property
>>>   of/platform: Add functional dependency link from "depends-on" property
>>>   driver core: Add sync_state driver/bus callback
>>>
>>>  .../devicetree/bindings/depends-on.txt        |  26 +++++
>>>  drivers/base/core.c                           | 106 ++++++++++++++++++
>>>  drivers/of/platform.c                         |  75 ++++++++++++-
>>>  include/linux/device.h                        |  24 ++++
>>>  include/linux/of.h                            |   3 +
>>>  5 files changed, 233 insertions(+), 1 deletion(-)
>>>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
>>>
>>
>> The above issues make this specific implementation not acceptable.
>> I like the analysis of the problem areas, and I like the concepts of
>> trying to solve not only probe ordering, but also the problem of
>> when to turn off resources that will not be needed.
> 
> Beating a dead horse here, but I want to make sure I get this into as
> many minds as possible:
> It is NOT just about turning off resources. It's about the kernel
> taking full control of resources (allowing the full range of voltages,
> clock frequencies, bus configurations, etc) and syncing the HW state
> to the SW state as determined by the consumers.
> 
>> But at the
>> moment, I don't have a suggestion of a way to implement a solution.
> 
> The problem of syncing resources to SW state after boot up completed
> has been broken for a long time in the kernel. It's high time we fix
> it. I'm open to other suggestions, but we can't just say "we don't
> have a solution".
> 
> For example, we can have a kernel command line argument that'll use
> all common implicit bindings as mandatory dependencies and allow
> "depends-on" to override them for the few cases where the implicit
> dependencies don't match mandatory dependencies. Then:
> - The kernel will be 100% backwards compatible with existing DT if the
> command line arg isn't provided.
> - New DT + old kernel will be no worse than today because old kernel
> doesn't do any dependency tracking.
> - Command line arg + new kernel + hardware where all implicit
> dependencies are actually mandatory dependencies == things work
> better.
> - Command line arg + new kernel + hardware where all implicit
> dependencies don't match mandatory dependencies + depends-on for those
> exception case == things work better.

Using a command line argument for this purpose just seems to be a
hack and bad architecture.

It also seems like an invitation to mis-configure a system (in other
words, increases the complexity and difficulty of properly configuring
and administering a system).

The is not a hard no (yet), but it will take some convincing for me
to accept the command line approach to add the feature, yet maintain
compatibility.  Please do not spend any time replying to this concern
yet - we will have plenty of time to discuss later if need be.

> 
> Would that be an acceptable use of "depends-on" to track mandatory dependencies?
> 
> 
> 
> -Saravana
> 

-Frank
Saravana Kannan May 25, 2019, 4:04 a.m. UTC | #12
On Fri, May 24, 2019 at 7:40 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> Hi Saranova,
>
> I'll try to address the other portions of this email that I <snipped>
> in my previous replies.
>
>
> On 5/24/19 2:53 PM, Saravana Kannan wrote:
> > On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> >>> Add a generic "depends-on" property that allows specifying mandatory
> >>> functional dependencies between devices. Add device-links after the
> >>> devices are created (but before they are probed) by looking at this
> >>> "depends-on" property.
> >>>
> >>> This property is used instead of existing DT properties that specify
> >>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> >>> is because not all resources referred to by existing DT properties are
> >>> mandatory functional dependencies. Some devices/drivers might be able> to operate with reduced functionality when some of the resources
> >>> aren't available. For example, a device could operate in polling mode
> >>> if no IRQ is available, a device could skip doing power management if
> >>> clock or voltage control isn't available and they are left on, etc.
> >>>
> >>> So, adding mandatory functional dependency links between devices by
> >>> looking at referred phandles in DT properties won't work as it would
> >>> prevent probing devices that could be probed. By having an explicit
> >>> depends-on property, we can handle these cases correctly.
> >>
> >> Trying to wrap my brain around the concept, this series seems to be
> >> adding the ability to declare that an apparent dependency (eg an IRQ
> >> specified by a phandle) is _not_ actually a dependency.
> >
> > The current implementation completely ignores existing bindings for
> > dependencies and so does the current tip of the kernel. So it's not
> > really overriding anything. However, if I change the implementation so
> > that depends-on becomes the source of truth if it exists and falls
> > back to existing common bindings if "depends-on" isn't present -- then
> > depends-on would truly be overriding existing bindings for
> > dependencies. It depends on how we want to define the DT property.
> >
> >> The phandle already implies the dependency.
> >
> > Sure, it might imply, but it's not always true.
> >
> >> Creating a separate
> >> depends-on property provides a method of ignoring the implied
> >> dependencies.
> >
> > implied != true
> >
> >> This is not just hardware description.  It is instead a combination
> >> of hardware functionality and driver functionality.  An example
> >> provided in the second paragraph of the email I am replying to
> >> suggests a device could operate in polling mode if no IRQ is
> >> available.  Using this example, the devicetree does not know
> >> whether the driver requires the IRQ (currently an implied
> >> dependency since the IRQ phandle exists).  My understanding
> >> of this example is that the device node would _not_ have a
> >> depends-on property for the IRQ phandle so the IRQ would be
> >> optional.  But this is an attribute of the driver, not the
> >> hardware.
> >
>
> > Not really. The interrupt could be for "SD card plugged in". That's
> > never a mandatory dependency for the SD card controller to work. So
> > the IRQ provider won't be a "depends-on" in this case. But if there is
> > no power supply or clock for the SD card controller, it isn't going to
> > work -- so they'd be listed in the "depends-on". So, this is still
> > defining the hardware and not the OS.
>
> Please comment on my observation that was based on an IRQ for a device
> will polling mode vs interrupt driven mode.
> You described a different
> case and did not address my comment.

I thought I did reply -- not sure what part you are looking for so
I'll rephrase. I was just picking the SD card controller as a concrete
example of device that can work with or without an interrupt. But
sure, I can call it "the device".

And yes, the device won't have a "depends-on" on the IRQ provider
because the device can still work without a working (as in bound to
driver) IRQ provider. Whether the driver insists on waiting on an IRQ
provider or not is up to the driver and the depends-on property is NOT
trying to dictate what the driver should do in this case. Does that
answer your implied question?

>
> >> This is also configuration, declaring whether the
> >> system is willing to accept polling mode instead of interrupt
> >> mode.
> >
> > Whether the driver will choose to operate without the IRQ is up to it.
> > The OS could also assume the power supply is never turned off and
> > still try to use the device. Depending on the hardware configuration,
> > that might or might not work.
> >
> >> Devicetree is not the proper place for driver description or
> >> for configuration.
> >
> > But depends-on isn't describing the driver configuration though.
> >
> > Overall, the clock provider example I gave in another reply is a much
> > better example. If you just assume implied dependencies are mandatory
> > dependencies, some devices will never be probe because the kernel is
> > using them incorrectly (they aren't meant to list mandatory
> > dependencies).
> >
> >> Another flaw with this method is that existing device trees
> >> will be broken after the kernel is modified, because existing
> >> device trees do not have the depends-on property.  This breaks
> >> the devicetree compatibility rules.
> >
> > This is 100% not true with the current implementation. I actually
> > tested this. This is fully backwards compatible. That's another reason
> > for adding depends-on and going by just what it says. The existing
> > bindings were never meant to describe only mandatory dependencies. So
> > using them as such is what would break backwards compatibility.
> >
> >>> Having functional dependencies explicitly called out in DT and
> >>> automatically added before the devices are probed, provides the
> >>> following benefits:
> >>>
> >>> - Optimizes device probe order and avoids the useless work of
> >>>   attempting probes of devices that will not probe successfully
> >>>   (because their suppliers aren't present or haven't probed yet).
> >>>
> >>>   For example, in a commonly available mobile SoC, registering just
> >>>   one consumer device's driver at an initcall level earlier than the
> >>>   supplier device's driver causes 11 failed probe attempts before the
> >>>   consumer device probes successfully. This was with a kernel with all
> >>>   the drivers statically compiled in. This problem gets a lot worse if
> >>>   all the drivers are loaded as modules without direct symbol
> >>>   dependencies.
> >>>
> >>> - Supplier devices like clock providers, regulators providers, etc
> >>>   need to keep the resources they provide active and at a particular
> >>>   state(s) during boot up even if their current set of consumers don't
> >>>   request the resource to be active. This is because the rest of the
> >>>   consumers might not have probed yet and turning off the resource
> >>>   before all the consumers have probed could lead to a hang or
> >>>   undesired user experience.
> >>>
> >>>   Some frameworks (Eg: regulator) handle this today by turning off
> >>>   "unused" resources at late_initcall_sync and hoping all the devices
> >>>   have probed by then. This is not a valid assumption for systems with
> >>>   loadable modules. Other frameworks (Eg: clock) just don't handle
> >>>   this due to the lack of a clear signal for when they can turn off
> >>>   resources. This leads to downstream hacks to handle cases like this
> >>>   that can easily be solved in the upstream kernel.
> >>>
> >>>   By linking devices before they are probed, we give suppliers a clear
> >>
> >> By linking devices to suppliers before they are probed, we give suppliers a clear
> >
> > Ack
> >
> >>>   count of the number of dependent consumers. Once all of the
> >>>   consumers are active, the suppliers can turn off the unused
> >>>   resources without making assumptions about the number of consumers.
> >>>
> >>> By default we just add device-links to track "driver presence" (probe
> >>> succeeded) of the supplier device. If any other functionality provided
> >>> by device-links are needed, it is left to the consumer/supplier
> >>> devices to change the link when they probe.
> >>>
> >>>
> >>> Saravana Kannan (5):
> >>>   of/platform: Speed up of_find_device_by_node()
> >>>   driver core: Add device links support for pending links to suppliers
> >>>   dt-bindings: Add depends-on property
> >>>   of/platform: Add functional dependency link from "depends-on" property
> >>>   driver core: Add sync_state driver/bus callback
> >>>
> >>>  .../devicetree/bindings/depends-on.txt        |  26 +++++
> >>>  drivers/base/core.c                           | 106 ++++++++++++++++++
> >>>  drivers/of/platform.c                         |  75 ++++++++++++-
> >>>  include/linux/device.h                        |  24 ++++
> >>>  include/linux/of.h                            |   3 +
> >>>  5 files changed, 233 insertions(+), 1 deletion(-)
> >>>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> >>>
> >>
> >> The above issues make this specific implementation not acceptable.
> >> I like the analysis of the problem areas, and I like the concepts of
> >> trying to solve not only probe ordering, but also the problem of
> >> when to turn off resources that will not be needed.
> >
> > Beating a dead horse here, but I want to make sure I get this into as
> > many minds as possible:
> > It is NOT just about turning off resources. It's about the kernel
> > taking full control of resources (allowing the full range of voltages,
> > clock frequencies, bus configurations, etc) and syncing the HW state
> > to the SW state as determined by the consumers.
> >
> >> But at the
> >> moment, I don't have a suggestion of a way to implement a solution.
> >
> > The problem of syncing resources to SW state after boot up completed
> > has been broken for a long time in the kernel. It's high time we fix
> > it. I'm open to other suggestions, but we can't just say "we don't
> > have a solution".
> >
> > For example, we can have a kernel command line argument that'll use
> > all common implicit bindings as mandatory dependencies and allow
> > "depends-on" to override them for the few cases where the implicit
> > dependencies don't match mandatory dependencies. Then:
> > - The kernel will be 100% backwards compatible with existing DT if the
> > command line arg isn't provided.
> > - New DT + old kernel will be no worse than today because old kernel
> > doesn't do any dependency tracking.
> > - Command line arg + new kernel + hardware where all implicit
> > dependencies are actually mandatory dependencies == things work
> > better.
> > - Command line arg + new kernel + hardware where all implicit
> > dependencies don't match mandatory dependencies + depends-on for those
> > exception case == things work better.
>
> Using a command line argument for this purpose just seems to be a
> hack and bad architecture.

I agree -- which is why the current implementation doesn't try to form
mandatory dependency from implicit bindings and makes the case that
all mandatory dependencies should be called out explicitly. But if
people insist on that implicit bindings be used as such, then a CONFIG
or commandline option would be an acceptable compromise for me.

> It also seems like an invitation to mis-configure a system (in other
> words, increases the complexity and difficulty of properly configuring
> and administering a system).

Just want to point out that, as of today, we have a broken system --
module loading is not compatible with proper handling of shared
mandatory resources during boot up. To be clear, I'm not arguing for a
commandline arg.

> The is not a hard no (yet), but it will take some convincing for me
> to accept the command line approach to add the feature, yet maintain
> compatibility.  Please do not spend any time replying to this concern
> yet - we will have plenty of time to discuss later if need be.

Sounds good.

-Saravana
Saravana Kannan May 29, 2019, 8:02 p.m. UTC | #13
Sending again because email client somehow reverted to HTML.
Frank, Rob, Mark,

Gentle reminder. I've replied to your emails spread across the
different patches in the series. Hoping they address your questions
and concerns. Please let me know what you think.

Thanks,
Saravana


On Wed, May 29, 2019 at 1:00 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Frank, Rob, Mark,
>
> Gentle reminder. I've replied to your emails spread across the different patches in the series. Hoping they address your questions and concerns. Please let me know what you think.
>
> Thanks,
> Saravana
>
> On Thu, May 23, 2019 at 6:01 PM Saravana Kannan <saravanak@google.com> wrote:
>>
>> Add a generic "depends-on" property that allows specifying mandatory
>> functional dependencies between devices. Add device-links after the
>> devices are created (but before they are probed) by looking at this
>> "depends-on" property.
>>
>> This property is used instead of existing DT properties that specify
>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
>> is because not all resources referred to by existing DT properties are
>> mandatory functional dependencies. Some devices/drivers might be able
>> to operate with reduced functionality when some of the resources
>> aren't available. For example, a device could operate in polling mode
>> if no IRQ is available, a device could skip doing power management if
>> clock or voltage control isn't available and they are left on, etc.
>>
>> So, adding mandatory functional dependency links between devices by
>> looking at referred phandles in DT properties won't work as it would
>> prevent probing devices that could be probed. By having an explicit
>> depends-on property, we can handle these cases correctly.
>>
>> Having functional dependencies explicitly called out in DT and
>> automatically added before the devices are probed, provides the
>> following benefits:
>>
>> - Optimizes device probe order and avoids the useless work of
>>   attempting probes of devices that will not probe successfully
>>   (because their suppliers aren't present or haven't probed yet).
>>
>>   For example, in a commonly available mobile SoC, registering just
>>   one consumer device's driver at an initcall level earlier than the
>>   supplier device's driver causes 11 failed probe attempts before the
>>   consumer device probes successfully. This was with a kernel with all
>>   the drivers statically compiled in. This problem gets a lot worse if
>>   all the drivers are loaded as modules without direct symbol
>>   dependencies.
>>
>> - Supplier devices like clock providers, regulators providers, etc
>>   need to keep the resources they provide active and at a particular
>>   state(s) during boot up even if their current set of consumers don't
>>   request the resource to be active. This is because the rest of the
>>   consumers might not have probed yet and turning off the resource
>>   before all the consumers have probed could lead to a hang or
>>   undesired user experience.
>>
>>   Some frameworks (Eg: regulator) handle this today by turning off
>>   "unused" resources at late_initcall_sync and hoping all the devices
>>   have probed by then. This is not a valid assumption for systems with
>>   loadable modules. Other frameworks (Eg: clock) just don't handle
>>   this due to the lack of a clear signal for when they can turn off
>>   resources. This leads to downstream hacks to handle cases like this
>>   that can easily be solved in the upstream kernel.
>>
>>   By linking devices before they are probed, we give suppliers a clear
>>   count of the number of dependent consumers. Once all of the
>>   consumers are active, the suppliers can turn off the unused
>>   resources without making assumptions about the number of consumers.
>>
>> By default we just add device-links to track "driver presence" (probe
>> succeeded) of the supplier device. If any other functionality provided
>> by device-links are needed, it is left to the consumer/supplier
>> devices to change the link when they probe.
>>
>>
>> Saravana Kannan (5):
>>   of/platform: Speed up of_find_device_by_node()
>>   driver core: Add device links support for pending links to suppliers
>>   dt-bindings: Add depends-on property
>>   of/platform: Add functional dependency link from "depends-on" property
>>   driver core: Add sync_state driver/bus callback
>>
>>  .../devicetree/bindings/depends-on.txt        |  26 +++++
>>  drivers/base/core.c                           | 106 ++++++++++++++++++
>>  drivers/of/platform.c                         |  75 ++++++++++++-
>>  include/linux/device.h                        |  24 ++++
>>  include/linux/of.h                            |   3 +
>>  5 files changed, 233 insertions(+), 1 deletion(-)
>>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
>>
>> --
>> 2.22.0.rc1.257.g3120a18244-goog
>>
David Collins May 31, 2019, 11:27 p.m. UTC | #14
Hello Saravana,

On 5/23/19 6:01 PM, Saravana Kannan wrote:
...
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
...
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
This benefit provided by the sync_state() callback function introduced in
this series gives us a mechanism to solve a specific problem encountered
on Qualcomm Technologies, Inc. (QTI) boards when booting with drivers
compiled as modules.  QTI boards have a regulator that powers the PHYs for
display, camera, USB, UFS, and PCIe.  When these boards boot up, the boot
loader enables this regulator along with other resources in order to
display a splash screen image.  The regulator must remain enabled until
the Linux display driver has probed and made a request with the regulator
framework to keep the regulator enabled.  If the regulator is disabled
prematurely, then the screen image is corrupted and the display hardware
enters a bad state.

We have observed that when the camera driver probes before the display
driver, it performs this sequence: regulator_enable(), camera register IO,
regulator_disable().  Since it is the first consumer of the shared
regulator, the regulator is physically disabled (even though display
hardware still requires it to be enabled).  We have solved this problem
when compiling drivers statically with a downstream regulator
proxy-consumer driver.  This proxy-consumer is able to make an enable
request for the shared regulator before any other consumer.  It then
removes its request at late_initcall_sync.

Unfortunately, when drivers are compiled as modules instead of compiled
statically into the kernel image, late_initcall_sync is not a meaningful
marker of driver probe completion.  This means that our existing proxy
voting system will not work when drivers are compiled as modules.  The
sync_state() callback resolves this issue by providing a notification that
is guaranteed to arrive only after all consumers of the shared regulator
have probed.

QTI boards have other cases of shared resources such as bus bandwidth
which must remain at least at a level set by boot loaders in order to
properly support hardware blocks that are enabled before the Linux kernel
starts booting.

Take care,
David
Frank Rowand June 11, 2019, 2:56 p.m. UTC | #15
Hi Saravana,

On 5/24/19 9:04 PM, Saravana Kannan wrote:
> On Fri, May 24, 2019 at 7:40 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> Hi Saranova,
>>
>> I'll try to address the other portions of this email that I <snipped>
>> in my previous replies.
>>
>>
>> On 5/24/19 2:53 PM, Saravana Kannan wrote:
>>> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
>>>>> Add a generic "depends-on" property that allows specifying mandatory
>>>>> functional dependencies between devices. Add device-links after the
>>>>> devices are created (but before they are probed) by looking at this
>>>>> "depends-on" property.
>>>>>
>>>>> This property is used instead of existing DT properties that specify
>>>>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
>>>>> is because not all resources referred to by existing DT properties are
>>>>> mandatory functional dependencies. Some devices/drivers might be able>>>>> to operate with reduced functionality when some of the resources


In your original email, you say this:

>>>>> aren't available. For example, a device could operate in polling mode
>>>>> if no IRQ is available, a device could skip doing power management if
>>>>> clock or voltage control isn't available and they are left on, etc.


>>>>>
>>>>> So, adding mandatory functional dependency links between devices by
>>>>> looking at referred phandles in DT properties won't work as it would
>>>>> prevent probing devices that could be probed. By having an explicit
>>>>> depends-on property, we can handle these cases correctly.
>>>>
>>>> Trying to wrap my brain around the concept, this series seems to be
>>>> adding the ability to declare that an apparent dependency (eg an IRQ
>>>> specified by a phandle) is _not_ actually a dependency.
>>>
>>> The current implementation completely ignores existing bindings for
>>> dependencies and so does the current tip of the kernel. So it's not
>>> really overriding anything. However, if I change the implementation so
>>> that depends-on becomes the source of truth if it exists and falls
>>> back to existing common bindings if "depends-on" isn't present -- then
>>> depends-on would truly be overriding existing bindings for
>>> dependencies. It depends on how we want to define the DT property.
>>>
>>>> The phandle already implies the dependency.
>>>
>>> Sure, it might imply, but it's not always true.
>>>
>>>> Creating a separate
>>>> depends-on property provides a method of ignoring the implied
>>>> dependencies.
>>>
>>> implied != true
>>>

I refer to your irq mode vs polled mode device example:

>>>> This is not just hardware description.  It is instead a combination
>>>> of hardware functionality and driver functionality.  An example
>>>> provided in the second paragraph of the email I am replying to
>>>> suggests a device could operate in polling mode if no IRQ is
>>>> available.  Using this example, the devicetree does not know
>>>> whether the driver requires the IRQ (currently an implied
>>>> dependency since the IRQ phandle exists).  My understanding
>>>> of this example is that the device node would _not_ have a
>>>> depends-on property for the IRQ phandle so the IRQ would be
>>>> optional.  But this is an attribute of the driver, not the
>>>> hardware.
>>>
>>

You change the subject from irq mode vs polled mode device to some
other type of device:

>>> Not really. The interrupt could be for "SD card plugged in". That's
>>> never a mandatory dependency for the SD card controller to work. So
>>> the IRQ provider won't be a "depends-on" in this case. But if there is
>>> no power supply or clock for the SD card controller, it isn't going to
>>> work -- so they'd be listed in the "depends-on". So, this is still
>>> defining the hardware and not the OS.
>>

I again try to get you to discuss the irq mode vs polled mode device:

>> Please comment on my observation that was based on an IRQ for a device
>> will polling mode vs interrupt driven mode.
>> You described a different
>> case and did not address my comment.
> > I thought I did reply -- not sure what part you are looking for so
> I'll rephrase. I was just picking the SD card controller as a concrete
> example of device that can work with or without an interrupt. But
> sure, I can call it "the device".
> 

And the thread is so deeply nested that you are missing the original
point that I made.

> And yes, the device won't have a "depends-on" on the IRQ provider
> because the device can still work without a working (as in bound to
> driver) IRQ provider. Whether the driver insists on waiting on an IRQ
> provider or not is up to the driver and the depends-on property is NOT
> trying to dictate what the driver should do in this case. Does that
> answer your implied question?

If the device _must_ operate in irq mode to achieve the throughput
that is _required_ for the system to be functional then that system
would need a devicetree to have a "depends-on" property for the irq.
But another system using the same exact hardware might be able to
tolerate the reduced throughput of operating in polled mode.  This
second system would need a devicetree that does _not_ have a
depends-on property for that same irq, as used by that same device.

This then becomes configuration, not hardware description, as noted
in my next paragraph:

> 
>>
>>>> This is also configuration, declaring whether the
>>>> system is willing to accept polling mode instead of interrupt
>>>> mode.

-Frank

>>>
>>> Whether the driver will choose to operate without the IRQ is up to it.
>>> The OS could also assume the power supply is never turned off and
>>> still try to use the device. Depending on the hardware configuration,
>>> that might or might not work.
>>>
>>>> Devicetree is not the proper place for driver description or
>>>> for configuration.
>>>
>>> But depends-on isn't describing the driver configuration though.
>>>
>>> Overall, the clock provider example I gave in another reply is a much
>>> better example. If you just assume implied dependencies are mandatory
>>> dependencies, some devices will never be probe because the kernel is
>>> using them incorrectly (they aren't meant to list mandatory
>>> dependencies).
>>>
>>>> Another flaw with this method is that existing device trees
>>>> will be broken after the kernel is modified, because existing
>>>> device trees do not have the depends-on property.  This breaks
>>>> the devicetree compatibility rules.
>>>
>>> This is 100% not true with the current implementation. I actually
>>> tested this. This is fully backwards compatible. That's another reason
>>> for adding depends-on and going by just what it says. The existing
>>> bindings were never meant to describe only mandatory dependencies. So
>>> using them as such is what would break backwards compatibility.
>>>
>>>>> Having functional dependencies explicitly called out in DT and
>>>>> automatically added before the devices are probed, provides the
>>>>> following benefits:
>>>>>
>>>>> - Optimizes device probe order and avoids the useless work of
>>>>>   attempting probes of devices that will not probe successfully
>>>>>   (because their suppliers aren't present or haven't probed yet).
>>>>>
>>>>>   For example, in a commonly available mobile SoC, registering just
>>>>>   one consumer device's driver at an initcall level earlier than the
>>>>>   supplier device's driver causes 11 failed probe attempts before the
>>>>>   consumer device probes successfully. This was with a kernel with all
>>>>>   the drivers statically compiled in. This problem gets a lot worse if
>>>>>   all the drivers are loaded as modules without direct symbol
>>>>>   dependencies.
>>>>>
>>>>> - Supplier devices like clock providers, regulators providers, etc
>>>>>   need to keep the resources they provide active and at a particular
>>>>>   state(s) during boot up even if their current set of consumers don't
>>>>>   request the resource to be active. This is because the rest of the
>>>>>   consumers might not have probed yet and turning off the resource
>>>>>   before all the consumers have probed could lead to a hang or
>>>>>   undesired user experience.
>>>>>
>>>>>   Some frameworks (Eg: regulator) handle this today by turning off
>>>>>   "unused" resources at late_initcall_sync and hoping all the devices
>>>>>   have probed by then. This is not a valid assumption for systems with
>>>>>   loadable modules. Other frameworks (Eg: clock) just don't handle
>>>>>   this due to the lack of a clear signal for when they can turn off
>>>>>   resources. This leads to downstream hacks to handle cases like this
>>>>>   that can easily be solved in the upstream kernel.
>>>>>
>>>>>   By linking devices before they are probed, we give suppliers a clear
>>>>
>>>> By linking devices to suppliers before they are probed, we give suppliers a clear
>>>
>>> Ack
>>>
>>>>>   count of the number of dependent consumers. Once all of the
>>>>>   consumers are active, the suppliers can turn off the unused
>>>>>   resources without making assumptions about the number of consumers.
>>>>>
>>>>> By default we just add device-links to track "driver presence" (probe
>>>>> succeeded) of the supplier device. If any other functionality provided
>>>>> by device-links are needed, it is left to the consumer/supplier
>>>>> devices to change the link when they probe.
>>>>>
>>>>>
>>>>> Saravana Kannan (5):
>>>>>   of/platform: Speed up of_find_device_by_node()
>>>>>   driver core: Add device links support for pending links to suppliers
>>>>>   dt-bindings: Add depends-on property
>>>>>   of/platform: Add functional dependency link from "depends-on" property
>>>>>   driver core: Add sync_state driver/bus callback
>>>>>
>>>>>  .../devicetree/bindings/depends-on.txt        |  26 +++++
>>>>>  drivers/base/core.c                           | 106 ++++++++++++++++++
>>>>>  drivers/of/platform.c                         |  75 ++++++++++++-
>>>>>  include/linux/device.h                        |  24 ++++
>>>>>  include/linux/of.h                            |   3 +
>>>>>  5 files changed, 233 insertions(+), 1 deletion(-)
>>>>>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
>>>>>
>>>>
>>>> The above issues make this specific implementation not acceptable.
>>>> I like the analysis of the problem areas, and I like the concepts of
>>>> trying to solve not only probe ordering, but also the problem of
>>>> when to turn off resources that will not be needed.
>>>
>>> Beating a dead horse here, but I want to make sure I get this into as
>>> many minds as possible:
>>> It is NOT just about turning off resources. It's about the kernel
>>> taking full control of resources (allowing the full range of voltages,
>>> clock frequencies, bus configurations, etc) and syncing the HW state
>>> to the SW state as determined by the consumers.
>>>
>>>> But at the
>>>> moment, I don't have a suggestion of a way to implement a solution.
>>>
>>> The problem of syncing resources to SW state after boot up completed
>>> has been broken for a long time in the kernel. It's high time we fix
>>> it. I'm open to other suggestions, but we can't just say "we don't
>>> have a solution".
>>>
>>> For example, we can have a kernel command line argument that'll use
>>> all common implicit bindings as mandatory dependencies and allow
>>> "depends-on" to override them for the few cases where the implicit
>>> dependencies don't match mandatory dependencies. Then:
>>> - The kernel will be 100% backwards compatible with existing DT if the
>>> command line arg isn't provided.
>>> - New DT + old kernel will be no worse than today because old kernel
>>> doesn't do any dependency tracking.
>>> - Command line arg + new kernel + hardware where all implicit
>>> dependencies are actually mandatory dependencies == things work
>>> better.
>>> - Command line arg + new kernel + hardware where all implicit
>>> dependencies don't match mandatory dependencies + depends-on for those
>>> exception case == things work better.
>>
>> Using a command line argument for this purpose just seems to be a
>> hack and bad architecture.
> 
> I agree -- which is why the current implementation doesn't try to form
> mandatory dependency from implicit bindings and makes the case that
> all mandatory dependencies should be called out explicitly. But if
> people insist on that implicit bindings be used as such, then a CONFIG
> or commandline option would be an acceptable compromise for me.
> 
>> It also seems like an invitation to mis-configure a system (in other
>> words, increases the complexity and difficulty of properly configuring
>> and administering a system).
> 
> Just want to point out that, as of today, we have a broken system --
> module loading is not compatible with proper handling of shared
> mandatory resources during boot up. To be clear, I'm not arguing for a
> commandline arg.
> 
>> The is not a hard no (yet), but it will take some convincing for me
>> to accept the command line approach to add the feature, yet maintain
>> compatibility.  Please do not spend any time replying to this concern
>> yet - we will have plenty of time to discuss later if need be.
> 
> Sounds good.
> 
> -Saravana
>
Saravana Kannan June 11, 2019, 8:19 p.m. UTC | #16
On Tue, Jun 11, 2019 at 7:56 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> Hi Saravana,
>
> On 5/24/19 9:04 PM, Saravana Kannan wrote:
> > On Fri, May 24, 2019 at 7:40 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> Hi Saranova,
> >>
> >> I'll try to address the other portions of this email that I <snipped>
> >> in my previous replies.
> >>
> >>
> >> On 5/24/19 2:53 PM, Saravana Kannan wrote:
> >>> On Fri, May 24, 2019 at 10:49 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> >>>>> Add a generic "depends-on" property that allows specifying mandatory
> >>>>> functional dependencies between devices. Add device-links after the
> >>>>> devices are created (but before they are probed) by looking at this
> >>>>> "depends-on" property.
> >>>>>
> >>>>> This property is used instead of existing DT properties that specify
> >>>>> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> >>>>> is because not all resources referred to by existing DT properties are
> >>>>> mandatory functional dependencies. Some devices/drivers might be able>>>>> to operate with reduced functionality when some of the resources
>
>
> In your original email, you say this:
>
> >>>>> aren't available. For example, a device could operate in polling mode
> >>>>> if no IRQ is available, a device could skip doing power management if
> >>>>> clock or voltage control isn't available and they are left on, etc.
>
>
> >>>>>
> >>>>> So, adding mandatory functional dependency links between devices by
> >>>>> looking at referred phandles in DT properties won't work as it would
> >>>>> prevent probing devices that could be probed. By having an explicit
> >>>>> depends-on property, we can handle these cases correctly.
> >>>>
> >>>> Trying to wrap my brain around the concept, this series seems to be
> >>>> adding the ability to declare that an apparent dependency (eg an IRQ
> >>>> specified by a phandle) is _not_ actually a dependency.
> >>>
> >>> The current implementation completely ignores existing bindings for
> >>> dependencies and so does the current tip of the kernel. So it's not
> >>> really overriding anything. However, if I change the implementation so
> >>> that depends-on becomes the source of truth if it exists and falls
> >>> back to existing common bindings if "depends-on" isn't present -- then
> >>> depends-on would truly be overriding existing bindings for
> >>> dependencies. It depends on how we want to define the DT property.
> >>>
> >>>> The phandle already implies the dependency.
> >>>
> >>> Sure, it might imply, but it's not always true.
> >>>
> >>>> Creating a separate
> >>>> depends-on property provides a method of ignoring the implied
> >>>> dependencies.
> >>>
> >>> implied != true
> >>>
>
> I refer to your irq mode vs polled mode device example:
>
> >>>> This is not just hardware description.  It is instead a combination
> >>>> of hardware functionality and driver functionality.  An example
> >>>> provided in the second paragraph of the email I am replying to
> >>>> suggests a device could operate in polling mode if no IRQ is
> >>>> available.  Using this example, the devicetree does not know
> >>>> whether the driver requires the IRQ (currently an implied
> >>>> dependency since the IRQ phandle exists).  My understanding
> >>>> of this example is that the device node would _not_ have a
> >>>> depends-on property for the IRQ phandle so the IRQ would be
> >>>> optional.  But this is an attribute of the driver, not the
> >>>> hardware.
> >>>
> >>
>
> You change the subject from irq mode vs polled mode device to some
> other type of device:
>
> >>> Not really. The interrupt could be for "SD card plugged in". That's
> >>> never a mandatory dependency for the SD card controller to work. So
> >>> the IRQ provider won't be a "depends-on" in this case. But if there is
> >>> no power supply or clock for the SD card controller, it isn't going to
> >>> work -- so they'd be listed in the "depends-on". So, this is still
> >>> defining the hardware and not the OS.
> >>
>
> I again try to get you to discuss the irq mode vs polled mode device:
>
> >> Please comment on my observation that was based on an IRQ for a device
> >> will polling mode vs interrupt driven mode.
> >> You described a different
> >> case and did not address my comment.
> > > I thought I did reply -- not sure what part you are looking for so
> > I'll rephrase. I was just picking the SD card controller as a concrete
> > example of device that can work with or without an interrupt. But
> > sure, I can call it "the device".
> >
>
> And the thread is so deeply nested that you are missing the original
> point that I made.
>
> > And yes, the device won't have a "depends-on" on the IRQ provider
> > because the device can still work without a working (as in bound to
> > driver) IRQ provider. Whether the driver insists on waiting on an IRQ
> > provider or not is up to the driver and the depends-on property is NOT
> > trying to dictate what the driver should do in this case. Does that
> > answer your implied question?
>
> If the device _must_ operate in irq mode to achieve the throughput
> that is _required_ for the system to be functional then that system
> would need a devicetree to have a "depends-on" property for the irq.
> But another system using the same exact hardware might be able to
> tolerate the reduced throughput of operating in polled mode.  This
> second system would need a devicetree that does _not_ have a
> depends-on property for that same irq, as used by that same device.

Thanks for clarifying the point. I see the difference in our view points.

The way I see it, on the system where the device must operate in IRQ
mode to meet performance requirements, the DRIVER would choose to
always -EPROBE_DEFER till it gets the IRQ. Or if it's BSD or Windows
or whatever other theoretical OS that is trying to use the same DT
blob, it'll follow whatever mechanism that OS provides for waiting for
the IRQ to become available.
On the system where not having the IRQ is okay, the DRIVER would not
EPROBE_DEFER and just go to using polling mode.
In both these cases I don't expect the depends-on to list the IRQ provider.

I have more to say but I'll say that in the RESEND thread as a reply
to your other email.

-Saravana