All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-08  7:41 ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-08  7:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Rafael J. Wysocki, Daniel Vetter, Russell King,
	Rob Clark, dri-devel

The device lists are poorly ordered when the component device code is
used. This is because component_master_add_with_match() returns 0
regardless of component devices calling component_add() first. It can
really only fail if an allocation fails, in which case everything is
going bad and we're out of memory. The host device (called master_dev in
the code), can succeed at probe and be put on the device lists before
any of the component devices are probed and put on the lists.

Within the component device framework this usually isn't that bad
because the real driver work is done at bind time via
component{,master}_ops::bind(). It becomes a problem when the driver
core, or host driver, wants to operate on the component device outside
of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
driver core doesn't understand the relationship between the host device
and the component devices and could possibly try to operate on component
devices when they're already removed from the system or shut down.

Normally, device links or probe defer would reorder the lists and put
devices that depend on other devices in the lists at the correct
location, but with component devices this doesn't happen because this
information isn't expressed anywhere. Drivers simply succeed at
registering their component or host with the component framework and
wait for their bind() callback to be called once the other components
are ready. We could make various device links between 'master_dev' and
'component->dev' but it's not necessary. Let's simply move the hosting
device to the end of the device lists when the component device fully
binds. This way we know that all components are present and have probed
properly and now the host device has really probed so it's safe to
assume the host driver ops can operate on any component device.

This fixes the msm display driver shutdown path when the DSI controller
is connected to a DSI bridge that is controlled via i2c. In this case,
the msm display driver wants to tear down the display pipeline on
shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
and it can't do that unless the whole display chain is still probed and
active in the system. When a display bridge is on i2c, the i2c device
for the bridge will be created whenever the i2c controller probes, which
could be before or after the msm display driver probes. If the i2c
controller probes after the display driver, then the i2c controller will
be shutdown before the display controller during system wide shutdown
and thus i2c transactions will stop working before the display pipeline
is shut down. This means we'll have the display bridge trying to access
an i2c bus that's shut down because drm_atomic_helper_shutdown() is
trying to disable the bridge after the bridge is off.

Moving the host device to the end of the lists at bind time moves the
drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
This fixes the immediate problem, but we could improve it a bit by
modeling device links from the component devices to the host device
indicating that they supply something, although it is slightly different
because the consumer doesn't need the suppliers to probe to succeed.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rob Clark <robdclark@gmail.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/base/component.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/base/component.c b/drivers/base/component.c
index dcfbe7251dc4..de645420bae2 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -15,6 +15,8 @@
 #include <linux/slab.h>
 #include <linux/debugfs.h>
 
+#include "base.h"
+
 /**
  * DOC: overview
  *
@@ -657,6 +659,14 @@ int component_bind_all(struct device *master_dev, void *data)
 				c = master->match->compare[i - 1].component;
 				component_unbind(c, master, data);
 			}
+	} else {
+		/*
+		 * Move to the tail of the list so that master_dev driver ops
+		 * like 'shutdown' or 'remove' are called before any of the
+		 * dependencies that the components have are shutdown or
+		 * removed.
+		 */
+		device_pm_move_to_tail(master_dev);
 	}
 
 	return ret;

base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
-- 
https://chromeos.dev


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-08  7:41 ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-08  7:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel, Russell King

The device lists are poorly ordered when the component device code is
used. This is because component_master_add_with_match() returns 0
regardless of component devices calling component_add() first. It can
really only fail if an allocation fails, in which case everything is
going bad and we're out of memory. The host device (called master_dev in
the code), can succeed at probe and be put on the device lists before
any of the component devices are probed and put on the lists.

Within the component device framework this usually isn't that bad
because the real driver work is done at bind time via
component{,master}_ops::bind(). It becomes a problem when the driver
core, or host driver, wants to operate on the component device outside
of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
driver core doesn't understand the relationship between the host device
and the component devices and could possibly try to operate on component
devices when they're already removed from the system or shut down.

Normally, device links or probe defer would reorder the lists and put
devices that depend on other devices in the lists at the correct
location, but with component devices this doesn't happen because this
information isn't expressed anywhere. Drivers simply succeed at
registering their component or host with the component framework and
wait for their bind() callback to be called once the other components
are ready. We could make various device links between 'master_dev' and
'component->dev' but it's not necessary. Let's simply move the hosting
device to the end of the device lists when the component device fully
binds. This way we know that all components are present and have probed
properly and now the host device has really probed so it's safe to
assume the host driver ops can operate on any component device.

This fixes the msm display driver shutdown path when the DSI controller
is connected to a DSI bridge that is controlled via i2c. In this case,
the msm display driver wants to tear down the display pipeline on
shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
and it can't do that unless the whole display chain is still probed and
active in the system. When a display bridge is on i2c, the i2c device
for the bridge will be created whenever the i2c controller probes, which
could be before or after the msm display driver probes. If the i2c
controller probes after the display driver, then the i2c controller will
be shutdown before the display controller during system wide shutdown
and thus i2c transactions will stop working before the display pipeline
is shut down. This means we'll have the display bridge trying to access
an i2c bus that's shut down because drm_atomic_helper_shutdown() is
trying to disable the bridge after the bridge is off.

Moving the host device to the end of the lists at bind time moves the
drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
This fixes the immediate problem, but we could improve it a bit by
modeling device links from the component devices to the host device
indicating that they supply something, although it is slightly different
because the consumer doesn't need the suppliers to probe to succeed.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Rob Clark <robdclark@gmail.com>
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/base/component.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/base/component.c b/drivers/base/component.c
index dcfbe7251dc4..de645420bae2 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -15,6 +15,8 @@
 #include <linux/slab.h>
 #include <linux/debugfs.h>
 
+#include "base.h"
+
 /**
  * DOC: overview
  *
@@ -657,6 +659,14 @@ int component_bind_all(struct device *master_dev, void *data)
 				c = master->match->compare[i - 1].component;
 				component_unbind(c, master, data);
 			}
+	} else {
+		/*
+		 * Move to the tail of the list so that master_dev driver ops
+		 * like 'shutdown' or 'remove' are called before any of the
+		 * dependencies that the components have are shutdown or
+		 * removed.
+		 */
+		device_pm_move_to_tail(master_dev);
 	}
 
 	return ret;

base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
-- 
https://chromeos.dev


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-08  7:41 ` Stephen Boyd
@ 2021-05-10 11:59   ` Rafael J. Wysocki
  -1 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-10 11:59 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Daniel Vetter, Russell King, Rob Clark, dri-devel,
	Saravana Kannan

On Sat, May 8, 2021 at 9:41 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The host device (called master_dev in
> the code), can succeed at probe and be put on the device lists before
> any of the component devices are probed and put on the lists.
>
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
>
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or host with the component framework and
> wait for their bind() callback to be called once the other components
> are ready. We could make various device links between 'master_dev' and
> 'component->dev' but it's not necessary. Let's simply move the hosting
> device to the end of the device lists when the component device fully
> binds. This way we know that all components are present and have probed
> properly and now the host device has really probed so it's safe to
> assume the host driver ops can operate on any component device.

Moving a device to the end of dpm_list is generally risky in cases
when some dependency information may be missing.

For example, if there is a device depending on the hosting one, but
that dependency is not represented by a device link or a direct
ancestor-descendant relationship (or generally a path in the device
dependency graph leading from one of them to the other), then moving
it to the end of dpm_list would cause system-wide suspend to fail (the
hosting device would be suspended before the one depending on it).

That may not be a concern here, but at least it would be good to
document why it is not a concern.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-10 11:59   ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-10 11:59 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Saravana Kannan, Rafael J. Wysocki, Daniel Vetter,
	Linux Kernel Mailing List, dri-devel, Greg Kroah-Hartman,
	Russell King

On Sat, May 8, 2021 at 9:41 AM Stephen Boyd <swboyd@chromium.org> wrote:
>
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The host device (called master_dev in
> the code), can succeed at probe and be put on the device lists before
> any of the component devices are probed and put on the lists.
>
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
>
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or host with the component framework and
> wait for their bind() callback to be called once the other components
> are ready. We could make various device links between 'master_dev' and
> 'component->dev' but it's not necessary. Let's simply move the hosting
> device to the end of the device lists when the component device fully
> binds. This way we know that all components are present and have probed
> properly and now the host device has really probed so it's safe to
> assume the host driver ops can operate on any component device.

Moving a device to the end of dpm_list is generally risky in cases
when some dependency information may be missing.

For example, if there is a device depending on the hosting one, but
that dependency is not represented by a device link or a direct
ancestor-descendant relationship (or generally a path in the device
dependency graph leading from one of them to the other), then moving
it to the end of dpm_list would cause system-wide suspend to fail (the
hosting device would be suspended before the one depending on it).

That may not be a concern here, but at least it would be good to
document why it is not a concern.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-08  7:41 ` Stephen Boyd
@ 2021-05-10 16:05   ` Daniel Vetter
  -1 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-10 16:05 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, Russell King, Rob Clark, dri-devel

On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The host device (called master_dev in
> the code), can succeed at probe and be put on the device lists before
> any of the component devices are probed and put on the lists.
> 
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
> 
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or host with the component framework and
> wait for their bind() callback to be called once the other components
> are ready. We could make various device links between 'master_dev' and
> 'component->dev' but it's not necessary. Let's simply move the hosting
> device to the end of the device lists when the component device fully
> binds. This way we know that all components are present and have probed
> properly and now the host device has really probed so it's safe to
> assume the host driver ops can operate on any component device.
> 
> This fixes the msm display driver shutdown path when the DSI controller
> is connected to a DSI bridge that is controlled via i2c. In this case,
> the msm display driver wants to tear down the display pipeline on
> shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> and it can't do that unless the whole display chain is still probed and
> active in the system. When a display bridge is on i2c, the i2c device
> for the bridge will be created whenever the i2c controller probes, which
> could be before or after the msm display driver probes. If the i2c
> controller probes after the display driver, then the i2c controller will
> be shutdown before the display controller during system wide shutdown
> and thus i2c transactions will stop working before the display pipeline
> is shut down. This means we'll have the display bridge trying to access
> an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> trying to disable the bridge after the bridge is off.
> 
> Moving the host device to the end of the lists at bind time moves the
> drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> This fixes the immediate problem, but we could improve it a bit by
> modeling device links from the component devices to the host device
> indicating that they supply something, although it is slightly different
> because the consumer doesn't need the suppliers to probe to succeed.
> 
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>

Entirely aside, but an s/master/aggregate/ or similar over the entire
component.c codebase would help a pile in making it easier to understand
which part does what. Or at least I'm always terribly confused about which
bind binds what and all that, so maybe an additional review whether we
have a clear split into aggregate and individual components after that
initial fix is needed.

On the actual topic: I agree there's a problem here, but I'm honestly not
sure how it should be fixed. That's way over my understanding of all the
device probe and pm interactions. Of which there are plenty.

One question I have: Why is the bridge component driver not correctly
ordered wrt the i2c driver it needs? The idea is that the aggregate driver
doesn't access any hw itself, but entirely relies on all its components.
So as long as all the component drivers are sorted correctly in the device
list, things /should/ work. And as soon as we drop out a single component,
the aggregate gets unbound (and then does all the
drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
perhaps that msm does the drm teardown in the wrong callback?
-Daniel

> ---
>  drivers/base/component.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index dcfbe7251dc4..de645420bae2 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -15,6 +15,8 @@
>  #include <linux/slab.h>
>  #include <linux/debugfs.h>
>  
> +#include "base.h"
> +
>  /**
>   * DOC: overview
>   *
> @@ -657,6 +659,14 @@ int component_bind_all(struct device *master_dev, void *data)
>  				c = master->match->compare[i - 1].component;
>  				component_unbind(c, master, data);
>  			}
> +	} else {
> +		/*
> +		 * Move to the tail of the list so that master_dev driver ops
> +		 * like 'shutdown' or 'remove' are called before any of the
> +		 * dependencies that the components have are shutdown or
> +		 * removed.
> +		 */
> +		device_pm_move_to_tail(master_dev);
>  	}
>  
>  	return ret;
> 
> base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
> -- 
> https://chromeos.dev
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-10 16:05   ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-10 16:05 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel,
	Greg Kroah-Hartman, Russell King

On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> The device lists are poorly ordered when the component device code is
> used. This is because component_master_add_with_match() returns 0
> regardless of component devices calling component_add() first. It can
> really only fail if an allocation fails, in which case everything is
> going bad and we're out of memory. The host device (called master_dev in
> the code), can succeed at probe and be put on the device lists before
> any of the component devices are probed and put on the lists.
> 
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.
> 
> Normally, device links or probe defer would reorder the lists and put
> devices that depend on other devices in the lists at the correct
> location, but with component devices this doesn't happen because this
> information isn't expressed anywhere. Drivers simply succeed at
> registering their component or host with the component framework and
> wait for their bind() callback to be called once the other components
> are ready. We could make various device links between 'master_dev' and
> 'component->dev' but it's not necessary. Let's simply move the hosting
> device to the end of the device lists when the component device fully
> binds. This way we know that all components are present and have probed
> properly and now the host device has really probed so it's safe to
> assume the host driver ops can operate on any component device.
> 
> This fixes the msm display driver shutdown path when the DSI controller
> is connected to a DSI bridge that is controlled via i2c. In this case,
> the msm display driver wants to tear down the display pipeline on
> shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> and it can't do that unless the whole display chain is still probed and
> active in the system. When a display bridge is on i2c, the i2c device
> for the bridge will be created whenever the i2c controller probes, which
> could be before or after the msm display driver probes. If the i2c
> controller probes after the display driver, then the i2c controller will
> be shutdown before the display controller during system wide shutdown
> and thus i2c transactions will stop working before the display pipeline
> is shut down. This means we'll have the display bridge trying to access
> an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> trying to disable the bridge after the bridge is off.
> 
> Moving the host device to the end of the lists at bind time moves the
> drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> This fixes the immediate problem, but we could improve it a bit by
> modeling device links from the component devices to the host device
> indicating that they supply something, although it is slightly different
> because the consumer doesn't need the suppliers to probe to succeed.
> 
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Rob Clark <robdclark@gmail.com>
> Cc: <dri-devel@lists.freedesktop.org>
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>

Entirely aside, but an s/master/aggregate/ or similar over the entire
component.c codebase would help a pile in making it easier to understand
which part does what. Or at least I'm always terribly confused about which
bind binds what and all that, so maybe an additional review whether we
have a clear split into aggregate and individual components after that
initial fix is needed.

On the actual topic: I agree there's a problem here, but I'm honestly not
sure how it should be fixed. That's way over my understanding of all the
device probe and pm interactions. Of which there are plenty.

One question I have: Why is the bridge component driver not correctly
ordered wrt the i2c driver it needs? The idea is that the aggregate driver
doesn't access any hw itself, but entirely relies on all its components.
So as long as all the component drivers are sorted correctly in the device
list, things /should/ work. And as soon as we drop out a single component,
the aggregate gets unbound (and then does all the
drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
perhaps that msm does the drm teardown in the wrong callback?
-Daniel

> ---
>  drivers/base/component.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/base/component.c b/drivers/base/component.c
> index dcfbe7251dc4..de645420bae2 100644
> --- a/drivers/base/component.c
> +++ b/drivers/base/component.c
> @@ -15,6 +15,8 @@
>  #include <linux/slab.h>
>  #include <linux/debugfs.h>
>  
> +#include "base.h"
> +
>  /**
>   * DOC: overview
>   *
> @@ -657,6 +659,14 @@ int component_bind_all(struct device *master_dev, void *data)
>  				c = master->match->compare[i - 1].component;
>  				component_unbind(c, master, data);
>  			}
> +	} else {
> +		/*
> +		 * Move to the tail of the list so that master_dev driver ops
> +		 * like 'shutdown' or 'remove' are called before any of the
> +		 * dependencies that the components have are shutdown or
> +		 * removed.
> +		 */
> +		device_pm_move_to_tail(master_dev);
>  	}
>  
>  	return ret;
> 
> base-commit: 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
> -- 
> https://chromeos.dev
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 16:05   ` Daniel Vetter
@ 2021-05-10 17:52     ` Stephen Boyd
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-10 17:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, Russell King, Rob Clark, dri-devel

Quoting Daniel Vetter (2021-05-10 09:05:21)
> On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > The device lists are poorly ordered when the component device code is
> > used. This is because component_master_add_with_match() returns 0
> > regardless of component devices calling component_add() first. It can
> > really only fail if an allocation fails, in which case everything is
> > going bad and we're out of memory. The host device (called master_dev in
> > the code), can succeed at probe and be put on the device lists before
> > any of the component devices are probed and put on the lists.
> >
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
> >
> > Normally, device links or probe defer would reorder the lists and put
> > devices that depend on other devices in the lists at the correct
> > location, but with component devices this doesn't happen because this
> > information isn't expressed anywhere. Drivers simply succeed at
> > registering their component or host with the component framework and
> > wait for their bind() callback to be called once the other components
> > are ready. We could make various device links between 'master_dev' and
> > 'component->dev' but it's not necessary. Let's simply move the hosting
> > device to the end of the device lists when the component device fully
> > binds. This way we know that all components are present and have probed
> > properly and now the host device has really probed so it's safe to
> > assume the host driver ops can operate on any component device.
> >
> > This fixes the msm display driver shutdown path when the DSI controller
> > is connected to a DSI bridge that is controlled via i2c. In this case,
> > the msm display driver wants to tear down the display pipeline on
> > shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> > and it can't do that unless the whole display chain is still probed and
> > active in the system. When a display bridge is on i2c, the i2c device
> > for the bridge will be created whenever the i2c controller probes, which
> > could be before or after the msm display driver probes. If the i2c
> > controller probes after the display driver, then the i2c controller will
> > be shutdown before the display controller during system wide shutdown
> > and thus i2c transactions will stop working before the display pipeline
> > is shut down. This means we'll have the display bridge trying to access
> > an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> > trying to disable the bridge after the bridge is off.
> >
> > Moving the host device to the end of the lists at bind time moves the
> > drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> > This fixes the immediate problem, but we could improve it a bit by
> > modeling device links from the component devices to the host device
> > indicating that they supply something, although it is slightly different
> > because the consumer doesn't need the suppliers to probe to succeed.
> >
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > Cc: Rob Clark <robdclark@gmail.com>
> > Cc: <dri-devel@lists.freedesktop.org>
> > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
>
> Entirely aside, but an s/master/aggregate/ or similar over the entire
> component.c codebase would help a pile in making it easier to understand
> which part does what. Or at least I'm always terribly confused about which
> bind binds what and all that, so maybe an additional review whether we
> have a clear split into aggregate and individual components after that
> initial fix is needed.

Agreed.

>
> On the actual topic: I agree there's a problem here, but I'm honestly not
> sure how it should be fixed. That's way over my understanding of all the
> device probe and pm interactions. Of which there are plenty.
>
> One question I have: Why is the bridge component driver not correctly
> ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> doesn't access any hw itself, but entirely relies on all its components.
> So as long as all the component drivers are sorted correctly in the device
> list, things /should/ work. And as soon as we drop out a single component,
> the aggregate gets unbound (and then does all the
> drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
> perhaps that msm does the drm teardown in the wrong callback?

I see my explanation of the problem wasn't sufficient :|

The bridge driver is not a component device. It is connected to the
aggregate device via the DSI device, where the DSI device is a component
device. The i2c bus/controller must probe before the i2c bridge probes,
so the device list is already ordered correctly for those two devices
(i2c controller and i2c bridge). The problem is the aggregate device
doesn't know that the bridge is part of the display pipeline/encoder
chain.

I thought that this design was intentional. Bridge devices can be mixed
and matched with display drivers because they're (always?) off the SoC
and so the aggregate device can't know which components it needs. I see
that the msm driver has some logic to traverse from the display
controller to the display phy, like DSI or HDMI, but it doesn't go
beyond that.

The crucially important part is that the DSI encoder will fail probe
until the end of the encoder chain is probed, see
msm_dsi_host_register() and how it checks for a panel and a bridge.

The order of operations is like this

 1. msm driver probes, creates aggregate device driver
 2. i2c controller probes, creates i2c bus
 3. i2c bridge probes, creates drm bridge and adds to drm
 4. rest of component devices probe and aggregate device is bound

The device list now has msm, i2c, bridge in that order. When we go to
system wide shutdown the bridge is shutdown first, then the i2c bus, and
then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
bridge ops because it's attached to the end of the DSI encoder and
things don't go well because i2c is gone. This patch fixes the order of
the list so that msm is moved on the device list after all the
components that make up the aggregate device have probed. This only
works to move the aggregate device after the i2c bridge because the
msm_dsi_host_register() function won't return success until the bridge
device is probed.

It's an interesting idea to trigger shutdown when the component device
is unbound. Are you suggesting that the i2c bridge device have a
'shutdown' callback, that essentially removes the bridge from the
encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
Presumably that would somehow tell the DSI encoder that it should stop
trying to use the i2c bridge and then drm_atomic_helper_shutdown()
wouldn't try to traverse beyond the DSI to shut things down.

I will try it, but then I wonder about things like system wide
suspend/resume too. The drm encoder chain would need to reimplement the
logic for system wide suspend/resume so that any PM ops attached to the
msm device run in the correct order. Right now the bridge PM ops will
run, the i2c bus PM ops will run, and then the msm PM ops will run.
After this change, the msm PM ops will run, the bridge PM ops will run,
and then the i2c bus PM ops will run. It feels like that could be a
problem if we're suspending the DSI encoder while the bridge is still
active.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-10 17:52     ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-10 17:52 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel,
	Greg Kroah-Hartman, Russell King

Quoting Daniel Vetter (2021-05-10 09:05:21)
> On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > The device lists are poorly ordered when the component device code is
> > used. This is because component_master_add_with_match() returns 0
> > regardless of component devices calling component_add() first. It can
> > really only fail if an allocation fails, in which case everything is
> > going bad and we're out of memory. The host device (called master_dev in
> > the code), can succeed at probe and be put on the device lists before
> > any of the component devices are probed and put on the lists.
> >
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
> >
> > Normally, device links or probe defer would reorder the lists and put
> > devices that depend on other devices in the lists at the correct
> > location, but with component devices this doesn't happen because this
> > information isn't expressed anywhere. Drivers simply succeed at
> > registering their component or host with the component framework and
> > wait for their bind() callback to be called once the other components
> > are ready. We could make various device links between 'master_dev' and
> > 'component->dev' but it's not necessary. Let's simply move the hosting
> > device to the end of the device lists when the component device fully
> > binds. This way we know that all components are present and have probed
> > properly and now the host device has really probed so it's safe to
> > assume the host driver ops can operate on any component device.
> >
> > This fixes the msm display driver shutdown path when the DSI controller
> > is connected to a DSI bridge that is controlled via i2c. In this case,
> > the msm display driver wants to tear down the display pipeline on
> > shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> > and it can't do that unless the whole display chain is still probed and
> > active in the system. When a display bridge is on i2c, the i2c device
> > for the bridge will be created whenever the i2c controller probes, which
> > could be before or after the msm display driver probes. If the i2c
> > controller probes after the display driver, then the i2c controller will
> > be shutdown before the display controller during system wide shutdown
> > and thus i2c transactions will stop working before the display pipeline
> > is shut down. This means we'll have the display bridge trying to access
> > an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> > trying to disable the bridge after the bridge is off.
> >
> > Moving the host device to the end of the lists at bind time moves the
> > drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> > This fixes the immediate problem, but we could improve it a bit by
> > modeling device links from the component devices to the host device
> > indicating that they supply something, although it is slightly different
> > because the consumer doesn't need the suppliers to probe to succeed.
> >
> > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > Cc: Rob Clark <robdclark@gmail.com>
> > Cc: <dri-devel@lists.freedesktop.org>
> > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
>
> Entirely aside, but an s/master/aggregate/ or similar over the entire
> component.c codebase would help a pile in making it easier to understand
> which part does what. Or at least I'm always terribly confused about which
> bind binds what and all that, so maybe an additional review whether we
> have a clear split into aggregate and individual components after that
> initial fix is needed.

Agreed.

>
> On the actual topic: I agree there's a problem here, but I'm honestly not
> sure how it should be fixed. That's way over my understanding of all the
> device probe and pm interactions. Of which there are plenty.
>
> One question I have: Why is the bridge component driver not correctly
> ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> doesn't access any hw itself, but entirely relies on all its components.
> So as long as all the component drivers are sorted correctly in the device
> list, things /should/ work. And as soon as we drop out a single component,
> the aggregate gets unbound (and then does all the
> drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
> perhaps that msm does the drm teardown in the wrong callback?

I see my explanation of the problem wasn't sufficient :|

The bridge driver is not a component device. It is connected to the
aggregate device via the DSI device, where the DSI device is a component
device. The i2c bus/controller must probe before the i2c bridge probes,
so the device list is already ordered correctly for those two devices
(i2c controller and i2c bridge). The problem is the aggregate device
doesn't know that the bridge is part of the display pipeline/encoder
chain.

I thought that this design was intentional. Bridge devices can be mixed
and matched with display drivers because they're (always?) off the SoC
and so the aggregate device can't know which components it needs. I see
that the msm driver has some logic to traverse from the display
controller to the display phy, like DSI or HDMI, but it doesn't go
beyond that.

The crucially important part is that the DSI encoder will fail probe
until the end of the encoder chain is probed, see
msm_dsi_host_register() and how it checks for a panel and a bridge.

The order of operations is like this

 1. msm driver probes, creates aggregate device driver
 2. i2c controller probes, creates i2c bus
 3. i2c bridge probes, creates drm bridge and adds to drm
 4. rest of component devices probe and aggregate device is bound

The device list now has msm, i2c, bridge in that order. When we go to
system wide shutdown the bridge is shutdown first, then the i2c bus, and
then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
bridge ops because it's attached to the end of the DSI encoder and
things don't go well because i2c is gone. This patch fixes the order of
the list so that msm is moved on the device list after all the
components that make up the aggregate device have probed. This only
works to move the aggregate device after the i2c bridge because the
msm_dsi_host_register() function won't return success until the bridge
device is probed.

It's an interesting idea to trigger shutdown when the component device
is unbound. Are you suggesting that the i2c bridge device have a
'shutdown' callback, that essentially removes the bridge from the
encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
Presumably that would somehow tell the DSI encoder that it should stop
trying to use the i2c bridge and then drm_atomic_helper_shutdown()
wouldn't try to traverse beyond the DSI to shut things down.

I will try it, but then I wonder about things like system wide
suspend/resume too. The drm encoder chain would need to reimplement the
logic for system wide suspend/resume so that any PM ops attached to the
msm device run in the correct order. Right now the bridge PM ops will
run, the i2c bus PM ops will run, and then the msm PM ops will run.
After this change, the msm PM ops will run, the bridge PM ops will run,
and then the i2c bus PM ops will run. It feels like that could be a
problem if we're suspending the DSI encoder while the bridge is still
active.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 17:52     ` Stephen Boyd
@ 2021-05-10 18:26       ` Daniel Vetter
  -1 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-10 18:26 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Russell King, Rob Clark, dri-devel

On Mon, May 10, 2021 at 7:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Daniel Vetter (2021-05-10 09:05:21)
> > On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > > The device lists are poorly ordered when the component device code is
> > > used. This is because component_master_add_with_match() returns 0
> > > regardless of component devices calling component_add() first. It can
> > > really only fail if an allocation fails, in which case everything is
> > > going bad and we're out of memory. The host device (called master_dev in
> > > the code), can succeed at probe and be put on the device lists before
> > > any of the component devices are probed and put on the lists.
> > >
> > > Within the component device framework this usually isn't that bad
> > > because the real driver work is done at bind time via
> > > component{,master}_ops::bind(). It becomes a problem when the driver
> > > core, or host driver, wants to operate on the component device outside
> > > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > > driver core doesn't understand the relationship between the host device
> > > and the component devices and could possibly try to operate on component
> > > devices when they're already removed from the system or shut down.
> > >
> > > Normally, device links or probe defer would reorder the lists and put
> > > devices that depend on other devices in the lists at the correct
> > > location, but with component devices this doesn't happen because this
> > > information isn't expressed anywhere. Drivers simply succeed at
> > > registering their component or host with the component framework and
> > > wait for their bind() callback to be called once the other components
> > > are ready. We could make various device links between 'master_dev' and
> > > 'component->dev' but it's not necessary. Let's simply move the hosting
> > > device to the end of the device lists when the component device fully
> > > binds. This way we know that all components are present and have probed
> > > properly and now the host device has really probed so it's safe to
> > > assume the host driver ops can operate on any component device.
> > >
> > > This fixes the msm display driver shutdown path when the DSI controller
> > > is connected to a DSI bridge that is controlled via i2c. In this case,
> > > the msm display driver wants to tear down the display pipeline on
> > > shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> > > and it can't do that unless the whole display chain is still probed and
> > > active in the system. When a display bridge is on i2c, the i2c device
> > > for the bridge will be created whenever the i2c controller probes, which
> > > could be before or after the msm display driver probes. If the i2c
> > > controller probes after the display driver, then the i2c controller will
> > > be shutdown before the display controller during system wide shutdown
> > > and thus i2c transactions will stop working before the display pipeline
> > > is shut down. This means we'll have the display bridge trying to access
> > > an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> > > trying to disable the bridge after the bridge is off.
> > >
> > > Moving the host device to the end of the lists at bind time moves the
> > > drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> > > This fixes the immediate problem, but we could improve it a bit by
> > > modeling device links from the component devices to the host device
> > > indicating that they supply something, although it is slightly different
> > > because the consumer doesn't need the suppliers to probe to succeed.
> > >
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> >
> > Entirely aside, but an s/master/aggregate/ or similar over the entire
> > component.c codebase would help a pile in making it easier to understand
> > which part does what. Or at least I'm always terribly confused about which
> > bind binds what and all that, so maybe an additional review whether we
> > have a clear split into aggregate and individual components after that
> > initial fix is needed.
>
> Agreed.
>
> >
> > On the actual topic: I agree there's a problem here, but I'm honestly not
> > sure how it should be fixed. That's way over my understanding of all the
> > device probe and pm interactions. Of which there are plenty.
> >
> > One question I have: Why is the bridge component driver not correctly
> > ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> > doesn't access any hw itself, but entirely relies on all its components.
> > So as long as all the component drivers are sorted correctly in the device
> > list, things /should/ work. And as soon as we drop out a single component,
> > the aggregate gets unbound (and then does all the
> > drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
> > perhaps that msm does the drm teardown in the wrong callback?
>
> I see my explanation of the problem wasn't sufficient :|
>
> The bridge driver is not a component device. It is connected to the
> aggregate device via the DSI device, where the DSI device is a component
> device. The i2c bus/controller must probe before the i2c bridge probes,
> so the device list is already ordered correctly for those two devices
> (i2c controller and i2c bridge). The problem is the aggregate device
> doesn't know that the bridge is part of the display pipeline/encoder
> chain.
>
> I thought that this design was intentional. Bridge devices can be mixed
> and matched with display drivers because they're (always?) off the SoC
> and so the aggregate device can't know which components it needs. I see
> that the msm driver has some logic to traverse from the display
> controller to the display phy, like DSI or HDMI, but it doesn't go
> beyond that.
>
> The crucially important part is that the DSI encoder will fail probe
> until the end of the encoder chain is probed, see
> msm_dsi_host_register() and how it checks for a panel and a bridge.
>
> The order of operations is like this
>
>  1. msm driver probes, creates aggregate device driver
>  2. i2c controller probes, creates i2c bus
>  3. i2c bridge probes, creates drm bridge and adds to drm
>  4. rest of component devices probe and aggregate device is bound
>
> The device list now has msm, i2c, bridge in that order. When we go to
> system wide shutdown the bridge is shutdown first, then the i2c bus, and
> then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
> bridge ops because it's attached to the end of the DSI encoder and
> things don't go well because i2c is gone. This patch fixes the order of
> the list so that msm is moved on the device list after all the
> components that make up the aggregate device have probed. This only
> works to move the aggregate device after the i2c bridge because the
> msm_dsi_host_register() function won't return success until the bridge
> device is probed.

Ah I think I get this now. There is indeed a design problem:
component.c only has bind/unbind hooks for all its things. Which means
driver load/unload will work correctly because in your above sequence:

1. drm_brige unbinds
-> this triggers the unbind of the entire aggregate of components
2. i2c unbinds
3. msm unbinds, but there's nothing to clean up anymore except the
aggregate/master struct

Now for runtime pm this also all works out, because each component
grabs the right runtime pm references. But for the system-wide pm
changes, where we rely on the device list order to make sure things
happen in the right way, it all blows up.

1. drm_bringe shutdown
2. i2c shutdown
3. msm shutdown, and with very sad thrombones because we blow up

I think the right fix is to make component.c more of  a driver model
thing, which probably means either the aggregate must get tied closer
to the main struct device, or it needs to gain its own struct device.
Or minimally at least, the aggregate needs to gain an entire set of
pm_ops, which gets called in the right order if any of the component's
pm_ops gets called. Wiring that all up will be major surgery I think.

I guess another option would be trying to figure out how the aggreate
registration could fail with EPROBE_DEFER until all the parts are
there, to guarantee the right ordering. Not sure that will work with
the current component users though.

> It's an interesting idea to trigger shutdown when the component device
> is unbound. Are you suggesting that the i2c bridge device have a
> 'shutdown' callback, that essentially removes the bridge from the
> encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
> Presumably that would somehow tell the DSI encoder that it should stop
> trying to use the i2c bridge and then drm_atomic_helper_shutdown()
> wouldn't try to traverse beyond the DSI to shut things down.

Nope, we don't want to unbind the driver on shutdown. I somehow
thought you're dying in there, which is why I wondered what's going
on. But since you're dying in pm_ops->shutdown, that's a different
thing.

> I will try it, but then I wonder about things like system wide
> suspend/resume too. The drm encoder chain would need to reimplement the
> logic for system wide suspend/resume so that any PM ops attached to the
> msm device run in the correct order. Right now the bridge PM ops will
> run, the i2c bus PM ops will run, and then the msm PM ops will run.
> After this change, the msm PM ops will run, the bridge PM ops will run,
> and then the i2c bus PM ops will run. It feels like that could be a
> problem if we're suspending the DSI encoder while the bridge is still
> active.

Yup suspend/resume has the exact same problem as shutdown.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-10 18:26       ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-10 18:26 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	dri-devel, Russell King

On Mon, May 10, 2021 at 7:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Daniel Vetter (2021-05-10 09:05:21)
> > On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > > The device lists are poorly ordered when the component device code is
> > > used. This is because component_master_add_with_match() returns 0
> > > regardless of component devices calling component_add() first. It can
> > > really only fail if an allocation fails, in which case everything is
> > > going bad and we're out of memory. The host device (called master_dev in
> > > the code), can succeed at probe and be put on the device lists before
> > > any of the component devices are probed and put on the lists.
> > >
> > > Within the component device framework this usually isn't that bad
> > > because the real driver work is done at bind time via
> > > component{,master}_ops::bind(). It becomes a problem when the driver
> > > core, or host driver, wants to operate on the component device outside
> > > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > > driver core doesn't understand the relationship between the host device
> > > and the component devices and could possibly try to operate on component
> > > devices when they're already removed from the system or shut down.
> > >
> > > Normally, device links or probe defer would reorder the lists and put
> > > devices that depend on other devices in the lists at the correct
> > > location, but with component devices this doesn't happen because this
> > > information isn't expressed anywhere. Drivers simply succeed at
> > > registering their component or host with the component framework and
> > > wait for their bind() callback to be called once the other components
> > > are ready. We could make various device links between 'master_dev' and
> > > 'component->dev' but it's not necessary. Let's simply move the hosting
> > > device to the end of the device lists when the component device fully
> > > binds. This way we know that all components are present and have probed
> > > properly and now the host device has really probed so it's safe to
> > > assume the host driver ops can operate on any component device.
> > >
> > > This fixes the msm display driver shutdown path when the DSI controller
> > > is connected to a DSI bridge that is controlled via i2c. In this case,
> > > the msm display driver wants to tear down the display pipeline on
> > > shutdown at msm_pdev_shutdown() by calling drm_atomic_helper_shutdown(),
> > > and it can't do that unless the whole display chain is still probed and
> > > active in the system. When a display bridge is on i2c, the i2c device
> > > for the bridge will be created whenever the i2c controller probes, which
> > > could be before or after the msm display driver probes. If the i2c
> > > controller probes after the display driver, then the i2c controller will
> > > be shutdown before the display controller during system wide shutdown
> > > and thus i2c transactions will stop working before the display pipeline
> > > is shut down. This means we'll have the display bridge trying to access
> > > an i2c bus that's shut down because drm_atomic_helper_shutdown() is
> > > trying to disable the bridge after the bridge is off.
> > >
> > > Moving the host device to the end of the lists at bind time moves the
> > > drm_atomic_helper_shutdown() call before the i2c bus is shutdown.
> > > This fixes the immediate problem, but we could improve it a bit by
> > > modeling device links from the component devices to the host device
> > > indicating that they supply something, although it is slightly different
> > > because the consumer doesn't need the suppliers to probe to succeed.
> > >
> > > Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> > > Cc: Rob Clark <robdclark@gmail.com>
> > > Cc: <dri-devel@lists.freedesktop.org>
> > > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> >
> > Entirely aside, but an s/master/aggregate/ or similar over the entire
> > component.c codebase would help a pile in making it easier to understand
> > which part does what. Or at least I'm always terribly confused about which
> > bind binds what and all that, so maybe an additional review whether we
> > have a clear split into aggregate and individual components after that
> > initial fix is needed.
>
> Agreed.
>
> >
> > On the actual topic: I agree there's a problem here, but I'm honestly not
> > sure how it should be fixed. That's way over my understanding of all the
> > device probe and pm interactions. Of which there are plenty.
> >
> > One question I have: Why is the bridge component driver not correctly
> > ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> > doesn't access any hw itself, but entirely relies on all its components.
> > So as long as all the component drivers are sorted correctly in the device
> > list, things /should/ work. And as soon as we drop out a single component,
> > the aggregate gets unbound (and then does all the
> > drm_atomic_helper_shutdown and all the other drm teardown). So is the bug
> > perhaps that msm does the drm teardown in the wrong callback?
>
> I see my explanation of the problem wasn't sufficient :|
>
> The bridge driver is not a component device. It is connected to the
> aggregate device via the DSI device, where the DSI device is a component
> device. The i2c bus/controller must probe before the i2c bridge probes,
> so the device list is already ordered correctly for those two devices
> (i2c controller and i2c bridge). The problem is the aggregate device
> doesn't know that the bridge is part of the display pipeline/encoder
> chain.
>
> I thought that this design was intentional. Bridge devices can be mixed
> and matched with display drivers because they're (always?) off the SoC
> and so the aggregate device can't know which components it needs. I see
> that the msm driver has some logic to traverse from the display
> controller to the display phy, like DSI or HDMI, but it doesn't go
> beyond that.
>
> The crucially important part is that the DSI encoder will fail probe
> until the end of the encoder chain is probed, see
> msm_dsi_host_register() and how it checks for a panel and a bridge.
>
> The order of operations is like this
>
>  1. msm driver probes, creates aggregate device driver
>  2. i2c controller probes, creates i2c bus
>  3. i2c bridge probes, creates drm bridge and adds to drm
>  4. rest of component devices probe and aggregate device is bound
>
> The device list now has msm, i2c, bridge in that order. When we go to
> system wide shutdown the bridge is shutdown first, then the i2c bus, and
> then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
> bridge ops because it's attached to the end of the DSI encoder and
> things don't go well because i2c is gone. This patch fixes the order of
> the list so that msm is moved on the device list after all the
> components that make up the aggregate device have probed. This only
> works to move the aggregate device after the i2c bridge because the
> msm_dsi_host_register() function won't return success until the bridge
> device is probed.

Ah I think I get this now. There is indeed a design problem:
component.c only has bind/unbind hooks for all its things. Which means
driver load/unload will work correctly because in your above sequence:

1. drm_brige unbinds
-> this triggers the unbind of the entire aggregate of components
2. i2c unbinds
3. msm unbinds, but there's nothing to clean up anymore except the
aggregate/master struct

Now for runtime pm this also all works out, because each component
grabs the right runtime pm references. But for the system-wide pm
changes, where we rely on the device list order to make sure things
happen in the right way, it all blows up.

1. drm_bringe shutdown
2. i2c shutdown
3. msm shutdown, and with very sad thrombones because we blow up

I think the right fix is to make component.c more of  a driver model
thing, which probably means either the aggregate must get tied closer
to the main struct device, or it needs to gain its own struct device.
Or minimally at least, the aggregate needs to gain an entire set of
pm_ops, which gets called in the right order if any of the component's
pm_ops gets called. Wiring that all up will be major surgery I think.

I guess another option would be trying to figure out how the aggreate
registration could fail with EPROBE_DEFER until all the parts are
there, to guarantee the right ordering. Not sure that will work with
the current component users though.

> It's an interesting idea to trigger shutdown when the component device
> is unbound. Are you suggesting that the i2c bridge device have a
> 'shutdown' callback, that essentially removes the bridge from the
> encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
> Presumably that would somehow tell the DSI encoder that it should stop
> trying to use the i2c bridge and then drm_atomic_helper_shutdown()
> wouldn't try to traverse beyond the DSI to shut things down.

Nope, we don't want to unbind the driver on shutdown. I somehow
thought you're dying in there, which is why I wondered what's going
on. But since you're dying in pm_ops->shutdown, that's a different
thing.

> I will try it, but then I wonder about things like system wide
> suspend/resume too. The drm encoder chain would need to reimplement the
> logic for system wide suspend/resume so that any PM ops attached to the
> msm device run in the correct order. Right now the bridge PM ops will
> run, the i2c bus PM ops will run, and then the msm PM ops will run.
> After this change, the msm PM ops will run, the bridge PM ops will run,
> and then the i2c bus PM ops will run. It feels like that could be a
> problem if we're suspending the DSI encoder while the bridge is still
> active.

Yup suspend/resume has the exact same problem as shutdown.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 18:26       ` Daniel Vetter
@ 2021-05-10 19:08         ` Stephen Boyd
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-10 19:08 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Russell King, Rob Clark, dri-devel

Quoting Daniel Vetter (2021-05-10 11:26:40)
> On Mon, May 10, 2021 at 7:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > The device list now has msm, i2c, bridge in that order. When we go to
> > system wide shutdown the bridge is shutdown first, then the i2c bus, and
> > then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
> > bridge ops because it's attached to the end of the DSI encoder and
> > things don't go well because i2c is gone. This patch fixes the order of
> > the list so that msm is moved on the device list after all the
> > components that make up the aggregate device have probed. This only
> > works to move the aggregate device after the i2c bridge because the
> > msm_dsi_host_register() function won't return success until the bridge
> > device is probed.
>
> Ah I think I get this now. There is indeed a design problem:
> component.c only has bind/unbind hooks for all its things. Which means
> driver load/unload will work correctly because in your above sequence:
>
> 1. drm_brige unbinds
> -> this triggers the unbind of the entire aggregate of components
> 2. i2c unbinds
> 3. msm unbinds, but there's nothing to clean up anymore except the
> aggregate/master struct

Yes. I just tried this though and it didn't work, so I suspect there are
bugs in bridge unbind. Another rabbit hole.

>
> Now for runtime pm this also all works out, because each component
> grabs the right runtime pm references. But for the system-wide pm
> changes, where we rely on the device list order to make sure things
> happen in the right way, it all blows up.
>
> 1. drm_bringe shutdown
> 2. i2c shutdown
> 3. msm shutdown, and with very sad thrombones because we blow up
>
> I think the right fix is to make component.c more of  a driver model
> thing, which probably means either the aggregate must get tied closer
> to the main struct device, or it needs to gain its own struct device.
> Or minimally at least, the aggregate needs to gain an entire set of
> pm_ops, which gets called in the right order if any of the component's
> pm_ops gets called. Wiring that all up will be major surgery I think.

Yes the root of the problem is that the aggregate device is not part of
the kernel's driver model. It's basically a pair of probe and remove
functions and nothing else.

>
> I guess another option would be trying to figure out how the aggreate
> registration could fail with EPROBE_DEFER until all the parts are
> there, to guarantee the right ordering. Not sure that will work with
> the current component users though.

I had that written up and it worked for me but I was concerned it would
break other users, plus it didn't feel correct to defer probe just
because the components weren't probed yet. The aggregate device wasn't
waiting for the components to probe, so why change that? For msm it led
to more work too, because we have some child devices that are removed if
the aggregate device fails to probe, meaning we go through a few cycles
of add/remove of the components this way. If the aggregate doesn't defer
probe then we can avoid the other components adding/removing over and
over again until the final component, DSI that is waiting for the
bridge, can probe.

That's why I opted to move the device on the list to the tail. I'm
hoping that most component users (which is basically drm?) don't do much
with the device they're using to host the aggregate device besides tell
drm that the display pipeline is here now. Everything else would be in
the bind/unbind callbacks. If there was a 'struct device', or maybe a
'struct class', that was associated with the whole display pipeline and
aggregate device we could attach the pm ops to that. Would 'struct
drm_device' be that? If yes we could make some drm functions that let
you attach PM ops to a struct device inside of that and make it a child
of the device that calls drm_dev_alloc().

>
> > It's an interesting idea to trigger shutdown when the component device
> > is unbound. Are you suggesting that the i2c bridge device have a
> > 'shutdown' callback, that essentially removes the bridge from the
> > encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
> > Presumably that would somehow tell the DSI encoder that it should stop
> > trying to use the i2c bridge and then drm_atomic_helper_shutdown()
> > wouldn't try to traverse beyond the DSI to shut things down.
>
> Nope, we don't want to unbind the driver on shutdown. I somehow
> thought you're dying in there, which is why I wondered what's going
> on. But since you're dying in pm_ops->shutdown, that's a different
> thing.

I'm dying in msm_pdev_shutdown(), but yes pm_ops are similar.

>
> > I will try it, but then I wonder about things like system wide
> > suspend/resume too. The drm encoder chain would need to reimplement the
> > logic for system wide suspend/resume so that any PM ops attached to the
> > msm device run in the correct order. Right now the bridge PM ops will
> > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > After this change, the msm PM ops will run, the bridge PM ops will run,
> > and then the i2c bus PM ops will run. It feels like that could be a
> > problem if we're suspending the DSI encoder while the bridge is still
> > active.
>
> Yup suspend/resume has the exact same problem as shutdown.

I think suspend/resume has the exact opposite problem. At least I think
the correct order is to suspend the bridge, then the encoder, i.e. DSI,
like is happening today. It looks like drm_atomic_helper_shutdown()
operates from the top down when we want bottom up? I admit I have no
idea what is supposed to happen here.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-10 19:08         ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-10 19:08 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	dri-devel, Russell King

Quoting Daniel Vetter (2021-05-10 11:26:40)
> On Mon, May 10, 2021 at 7:52 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > The device list now has msm, i2c, bridge in that order. When we go to
> > system wide shutdown the bridge is shutdown first, then the i2c bus, and
> > then msm calls drm_atomic_helper_shutdown(). That tries to call the i2c
> > bridge ops because it's attached to the end of the DSI encoder and
> > things don't go well because i2c is gone. This patch fixes the order of
> > the list so that msm is moved on the device list after all the
> > components that make up the aggregate device have probed. This only
> > works to move the aggregate device after the i2c bridge because the
> > msm_dsi_host_register() function won't return success until the bridge
> > device is probed.
>
> Ah I think I get this now. There is indeed a design problem:
> component.c only has bind/unbind hooks for all its things. Which means
> driver load/unload will work correctly because in your above sequence:
>
> 1. drm_brige unbinds
> -> this triggers the unbind of the entire aggregate of components
> 2. i2c unbinds
> 3. msm unbinds, but there's nothing to clean up anymore except the
> aggregate/master struct

Yes. I just tried this though and it didn't work, so I suspect there are
bugs in bridge unbind. Another rabbit hole.

>
> Now for runtime pm this also all works out, because each component
> grabs the right runtime pm references. But for the system-wide pm
> changes, where we rely on the device list order to make sure things
> happen in the right way, it all blows up.
>
> 1. drm_bringe shutdown
> 2. i2c shutdown
> 3. msm shutdown, and with very sad thrombones because we blow up
>
> I think the right fix is to make component.c more of  a driver model
> thing, which probably means either the aggregate must get tied closer
> to the main struct device, or it needs to gain its own struct device.
> Or minimally at least, the aggregate needs to gain an entire set of
> pm_ops, which gets called in the right order if any of the component's
> pm_ops gets called. Wiring that all up will be major surgery I think.

Yes the root of the problem is that the aggregate device is not part of
the kernel's driver model. It's basically a pair of probe and remove
functions and nothing else.

>
> I guess another option would be trying to figure out how the aggreate
> registration could fail with EPROBE_DEFER until all the parts are
> there, to guarantee the right ordering. Not sure that will work with
> the current component users though.

I had that written up and it worked for me but I was concerned it would
break other users, plus it didn't feel correct to defer probe just
because the components weren't probed yet. The aggregate device wasn't
waiting for the components to probe, so why change that? For msm it led
to more work too, because we have some child devices that are removed if
the aggregate device fails to probe, meaning we go through a few cycles
of add/remove of the components this way. If the aggregate doesn't defer
probe then we can avoid the other components adding/removing over and
over again until the final component, DSI that is waiting for the
bridge, can probe.

That's why I opted to move the device on the list to the tail. I'm
hoping that most component users (which is basically drm?) don't do much
with the device they're using to host the aggregate device besides tell
drm that the display pipeline is here now. Everything else would be in
the bind/unbind callbacks. If there was a 'struct device', or maybe a
'struct class', that was associated with the whole display pipeline and
aggregate device we could attach the pm ops to that. Would 'struct
drm_device' be that? If yes we could make some drm functions that let
you attach PM ops to a struct device inside of that and make it a child
of the device that calls drm_dev_alloc().

>
> > It's an interesting idea to trigger shutdown when the component device
> > is unbound. Are you suggesting that the i2c bridge device have a
> > 'shutdown' callback, that essentially removes the bridge from the
> > encoder chain via mipi_dsi_detach() and then drm_bridge_remove()?
> > Presumably that would somehow tell the DSI encoder that it should stop
> > trying to use the i2c bridge and then drm_atomic_helper_shutdown()
> > wouldn't try to traverse beyond the DSI to shut things down.
>
> Nope, we don't want to unbind the driver on shutdown. I somehow
> thought you're dying in there, which is why I wondered what's going
> on. But since you're dying in pm_ops->shutdown, that's a different
> thing.

I'm dying in msm_pdev_shutdown(), but yes pm_ops are similar.

>
> > I will try it, but then I wonder about things like system wide
> > suspend/resume too. The drm encoder chain would need to reimplement the
> > logic for system wide suspend/resume so that any PM ops attached to the
> > msm device run in the correct order. Right now the bridge PM ops will
> > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > After this change, the msm PM ops will run, the bridge PM ops will run,
> > and then the i2c bus PM ops will run. It feels like that could be a
> > problem if we're suspending the DSI encoder while the bridge is still
> > active.
>
> Yup suspend/resume has the exact same problem as shutdown.

I think suspend/resume has the exact opposite problem. At least I think
the correct order is to suspend the bridge, then the encoder, i.e. DSI,
like is happening today. It looks like drm_atomic_helper_shutdown()
operates from the top down when we want bottom up? I admit I have no
idea what is supposed to happen here.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 19:08         ` Stephen Boyd
@ 2021-05-11 10:52           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 10:52 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Daniel Vetter, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Rafael J. Wysocki, Russell King, Rob Clark, dri-devel

On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:

[cut]

>
> >
> > > I will try it, but then I wonder about things like system wide
> > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > logic for system wide suspend/resume so that any PM ops attached to the
> > > msm device run in the correct order. Right now the bridge PM ops will
> > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > and then the i2c bus PM ops will run. It feels like that could be a
> > > problem if we're suspending the DSI encoder while the bridge is still
> > > active.
> >
> > Yup suspend/resume has the exact same problem as shutdown.
>
> I think suspend/resume has the exact opposite problem. At least I think
> the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> like is happening today. It looks like drm_atomic_helper_shutdown()
> operates from the top down when we want bottom up? I admit I have no
> idea what is supposed to happen here.

Why would the system-wide suspend ordering be different from the
shutdown ordering?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 10:52           ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 10:52 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	dri-devel, Russell King

On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:

[cut]

>
> >
> > > I will try it, but then I wonder about things like system wide
> > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > logic for system wide suspend/resume so that any PM ops attached to the
> > > msm device run in the correct order. Right now the bridge PM ops will
> > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > and then the i2c bus PM ops will run. It feels like that could be a
> > > problem if we're suspending the DSI encoder while the bridge is still
> > > active.
> >
> > Yup suspend/resume has the exact same problem as shutdown.
>
> I think suspend/resume has the exact opposite problem. At least I think
> the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> like is happening today. It looks like drm_atomic_helper_shutdown()
> operates from the top down when we want bottom up? I admit I have no
> idea what is supposed to happen here.

Why would the system-wide suspend ordering be different from the
shutdown ordering?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 10:52           ` Rafael J. Wysocki
@ 2021-05-11 13:39             ` Daniel Vetter
  -1 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-11 13:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Boyd, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Russell King, Rob Clark, dri-devel

On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

At least my point was that both shutdown and suspend/resume have the
same problem, and the righ fix is (I think at least) to add these
hooks to the component.c aggregate ops structure. Hence just adding
new callbacks for shutdown will be an incomplete solution.

I don't feel like changing the global device order is the right
approach, since essentially that's what component was meant to fix.
Except it's incomplete since it only provides a solution for
bind/unbind and not for shutdown or suspend/resume as other global
state changes. I think some drivers "fixed" this by putting stuff like
drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
make sure that everything is ready with that trick. But that doesn't
compose very well :-/
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 13:39             ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-11 13:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, dri-devel,
	Stephen Boyd, Russell King

On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

At least my point was that both shutdown and suspend/resume have the
same problem, and the righ fix is (I think at least) to add these
hooks to the component.c aggregate ops structure. Hence just adding
new callbacks for shutdown will be an incomplete solution.

I don't feel like changing the global device order is the right
approach, since essentially that's what component was meant to fix.
Except it's incomplete since it only provides a solution for
bind/unbind and not for shutdown or suspend/resume as other global
state changes. I think some drivers "fixed" this by putting stuff like
drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
make sure that everything is ready with that trick. But that doesn't
compose very well :-/
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-08  7:41 ` Stephen Boyd
@ 2021-05-11 14:42   ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 34+ messages in thread
From: Russell King - ARM Linux admin @ 2021-05-11 14:42 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, Rob Clark, dri-devel

On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.

You really are not supposed to be doing anything with component devices
once they have been unbound. You can do stuff with them only between the
bind() and the unbind() callbacks for the host device.

Access to the host devices outside of that is totally undefined and
should not be done.

The shutdown callback should be fine as long as the other devices are
still bound, but there will be implications if the shutdown order
matters.

However, randomly pulling devices around in the DPM list sounds to me
like a very bad idea. What happens if such re-orderings result in a
child device being shutdown after a parent device has been shut down?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 14:42   ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 34+ messages in thread
From: Russell King - ARM Linux admin @ 2021-05-11 14:42 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel,
	Greg Kroah-Hartman

On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> Within the component device framework this usually isn't that bad
> because the real driver work is done at bind time via
> component{,master}_ops::bind(). It becomes a problem when the driver
> core, or host driver, wants to operate on the component device outside
> of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> driver core doesn't understand the relationship between the host device
> and the component devices and could possibly try to operate on component
> devices when they're already removed from the system or shut down.

You really are not supposed to be doing anything with component devices
once they have been unbound. You can do stuff with them only between the
bind() and the unbind() callbacks for the host device.

Access to the host devices outside of that is totally undefined and
should not be done.

The shutdown callback should be fine as long as the other devices are
still bound, but there will be implications if the shutdown order
matters.

However, randomly pulling devices around in the DPM list sounds to me
like a very bad idea. What happens if such re-orderings result in a
child device being shutdown after a parent device has been shut down?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 10:52           ` Rafael J. Wysocki
@ 2021-05-11 17:00             ` Stephen Boyd
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Vetter, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Russell King, Rob Clark, dri-devel

Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

I don't really know. I'm mostly noting that today the order of suspend
is to suspend the bridge device first and then the aggregate device. If
the suspend of the aggregate device is traversing the devices like
drm_atomic_helper_shutdown() then it would operate on the bridge device
after it has been suspended, like is happening during shutdown. But it
looks like that isn't happening. At least for the msm driver we're
suspending the aggregate device after the bridge, and there are some
weird usages of prepare and complete in there (see msm_pm_prepare() and
msm_pm_complete) which makes me think that it's all working around this
component code.

The prepare phase is going to suspend the display pipeline, and then the
bridge device will run its suspend hooks, and then the aggregate driver
will run its suspend hooks. If we had a proper device for the aggregate
device instead of the bind/unbind component hooks we could clean this
up.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:00             ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, dri-devel, Russell King

Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> [cut]
>
> >
> > >
> > > > I will try it, but then I wonder about things like system wide
> > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > active.
> > >
> > > Yup suspend/resume has the exact same problem as shutdown.
> >
> > I think suspend/resume has the exact opposite problem. At least I think
> > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > like is happening today. It looks like drm_atomic_helper_shutdown()
> > operates from the top down when we want bottom up? I admit I have no
> > idea what is supposed to happen here.
>
> Why would the system-wide suspend ordering be different from the
> shutdown ordering?

I don't really know. I'm mostly noting that today the order of suspend
is to suspend the bridge device first and then the aggregate device. If
the suspend of the aggregate device is traversing the devices like
drm_atomic_helper_shutdown() then it would operate on the bridge device
after it has been suspended, like is happening during shutdown. But it
looks like that isn't happening. At least for the msm driver we're
suspending the aggregate device after the bridge, and there are some
weird usages of prepare and complete in there (see msm_pm_prepare() and
msm_pm_complete) which makes me think that it's all working around this
component code.

The prepare phase is going to suspend the display pipeline, and then the
bridge device will run its suspend hooks, and then the aggregate driver
will run its suspend hooks. If we had a proper device for the aggregate
device instead of the bind/unbind component hooks we could clean this
up.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 13:39             ` Daniel Vetter
@ 2021-05-11 17:19               ` Stephen Boyd
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:19 UTC (permalink / raw)
  To: Daniel Vetter, Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Russell King,
	Rob Clark, dri-devel

Quoting Daniel Vetter (2021-05-11 06:39:36)
> On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> At least my point was that both shutdown and suspend/resume have the
> same problem, and the righ fix is (I think at least) to add these
> hooks to the component.c aggregate ops structure. Hence just adding
> new callbacks for shutdown will be an incomplete solution.

To add proper hooks to component.c we'll need to make the aggregate
device into a 'struct device' and make a bus for them that essentially
adds the aggregate device to the bus once all the components are
registered. The bind/unbind can be ported to probe/remove, and then the
aggregate driver can get PM ops that run before the component devices
run their PM ops.

Let me go try it out and see if I can make it minimally invasive so that
the migration path is simple.

>
> I don't feel like changing the global device order is the right
> approach, since essentially that's what component was meant to fix.
> Except it's incomplete since it only provides a solution for
> bind/unbind and not for shutdown or suspend/resume as other global
> state changes. I think some drivers "fixed" this by putting stuff like
> drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> make sure that everything is ready with that trick. But that doesn't
> compose very well :-/

Yeah it looks like msm is using prepare/complete for this so that it can
jump in early and suspend the display pipeline before the components
suspend themselves. The shutdown path only has one callback so we can't
play the same games.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:19               ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:19 UTC (permalink / raw)
  To: Daniel Vetter, Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, dri-devel, Russell King

Quoting Daniel Vetter (2021-05-11 06:39:36)
> On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> At least my point was that both shutdown and suspend/resume have the
> same problem, and the righ fix is (I think at least) to add these
> hooks to the component.c aggregate ops structure. Hence just adding
> new callbacks for shutdown will be an incomplete solution.

To add proper hooks to component.c we'll need to make the aggregate
device into a 'struct device' and make a bus for them that essentially
adds the aggregate device to the bus once all the components are
registered. The bind/unbind can be ported to probe/remove, and then the
aggregate driver can get PM ops that run before the component devices
run their PM ops.

Let me go try it out and see if I can make it minimally invasive so that
the migration path is simple.

>
> I don't feel like changing the global device order is the right
> approach, since essentially that's what component was meant to fix.
> Except it's incomplete since it only provides a solution for
> bind/unbind and not for shutdown or suspend/resume as other global
> state changes. I think some drivers "fixed" this by putting stuff like
> drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> make sure that everything is ready with that trick. But that doesn't
> compose very well :-/

Yeah it looks like msm is using prepare/complete for this so that it can
jump in early and suspend the display pipeline before the components
suspend themselves. The shutdown path only has one callback so we can't
play the same games.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 17:00             ` Stephen Boyd
@ 2021-05-11 17:20               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 17:20 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Daniel Vetter, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Russell King, Rob Clark, dri-devel

On Tue, May 11, 2021 at 7:00 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> I don't really know. I'm mostly noting that today the order of suspend
> is to suspend the bridge device first and then the aggregate device. If
> the suspend of the aggregate device is traversing the devices like
> drm_atomic_helper_shutdown() then it would operate on the bridge device
> after it has been suspended, like is happening during shutdown. But it
> looks like that isn't happening. At least for the msm driver we're
> suspending the aggregate device after the bridge, and there are some
> weird usages of prepare and complete in there (see msm_pm_prepare() and
> msm_pm_complete) which makes me think that it's all working around this
> component code.

Well, it looks like the "prepare" phase is used sort-of against the
rules (because "prepare" is not supposed to make changes to the
hardware configuration or at least that is not its role) in order to
work around an ordering issue that is present in shutdown which
doesn't have a "prepare" phase.

> The prepare phase is going to suspend the display pipeline, and then the
> bridge device will run its suspend hooks, and then the aggregate driver
> will run its suspend hooks. If we had a proper device for the aggregate
> device instead of the bind/unbind component hooks we could clean this
> up.

I'm not sufficiently familiar with the component code to add anything
constructive here, but generally speaking it looks like the "natural"
dpm_list ordering does not match the order in which the devices in
question should be suspended (or shut down for that matter), so indeed
it is necessary to reorder dpm_list this way or another.

Please also note that it generally may not be sufficient to reorder
dpm_list if the devices are suspended and resumed asynchronously
during system-wide transitions, because in that case the callbacks of
different devices are only started in the dpm_list order, but they may
be completed in a different order (and of course they may run in
parallel with each other).

Shutdown is simpler, because it runs the callback synchronously for
all devices IIRC.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:20               ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2021-05-11 17:20 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	dri-devel, Russell King

On Tue, May 11, 2021 at 7:00 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Rafael J. Wysocki (2021-05-11 03:52:06)
> > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > [cut]
> >
> > >
> > > >
> > > > > I will try it, but then I wonder about things like system wide
> > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > active.
> > > >
> > > > Yup suspend/resume has the exact same problem as shutdown.
> > >
> > > I think suspend/resume has the exact opposite problem. At least I think
> > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > operates from the top down when we want bottom up? I admit I have no
> > > idea what is supposed to happen here.
> >
> > Why would the system-wide suspend ordering be different from the
> > shutdown ordering?
>
> I don't really know. I'm mostly noting that today the order of suspend
> is to suspend the bridge device first and then the aggregate device. If
> the suspend of the aggregate device is traversing the devices like
> drm_atomic_helper_shutdown() then it would operate on the bridge device
> after it has been suspended, like is happening during shutdown. But it
> looks like that isn't happening. At least for the msm driver we're
> suspending the aggregate device after the bridge, and there are some
> weird usages of prepare and complete in there (see msm_pm_prepare() and
> msm_pm_complete) which makes me think that it's all working around this
> component code.

Well, it looks like the "prepare" phase is used sort-of against the
rules (because "prepare" is not supposed to make changes to the
hardware configuration or at least that is not its role) in order to
work around an ordering issue that is present in shutdown which
doesn't have a "prepare" phase.

> The prepare phase is going to suspend the display pipeline, and then the
> bridge device will run its suspend hooks, and then the aggregate driver
> will run its suspend hooks. If we had a proper device for the aggregate
> device instead of the bind/unbind component hooks we could clean this
> up.

I'm not sufficiently familiar with the component code to add anything
constructive here, but generally speaking it looks like the "natural"
dpm_list ordering does not match the order in which the devices in
question should be suspended (or shut down for that matter), so indeed
it is necessary to reorder dpm_list this way or another.

Please also note that it generally may not be sufficient to reorder
dpm_list if the devices are suspended and resumed asynchronously
during system-wide transitions, because in that case the callbacks of
different devices are only started in the dpm_list order, but they may
be completed in a different order (and of course they may run in
parallel with each other).

Shutdown is simpler, because it runs the callback synchronously for
all devices IIRC.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 14:42   ` Russell King - ARM Linux admin
@ 2021-05-11 17:22     ` Stephen Boyd
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:22 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki,
	Daniel Vetter, Rob Clark, dri-devel

Quoting Russell King - ARM Linux admin (2021-05-11 07:42:37)
> On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
>
> You really are not supposed to be doing anything with component devices
> once they have been unbound. You can do stuff with them only between the
> bind() and the unbind() callbacks for the host device.

Got it. The device is not unbound in this case so this isn't the
problem.

>
> Access to the host devices outside of that is totally undefined and
> should not be done.
>
> The shutdown callback should be fine as long as the other devices are
> still bound, but there will be implications if the shutdown order
> matters.
>
> However, randomly pulling devices around in the DPM list sounds to me
> like a very bad idea. What happens if such re-orderings result in a
> child device being shutdown after a parent device has been shut down?
>

Fair enough. I'll cook up a 'component' bus and see if that can fix this
properly. It will add a new device for the aggregate driver that does
the bind/unbind so the host/parent device will still be ordered on the
DPM list at the same place. The new aggregate device will be after the
components and we'll attach the PM ops and shutdown hooks to that.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:22     ` Stephen Boyd
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Boyd @ 2021-05-11 17:22 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel,
	Greg Kroah-Hartman

Quoting Russell King - ARM Linux admin (2021-05-11 07:42:37)
> On Sat, May 08, 2021 at 12:41:18AM -0700, Stephen Boyd wrote:
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
>
> You really are not supposed to be doing anything with component devices
> once they have been unbound. You can do stuff with them only between the
> bind() and the unbind() callbacks for the host device.

Got it. The device is not unbound in this case so this isn't the
problem.

>
> Access to the host devices outside of that is totally undefined and
> should not be done.
>
> The shutdown callback should be fine as long as the other devices are
> still bound, but there will be implications if the shutdown order
> matters.
>
> However, randomly pulling devices around in the DPM list sounds to me
> like a very bad idea. What happens if such re-orderings result in a
> child device being shutdown after a parent device has been shut down?
>

Fair enough. I'll cook up a 'component' bus and see if that can fix this
properly. It will add a new device for the aggregate driver that does
the bind/unbind so the host/parent device will still be ordered on the
DPM list at the same place. The new aggregate device will be after the
components and we'll attach the PM ops and shutdown hooks to that.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 17:19               ` Stephen Boyd
@ 2021-05-11 17:25                 ` Daniel Vetter
  -1 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-11 17:25 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Daniel Vetter, Rafael J. Wysocki, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Russell King, Rob Clark, dri-devel

On Tue, May 11, 2021 at 10:19:09AM -0700, Stephen Boyd wrote:
> Quoting Daniel Vetter (2021-05-11 06:39:36)
> > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > [cut]
> > >
> > > >
> > > > >
> > > > > > I will try it, but then I wonder about things like system wide
> > > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > > active.
> > > > >
> > > > > Yup suspend/resume has the exact same problem as shutdown.
> > > >
> > > > I think suspend/resume has the exact opposite problem. At least I think
> > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > > operates from the top down when we want bottom up? I admit I have no
> > > > idea what is supposed to happen here.
> > >
> > > Why would the system-wide suspend ordering be different from the
> > > shutdown ordering?
> >
> > At least my point was that both shutdown and suspend/resume have the
> > same problem, and the righ fix is (I think at least) to add these
> > hooks to the component.c aggregate ops structure. Hence just adding
> > new callbacks for shutdown will be an incomplete solution.
> 
> To add proper hooks to component.c we'll need to make the aggregate
> device into a 'struct device' and make a bus for them that essentially
> adds the aggregate device to the bus once all the components are
> registered. The bind/unbind can be ported to probe/remove, and then the
> aggregate driver can get PM ops that run before the component devices
> run their PM ops.
> 
> Let me go try it out and see if I can make it minimally invasive so that
> the migration path is simple.

Thanks for volunteeering. Please cc Greg KH so we make sure we're not
doing this wrongly wrt the device model.
-Daniel

> > I don't feel like changing the global device order is the right
> > approach, since essentially that's what component was meant to fix.
> > Except it's incomplete since it only provides a solution for
> > bind/unbind and not for shutdown or suspend/resume as other global
> > state changes. I think some drivers "fixed" this by putting stuff like
> > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> > make sure that everything is ready with that trick. But that doesn't
> > compose very well :-/
> 
> Yeah it looks like msm is using prepare/complete for this so that it can
> jump in early and suspend the display pipeline before the components
> suspend themselves. The shutdown path only has one callback so we can't
> play the same games.

Yeah there's tons of hacks. i915 component usage with audio has similar
tricks to make suspend/resume work.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:25                 ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2021-05-11 17:25 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	dri-devel, Russell King

On Tue, May 11, 2021 at 10:19:09AM -0700, Stephen Boyd wrote:
> Quoting Daniel Vetter (2021-05-11 06:39:36)
> > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > [cut]
> > >
> > > >
> > > > >
> > > > > > I will try it, but then I wonder about things like system wide
> > > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > > active.
> > > > >
> > > > > Yup suspend/resume has the exact same problem as shutdown.
> > > >
> > > > I think suspend/resume has the exact opposite problem. At least I think
> > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > > operates from the top down when we want bottom up? I admit I have no
> > > > idea what is supposed to happen here.
> > >
> > > Why would the system-wide suspend ordering be different from the
> > > shutdown ordering?
> >
> > At least my point was that both shutdown and suspend/resume have the
> > same problem, and the righ fix is (I think at least) to add these
> > hooks to the component.c aggregate ops structure. Hence just adding
> > new callbacks for shutdown will be an incomplete solution.
> 
> To add proper hooks to component.c we'll need to make the aggregate
> device into a 'struct device' and make a bus for them that essentially
> adds the aggregate device to the bus once all the components are
> registered. The bind/unbind can be ported to probe/remove, and then the
> aggregate driver can get PM ops that run before the component devices
> run their PM ops.
> 
> Let me go try it out and see if I can make it minimally invasive so that
> the migration path is simple.

Thanks for volunteeering. Please cc Greg KH so we make sure we're not
doing this wrongly wrt the device model.
-Daniel

> > I don't feel like changing the global device order is the right
> > approach, since essentially that's what component was meant to fix.
> > Except it's incomplete since it only provides a solution for
> > bind/unbind and not for shutdown or suspend/resume as other global
> > state changes. I think some drivers "fixed" this by putting stuff like
> > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> > make sure that everything is ready with that trick. But that doesn't
> > compose very well :-/
> 
> Yeah it looks like msm is using prepare/complete for this so that it can
> jump in early and suspend the display pipeline before the components
> suspend themselves. The shutdown path only has one callback so we can't
> play the same games.

Yeah there's tons of hacks. i915 component usage with audio has similar
tricks to make suspend/resume work.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 11:59   ` Rafael J. Wysocki
@ 2021-05-11 17:59     ` Saravana Kannan
  -1 siblings, 0 replies; 34+ messages in thread
From: Saravana Kannan @ 2021-05-11 17:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stephen Boyd, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Daniel Vetter, Russell King, Rob Clark, dri-devel

On Mon, May 10, 2021 at 4:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, May 8, 2021 at 9:41 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > The device lists are poorly ordered when the component device code is
> > used. This is because component_master_add_with_match() returns 0
> > regardless of component devices calling component_add() first. It can
> > really only fail if an allocation fails, in which case everything is
> > going bad and we're out of memory. The host device (called master_dev in
> > the code), can succeed at probe and be put on the device lists before
> > any of the component devices are probed and put on the lists.
> >
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
> >
> > Normally, device links or probe defer would reorder the lists and put
> > devices that depend on other devices in the lists at the correct
> > location, but with component devices this doesn't happen because this
> > information isn't expressed anywhere. Drivers simply succeed at
> > registering their component or host with the component framework and
> > wait for their bind() callback to be called once the other components
> > are ready. We could make various device links between 'master_dev' and
> > 'component->dev' but it's not necessary. Let's simply move the hosting
> > device to the end of the device lists when the component device fully
> > binds. This way we know that all components are present and have probed
> > properly and now the host device has really probed so it's safe to
> > assume the host driver ops can operate on any component device.
>
> Moving a device to the end of dpm_list is generally risky in cases
> when some dependency information may be missing.
>
> For example, if there is a device depending on the hosting one, but
> that dependency is not represented by a device link or a direct
> ancestor-descendant relationship (or generally a path in the device
> dependency graph leading from one of them to the other), then moving
> it to the end of dpm_list would cause system-wide suspend to fail (the
> hosting device would be suspended before the one depending on it).
>
> That may not be a concern here, but at least it would be good to
> document why it is not a concern.

I've been reading up on the component code and at least one of the msm
drivers that use it. I've also read part of the other thread that's
going on.

If device links work, why not use them? Also, are you trying this with
fw_devlink=on?

Looks like what you are missing (I can't tell without looking at the
DT/your specific driver) is a device link from the DSI bridge to the
I2C bridge/controller? If we add that, will things work properly? If
yes, why not add that?

Also, can you please add me to all the threads on this topic (if you
reply to them) so it's easier for me to reply?

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 17:59     ` Saravana Kannan
  0 siblings, 0 replies; 34+ messages in thread
From: Saravana Kannan @ 2021-05-11 17:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Daniel Vetter, Linux Kernel Mailing List, dri-devel,
	Stephen Boyd, Greg Kroah-Hartman, Russell King

On Mon, May 10, 2021 at 4:59 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Sat, May 8, 2021 at 9:41 AM Stephen Boyd <swboyd@chromium.org> wrote:
> >
> > The device lists are poorly ordered when the component device code is
> > used. This is because component_master_add_with_match() returns 0
> > regardless of component devices calling component_add() first. It can
> > really only fail if an allocation fails, in which case everything is
> > going bad and we're out of memory. The host device (called master_dev in
> > the code), can succeed at probe and be put on the device lists before
> > any of the component devices are probed and put on the lists.
> >
> > Within the component device framework this usually isn't that bad
> > because the real driver work is done at bind time via
> > component{,master}_ops::bind(). It becomes a problem when the driver
> > core, or host driver, wants to operate on the component device outside
> > of the bind/unbind functions, e.g. via 'remove' or 'shutdown'. The
> > driver core doesn't understand the relationship between the host device
> > and the component devices and could possibly try to operate on component
> > devices when they're already removed from the system or shut down.
> >
> > Normally, device links or probe defer would reorder the lists and put
> > devices that depend on other devices in the lists at the correct
> > location, but with component devices this doesn't happen because this
> > information isn't expressed anywhere. Drivers simply succeed at
> > registering their component or host with the component framework and
> > wait for their bind() callback to be called once the other components
> > are ready. We could make various device links between 'master_dev' and
> > 'component->dev' but it's not necessary. Let's simply move the hosting
> > device to the end of the device lists when the component device fully
> > binds. This way we know that all components are present and have probed
> > properly and now the host device has really probed so it's safe to
> > assume the host driver ops can operate on any component device.
>
> Moving a device to the end of dpm_list is generally risky in cases
> when some dependency information may be missing.
>
> For example, if there is a device depending on the hosting one, but
> that dependency is not represented by a device link or a direct
> ancestor-descendant relationship (or generally a path in the device
> dependency graph leading from one of them to the other), then moving
> it to the end of dpm_list would cause system-wide suspend to fail (the
> hosting device would be suspended before the one depending on it).
>
> That may not be a concern here, but at least it would be good to
> document why it is not a concern.

I've been reading up on the component code and at least one of the msm
drivers that use it. I've also read part of the other thread that's
going on.

If device links work, why not use them? Also, are you trying this with
fw_devlink=on?

Looks like what you are missing (I can't tell without looking at the
DT/your specific driver) is a device link from the DSI bridge to the
I2C bridge/controller? If we add that, will things work properly? If
yes, why not add that?

Also, can you please add me to all the threads on this topic (if you
reply to them) so it's easier for me to reply?

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-11 17:19               ` Stephen Boyd
@ 2021-05-11 19:12                 ` Saravana Kannan
  -1 siblings, 0 replies; 34+ messages in thread
From: Saravana Kannan @ 2021-05-11 19:12 UTC (permalink / raw)
  To: swboyd
  Cc: daniel, dri-devel, gregkh, linux-kernel, rafael, rmk+kernel,
	robdclark, Saravana Kannan, kernel-team

Quoting Stephen Boyd <swboyd@chromium.org>:
> Quoting Daniel Vetter (2021-05-11 06:39:36)
> > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > [cut]
> > >
> > > >
> > > > >
> > > > > > I will try it, but then I wonder about things like system wide
> > > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > > active.
> > > > >
> > > > > Yup suspend/resume has the exact same problem as shutdown.
> > > >
> > > > I think suspend/resume has the exact opposite problem. At least I think
> > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > > operates from the top down when we want bottom up? I admit I have no
> > > > idea what is supposed to happen here.
> > >
> > > Why would the system-wide suspend ordering be different from the
> > > shutdown ordering?
> >
> > At least my point was that both shutdown and suspend/resume have the
> > same problem, and the righ fix is (I think at least) to add these
> > hooks to the component.c aggregate ops structure. Hence just adding
> > new callbacks for shutdown will be an incomplete solution.
> 
> To add proper hooks to component.c we'll need to make the aggregate
> device into a 'struct device' and make a bus for them that essentially
> adds the aggregate device to the bus once all the components are
> registered. The bind/unbind can be ported to probe/remove, and then the
> aggregate driver can get PM ops that run before the component devices
> run their PM ops.
> 
> Let me go try it out and see if I can make it minimally invasive so that
> the migration path is simple.

Yes, please! This is the right solution. We should put all these aggregate
devices in some "aggregate" bus (needs a better name) and NOT a drm bus because
AFAICT components are not specific to the drm framework.

You can also create device links (without the STATELESS flag, and a bunch of
other flags) from the aggregate device to all the component devices and this
will automatically fix all the ordering issues too.

I'd be happy to help with this if you want and happy to review this too. Please
Cc me when you send out this series.

-Saravana

> 
> >
> > I don't feel like changing the global device order is the right
> > approach, since essentially that's what component was meant to fix.
> > Except it's incomplete since it only provides a solution for
> > bind/unbind and not for shutdown or suspend/resume as other global
> > state changes. I think some drivers "fixed" this by putting stuff like
> > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> > make sure that everything is ready with that trick. But that doesn't
> > compose very well :-/
> 
> Yeah it looks like msm is using prepare/complete for this so that it can
> jump in early and suspend the display pipeline before the components
> suspend themselves. The shutdown path only has one callback so we can't
> play the same games.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-11 19:12                 ` Saravana Kannan
  0 siblings, 0 replies; 34+ messages in thread
From: Saravana Kannan @ 2021-05-11 19:12 UTC (permalink / raw)
  To: swboyd
  Cc: Saravana Kannan, rafael, gregkh, linux-kernel, dri-devel,
	rmk+kernel, kernel-team

Quoting Stephen Boyd <swboyd@chromium.org>:
> Quoting Daniel Vetter (2021-05-11 06:39:36)
> > On Tue, May 11, 2021 at 12:52 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Mon, May 10, 2021 at 9:08 PM Stephen Boyd <swboyd@chromium.org> wrote:
> > >
> > > [cut]
> > >
> > > >
> > > > >
> > > > > > I will try it, but then I wonder about things like system wide
> > > > > > suspend/resume too. The drm encoder chain would need to reimplement the
> > > > > > logic for system wide suspend/resume so that any PM ops attached to the
> > > > > > msm device run in the correct order. Right now the bridge PM ops will
> > > > > > run, the i2c bus PM ops will run, and then the msm PM ops will run.
> > > > > > After this change, the msm PM ops will run, the bridge PM ops will run,
> > > > > > and then the i2c bus PM ops will run. It feels like that could be a
> > > > > > problem if we're suspending the DSI encoder while the bridge is still
> > > > > > active.
> > > > >
> > > > > Yup suspend/resume has the exact same problem as shutdown.
> > > >
> > > > I think suspend/resume has the exact opposite problem. At least I think
> > > > the correct order is to suspend the bridge, then the encoder, i.e. DSI,
> > > > like is happening today. It looks like drm_atomic_helper_shutdown()
> > > > operates from the top down when we want bottom up? I admit I have no
> > > > idea what is supposed to happen here.
> > >
> > > Why would the system-wide suspend ordering be different from the
> > > shutdown ordering?
> >
> > At least my point was that both shutdown and suspend/resume have the
> > same problem, and the righ fix is (I think at least) to add these
> > hooks to the component.c aggregate ops structure. Hence just adding
> > new callbacks for shutdown will be an incomplete solution.
> 
> To add proper hooks to component.c we'll need to make the aggregate
> device into a 'struct device' and make a bus for them that essentially
> adds the aggregate device to the bus once all the components are
> registered. The bind/unbind can be ported to probe/remove, and then the
> aggregate driver can get PM ops that run before the component devices
> run their PM ops.
> 
> Let me go try it out and see if I can make it minimally invasive so that
> the migration path is simple.

Yes, please! This is the right solution. We should put all these aggregate
devices in some "aggregate" bus (needs a better name) and NOT a drm bus because
AFAICT components are not specific to the drm framework.

You can also create device links (without the STATELESS flag, and a bunch of
other flags) from the aggregate device to all the component devices and this
will automatically fix all the ordering issues too.

I'd be happy to help with this if you want and happy to review this too. Please
Cc me when you send out this series.

-Saravana

> 
> >
> > I don't feel like changing the global device order is the right
> > approach, since essentially that's what component was meant to fix.
> > Except it's incomplete since it only provides a solution for
> > bind/unbind and not for shutdown or suspend/resume as other global
> > state changes. I think some drivers "fixed" this by putting stuff like
> > drm_atomic_helper_shutdown/suspend/resume into early/late hooks, to
> > make sure that everything is ready with that trick. But that doesn't
> > compose very well :-/
> 
> Yeah it looks like msm is using prepare/complete for this so that it can
> jump in early and suspend the display pipeline before the components
> suspend themselves. The shutdown path only has one callback so we can't
> play the same games.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
  2021-05-10 16:05   ` Daniel Vetter
@ 2021-05-13 13:40     ` Russell King - ARM Linux admin
  -1 siblings, 0 replies; 34+ messages in thread
From: Russell King - ARM Linux admin @ 2021-05-13 13:40 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Stephen Boyd, Greg Kroah-Hartman, linux-kernel,
	Rafael J. Wysocki, Daniel Vetter, Rob Clark, dri-devel

On Mon, May 10, 2021 at 06:05:21PM +0200, Daniel Vetter wrote:
> Entirely aside, but an s/master/aggregate/ or similar over the entire
> component.c codebase would help a pile in making it easier to understand
> which part does what. Or at least I'm always terribly confused about which
> bind binds what and all that, so maybe an additional review whether we
> have a clear split into aggregate and individual components after that
> initial fix is needed.

I'm not entirely sure what you mean "which bind binds what".

The component helper solves this problem:

We have a master or aggregate device representing a collection of
individual devices. The aggregate and individual devices may be probed
by the device model in any order. The aggregate device is only complete
once all individual and aggregate devices have been successfully probed.

It does this by tracking which devices are present, and only when they
are all present does it call the bind() operation. Conversely, if one
happens to be removed, it calls the unbind() operation. To me, that's
very simple.

When we start talking about PM, the original idea was for the aggregate
device to handle that. However, DRM/OF has pushed to change the model a
bit such that the aggregate device is created as a platform device when
we detect the presence of one of the individual devices.

I suspect what we actually want is something that, when the first
individual device gets notified of a transition to a lower power mode,
we want to place the system formed by all the devices into a low power
mode. Please realise that it may not be appropriate for every
individual device to be affected by that transition until it receives
its own PM call.

> One question I have: Why is the bridge component driver not correctly
> ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> doesn't access any hw itself, but entirely relies on all its components.

As far as I'm aware, bridge was never converted to use any component
stuff, so I'm not sure what you're referring to.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] component: Move host device to end of device lists on binding
@ 2021-05-13 13:40     ` Russell King - ARM Linux admin
  0 siblings, 0 replies; 34+ messages in thread
From: Russell King - ARM Linux admin @ 2021-05-13 13:40 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rafael J. Wysocki, Daniel Vetter, linux-kernel, dri-devel,
	Stephen Boyd, Greg Kroah-Hartman

On Mon, May 10, 2021 at 06:05:21PM +0200, Daniel Vetter wrote:
> Entirely aside, but an s/master/aggregate/ or similar over the entire
> component.c codebase would help a pile in making it easier to understand
> which part does what. Or at least I'm always terribly confused about which
> bind binds what and all that, so maybe an additional review whether we
> have a clear split into aggregate and individual components after that
> initial fix is needed.

I'm not entirely sure what you mean "which bind binds what".

The component helper solves this problem:

We have a master or aggregate device representing a collection of
individual devices. The aggregate and individual devices may be probed
by the device model in any order. The aggregate device is only complete
once all individual and aggregate devices have been successfully probed.

It does this by tracking which devices are present, and only when they
are all present does it call the bind() operation. Conversely, if one
happens to be removed, it calls the unbind() operation. To me, that's
very simple.

When we start talking about PM, the original idea was for the aggregate
device to handle that. However, DRM/OF has pushed to change the model a
bit such that the aggregate device is created as a platform device when
we detect the presence of one of the individual devices.

I suspect what we actually want is something that, when the first
individual device gets notified of a transition to a lower power mode,
we want to place the system formed by all the devices into a low power
mode. Please realise that it may not be appropriate for every
individual device to be affected by that transition until it receives
its own PM call.

> One question I have: Why is the bridge component driver not correctly
> ordered wrt the i2c driver it needs? The idea is that the aggregate driver
> doesn't access any hw itself, but entirely relies on all its components.

As far as I'm aware, bridge was never converted to use any component
stuff, so I'm not sure what you're referring to.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-05-13 13:40 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-08  7:41 [PATCH] component: Move host device to end of device lists on binding Stephen Boyd
2021-05-08  7:41 ` Stephen Boyd
2021-05-10 11:59 ` Rafael J. Wysocki
2021-05-10 11:59   ` Rafael J. Wysocki
2021-05-11 17:59   ` Saravana Kannan
2021-05-11 17:59     ` Saravana Kannan
2021-05-10 16:05 ` Daniel Vetter
2021-05-10 16:05   ` Daniel Vetter
2021-05-10 17:52   ` Stephen Boyd
2021-05-10 17:52     ` Stephen Boyd
2021-05-10 18:26     ` Daniel Vetter
2021-05-10 18:26       ` Daniel Vetter
2021-05-10 19:08       ` Stephen Boyd
2021-05-10 19:08         ` Stephen Boyd
2021-05-11 10:52         ` Rafael J. Wysocki
2021-05-11 10:52           ` Rafael J. Wysocki
2021-05-11 13:39           ` Daniel Vetter
2021-05-11 13:39             ` Daniel Vetter
2021-05-11 17:19             ` Stephen Boyd
2021-05-11 17:19               ` Stephen Boyd
2021-05-11 17:25               ` Daniel Vetter
2021-05-11 17:25                 ` Daniel Vetter
2021-05-11 19:12               ` Saravana Kannan
2021-05-11 19:12                 ` Saravana Kannan
2021-05-11 17:00           ` Stephen Boyd
2021-05-11 17:00             ` Stephen Boyd
2021-05-11 17:20             ` Rafael J. Wysocki
2021-05-11 17:20               ` Rafael J. Wysocki
2021-05-13 13:40   ` Russell King - ARM Linux admin
2021-05-13 13:40     ` Russell King - ARM Linux admin
2021-05-11 14:42 ` Russell King - ARM Linux admin
2021-05-11 14:42   ` Russell King - ARM Linux admin
2021-05-11 17:22   ` Stephen Boyd
2021-05-11 17:22     ` Stephen Boyd

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.