linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
@ 2020-06-25  3:24 Saravana Kannan
  2020-06-25  8:57 ` Geert Uytterhoeven
  2020-06-25 15:19 ` Rafael J. Wysocki
  0 siblings, 2 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25  3:24 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Toan Le, Feng Kan,
	Saravana Kannan
  Cc: Geert Uytterhoeven, kernel-team, Rafael J. Wysocki, linux-kernel

Under the following conditions:
- driver A is built in and can probe device-A
- driver B is a module and can probe device-B
- device-A is supplier of device-B

Without this patch:
1. device-A is added.
2. device-B is added.
3. dpm_list is now [device-A, device-B].
4. driver-A defers probe of device-A.
5. deferred probe of device-A is reattempted
6. device-A is moved to end of dpm_list.
6. dpm_list is now [device-B, device-A].
7. driver-B is loaded and probes device-B.
8. dpm_list stays as [device-B, device-A].

Suspend (which goes in the reverse order of dpm_list) fails because
device-A (supplier) is suspended before device-B (consumer).

With this patch:
1. device-A is added.
2. device-B is added.
3. dpm_list is now [device-A, device-B].
4. driver-A defers probe of device-A.
5. deferred probe of device-A is reattempted later.
6. dpm_list is now [device-B, device-A].
7. driver-B is loaded and probes device-B.
8. dpm_list is now [device-A, device-B].

Suspend works because device-B (consumer) is suspended before device-A
(supplier).

Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/dd.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9a1d940342ac..52b2148c7983 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
 		 * probe makes that very unsafe.
 		 */
 		device_pm_move_to_tail(dev);
+		/* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
+		 * I'm worried if it'll have some unintended consequeneces. */
 
 		dev_dbg(dev, "Retrying from deferred list\n");
 		bus_probe_device(dev);
@@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 		goto re_probe;
 	}
 
+	/*
+	 * The devices are added to the dpm_list (resume/suspend (reverse
+	 * order) list) as they are registered with the driver core. But the
+	 * order the devices are added doesn't necessarily match the real
+	 * dependency order.
+	 *
+	 * The successful probe order is a much better signal. If a device just
+	 * probed successfully, then we know for sure that all the devices that
+	 * probed before it don't depend on the device. So, we can safely move
+	 * the device to the end of the dpm_list. As more devices probe,
+	 * they'll automatically get ordered correctly.
+	 */
+	device_pm_move_to_tail(dev);
+
 	pinctrl_init_done(dev);
 
 	if (dev->pm_domain && dev->pm_domain->sync)
-- 
2.27.0.111.gc72c7da667-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25  3:24 [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe Saravana Kannan
@ 2020-06-25  8:57 ` Geert Uytterhoeven
  2020-06-25 17:02   ` Saravana Kannan
  2020-06-25 15:19 ` Rafael J. Wysocki
  1 sibling, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-25  8:57 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Toan Le, Feng Kan,
	Android Kernel Team, Rafael J. Wysocki,
	Linux Kernel Mailing List, Linux-Renesas

Hi Saravana,

Thanks for your patch!

On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> Under the following conditions:
> - driver A is built in and can probe device-A
> - driver B is a module and can probe device-B

I think this is not correct: in my case driver B is builtin, too.

> - device-A is supplier of device-B
>
> Without this patch:
> 1. device-A is added.
> 2. device-B is added.
> 3. dpm_list is now [device-A, device-B].
> 4. driver-A defers probe of device-A.
> 5. deferred probe of device-A is reattempted

I think this is misleading: in my case driver-A did not defer the probe
of device-A, and driver-A never returned -EPROBE_DEFER.
Probing was merely paused, due to fw_devlink_pause();

> 6. device-A is moved to end of dpm_list.
> 6. dpm_list is now [device-B, device-A].
> 7. driver-B is loaded and probes device-B.
> 8. dpm_list stays as [device-B, device-A].
>
> Suspend (which goes in the reverse order of dpm_list) fails because
> device-A (supplier) is suspended before device-B (consumer).
>
> With this patch:
> 1. device-A is added.
> 2. device-B is added.
> 3. dpm_list is now [device-A, device-B].
> 4. driver-A defers probe of device-A.
> 5. deferred probe of device-A is reattempted later.
> 6. dpm_list is now [device-B, device-A].
> 7. driver-B is loaded and probes device-B.
> 8. dpm_list is now [device-A, device-B].
>
> Suspend works because device-B (consumer) is suspended before device-A
> (supplier).
>
> Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This fixes wake-up by GPIO key on r8a7740/armadillo and sh73a0/kzm9g.
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
>                  * probe makes that very unsafe.
>                  */
>                 device_pm_move_to_tail(dev);
> +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> +                * I'm worried if it'll have some unintended consequeneces. */

Works fine for me with the call to device_pm_move_to_tail() removed, too
(at least on the two boards that showed the issue before).

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25  3:24 [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe Saravana Kannan
  2020-06-25  8:57 ` Geert Uytterhoeven
@ 2020-06-25 15:19 ` Rafael J. Wysocki
  2020-06-25 16:48   ` Saravana Kannan
  1 sibling, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-25 15:19 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Toan Le, Feng Kan,
	Geert Uytterhoeven, Cc: Android Kernel, Rafael J. Wysocki,
	Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
>
> Under the following conditions:
> - driver A is built in and can probe device-A
> - driver B is a module and can probe device-B
> - device-A is supplier of device-B
>
> Without this patch:
> 1. device-A is added.
> 2. device-B is added.
> 3. dpm_list is now [device-A, device-B].
> 4. driver-A defers probe of device-A.
> 5. deferred probe of device-A is reattempted
> 6. device-A is moved to end of dpm_list.
> 6. dpm_list is now [device-B, device-A].
> 7. driver-B is loaded and probes device-B.
> 8. dpm_list stays as [device-B, device-A].
>
> Suspend (which goes in the reverse order of dpm_list) fails because
> device-A (supplier) is suspended before device-B (consumer).
>
> With this patch:
> 1. device-A is added.
> 2. device-B is added.
> 3. dpm_list is now [device-A, device-B].
> 4. driver-A defers probe of device-A.
> 5. deferred probe of device-A is reattempted later.
> 6. dpm_list is now [device-B, device-A].
> 7. driver-B is loaded and probes device-B.
> 8. dpm_list is now [device-A, device-B].
>
> Suspend works because device-B (consumer) is suspended before device-A
> (supplier).
>
> Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/dd.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 9a1d940342ac..52b2148c7983 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
>                  * probe makes that very unsafe.
>                  */
>                 device_pm_move_to_tail(dev);
> +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> +                * I'm worried if it'll have some unintended consequeneces. */

Yes, this needs to go away if you make the other change.

>
>                 dev_dbg(dev, "Retrying from deferred list\n");
>                 bus_probe_device(dev);
> @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>                 goto re_probe;
>         }
>
> +       /*
> +        * The devices are added to the dpm_list (resume/suspend (reverse
> +        * order) list) as they are registered with the driver core. But the
> +        * order the devices are added doesn't necessarily match the real
> +        * dependency order.
> +        *
> +        * The successful probe order is a much better signal. If a device just
> +        * probed successfully, then we know for sure that all the devices that
> +        * probed before it don't depend on the device. So, we can safely move
> +        * the device to the end of the dpm_list. As more devices probe,
> +        * they'll automatically get ordered correctly.
> +        */
> +       device_pm_move_to_tail(dev);

But it would be good to somehow limit this to the devices affected by
deferred probing or we'll end up reordering dpm_list unnecessarily for
many times in the actual majority of cases.

> +
>         pinctrl_init_done(dev);
>
>         if (dev->pm_domain && dev->pm_domain->sync)
> --
> 2.27.0.111.gc72c7da667-goog
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 15:19 ` Rafael J. Wysocki
@ 2020-06-25 16:48   ` Saravana Kannan
  2020-06-25 16:58     ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25 16:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
their mails are bouncing.

On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Under the following conditions:
> > - driver A is built in and can probe device-A
> > - driver B is a module and can probe device-B
> > - device-A is supplier of device-B
> >
> > Without this patch:
> > 1. device-A is added.
> > 2. device-B is added.
> > 3. dpm_list is now [device-A, device-B].
> > 4. driver-A defers probe of device-A.
> > 5. deferred probe of device-A is reattempted
> > 6. device-A is moved to end of dpm_list.
> > 6. dpm_list is now [device-B, device-A].
> > 7. driver-B is loaded and probes device-B.
> > 8. dpm_list stays as [device-B, device-A].
> >
> > Suspend (which goes in the reverse order of dpm_list) fails because
> > device-A (supplier) is suspended before device-B (consumer).
> >
> > With this patch:
> > 1. device-A is added.
> > 2. device-B is added.
> > 3. dpm_list is now [device-A, device-B].
> > 4. driver-A defers probe of device-A.
> > 5. deferred probe of device-A is reattempted later.
> > 6. dpm_list is now [device-B, device-A].
> > 7. driver-B is loaded and probes device-B.
> > 8. dpm_list is now [device-A, device-B].
> >
> > Suspend works because device-B (consumer) is suspended before device-A
> > (supplier).
> >
> > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/dd.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 9a1d940342ac..52b2148c7983 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> >                  * probe makes that very unsafe.
> >                  */
> >                 device_pm_move_to_tail(dev);
> > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > +                * I'm worried if it'll have some unintended consequeneces. */
>
> Yes, this needs to go away if you make the other change.
>
> >
> >                 dev_dbg(dev, "Retrying from deferred list\n");
> >                 bus_probe_device(dev);
> > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >                 goto re_probe;
> >         }
> >
> > +       /*
> > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > +        * order) list) as they are registered with the driver core. But the
> > +        * order the devices are added doesn't necessarily match the real
> > +        * dependency order.
> > +        *
> > +        * The successful probe order is a much better signal. If a device just
> > +        * probed successfully, then we know for sure that all the devices that
> > +        * probed before it don't depend on the device. So, we can safely move
> > +        * the device to the end of the dpm_list. As more devices probe,
> > +        * they'll automatically get ordered correctly.
> > +        */
> > +       device_pm_move_to_tail(dev);
>
> But it would be good to somehow limit this to the devices affected by
> deferred probing or we'll end up reordering dpm_list unnecessarily for
> many times in the actual majority of cases.

Yes, lots of unnecessary reordering, but doing it only for deferred
probes IS the problem. In the example I gave, the consumer is never
deferred probe because the supplier happens to finish probing before
the consumer probe is even attempted.

I'm open to other suggestions, but I think this is needed for all the
cases or at least more cases to be handled correctly.

One alternative I was thinking was not adding the device to the
dpm_list until it's probed. But I have the vague feeling of other
things between device_add() and device probe that expect the device to
be in the dpm_list.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 16:48   ` Saravana Kannan
@ 2020-06-25 16:58     ` Rafael J. Wysocki
  2020-06-25 17:01       ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-25 16:58 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Geert Uytterhoeven,
	Cc: Android Kernel, Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> their mails are bouncing.
>
> On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Under the following conditions:
> > > - driver A is built in and can probe device-A
> > > - driver B is a module and can probe device-B
> > > - device-A is supplier of device-B
> > >
> > > Without this patch:
> > > 1. device-A is added.
> > > 2. device-B is added.
> > > 3. dpm_list is now [device-A, device-B].
> > > 4. driver-A defers probe of device-A.
> > > 5. deferred probe of device-A is reattempted
> > > 6. device-A is moved to end of dpm_list.
> > > 6. dpm_list is now [device-B, device-A].
> > > 7. driver-B is loaded and probes device-B.
> > > 8. dpm_list stays as [device-B, device-A].
> > >
> > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > device-A (supplier) is suspended before device-B (consumer).
> > >
> > > With this patch:
> > > 1. device-A is added.
> > > 2. device-B is added.
> > > 3. dpm_list is now [device-A, device-B].
> > > 4. driver-A defers probe of device-A.
> > > 5. deferred probe of device-A is reattempted later.
> > > 6. dpm_list is now [device-B, device-A].
> > > 7. driver-B is loaded and probes device-B.
> > > 8. dpm_list is now [device-A, device-B].
> > >
> > > Suspend works because device-B (consumer) is suspended before device-A
> > > (supplier).
> > >
> > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/base/dd.c | 16 ++++++++++++++++
> > >  1 file changed, 16 insertions(+)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index 9a1d940342ac..52b2148c7983 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > >                  * probe makes that very unsafe.
> > >                  */
> > >                 device_pm_move_to_tail(dev);
> > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > +                * I'm worried if it'll have some unintended consequeneces. */
> >
> > Yes, this needs to go away if you make the other change.
> >
> > >
> > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > >                 bus_probe_device(dev);
> > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >                 goto re_probe;
> > >         }
> > >
> > > +       /*
> > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > +        * order) list) as they are registered with the driver core. But the
> > > +        * order the devices are added doesn't necessarily match the real
> > > +        * dependency order.
> > > +        *
> > > +        * The successful probe order is a much better signal. If a device just
> > > +        * probed successfully, then we know for sure that all the devices that
> > > +        * probed before it don't depend on the device. So, we can safely move
> > > +        * the device to the end of the dpm_list. As more devices probe,
> > > +        * they'll automatically get ordered correctly.
> > > +        */
> > > +       device_pm_move_to_tail(dev);
> >
> > But it would be good to somehow limit this to the devices affected by
> > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > many times in the actual majority of cases.
>
> Yes, lots of unnecessary reordering, but doing it only for deferred
> probes IS the problem. In the example I gave, the consumer is never
> deferred probe because the supplier happens to finish probing before
> the consumer probe is even attempted.

But why would the supplier be moved to the end of dpm_list without
moving the consumer along with it?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 16:58     ` Rafael J. Wysocki
@ 2020-06-25 17:01       ` Saravana Kannan
  2020-06-25 17:03         ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25 17:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > their mails are bouncing.
> >
> > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Under the following conditions:
> > > > - driver A is built in and can probe device-A
> > > > - driver B is a module and can probe device-B
> > > > - device-A is supplier of device-B
> > > >
> > > > Without this patch:
> > > > 1. device-A is added.
> > > > 2. device-B is added.
> > > > 3. dpm_list is now [device-A, device-B].
> > > > 4. driver-A defers probe of device-A.
> > > > 5. deferred probe of device-A is reattempted
> > > > 6. device-A is moved to end of dpm_list.
> > > > 6. dpm_list is now [device-B, device-A].
> > > > 7. driver-B is loaded and probes device-B.
> > > > 8. dpm_list stays as [device-B, device-A].
> > > >
> > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > device-A (supplier) is suspended before device-B (consumer).
> > > >
> > > > With this patch:
> > > > 1. device-A is added.
> > > > 2. device-B is added.
> > > > 3. dpm_list is now [device-A, device-B].
> > > > 4. driver-A defers probe of device-A.
> > > > 5. deferred probe of device-A is reattempted later.
> > > > 6. dpm_list is now [device-B, device-A].
> > > > 7. driver-B is loaded and probes device-B.
> > > > 8. dpm_list is now [device-A, device-B].
> > > >
> > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > (supplier).
> > > >
> > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > >  1 file changed, 16 insertions(+)
> > > >
> > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > index 9a1d940342ac..52b2148c7983 100644
> > > > --- a/drivers/base/dd.c
> > > > +++ b/drivers/base/dd.c
> > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > >                  * probe makes that very unsafe.
> > > >                  */
> > > >                 device_pm_move_to_tail(dev);
> > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > >
> > > Yes, this needs to go away if you make the other change.
> > >
> > > >
> > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > >                 bus_probe_device(dev);
> > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > >                 goto re_probe;
> > > >         }
> > > >
> > > > +       /*
> > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > +        * order) list) as they are registered with the driver core. But the
> > > > +        * order the devices are added doesn't necessarily match the real
> > > > +        * dependency order.
> > > > +        *
> > > > +        * The successful probe order is a much better signal. If a device just
> > > > +        * probed successfully, then we know for sure that all the devices that
> > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > +        * they'll automatically get ordered correctly.
> > > > +        */
> > > > +       device_pm_move_to_tail(dev);
> > >
> > > But it would be good to somehow limit this to the devices affected by
> > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > many times in the actual majority of cases.
> >
> > Yes, lots of unnecessary reordering, but doing it only for deferred
> > probes IS the problem. In the example I gave, the consumer is never
> > deferred probe because the supplier happens to finish probing before
> > the consumer probe is even attempted.
>
> But why would the supplier be moved to the end of dpm_list without
> moving the consumer along with it?

There is no device link between the supplier/consumer in this case.
Sadly there are plenty of cases where device links aren't present to
capture supplier/consumer dependencies.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25  8:57 ` Geert Uytterhoeven
@ 2020-06-25 17:02   ` Saravana Kannan
  0 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25 17:02 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Android Kernel Team,
	Rafael J. Wysocki, Linux Kernel Mailing List, Linux-Renesas

Dropping Feng and Toan due to mail bounces.

On Thu, Jun 25, 2020 at 1:58 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> Thanks for your patch!
>
> On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > Under the following conditions:
> > - driver A is built in and can probe device-A
> > - driver B is a module and can probe device-B
>
> I think this is not correct: in my case driver B is builtin, too.

This is a correct example, it just doesn't match with your case :)

Talking about fw_devlink_pause/resume() just distracts from the real
issue that's also present in systems that don't use DT.

You have this problem even on an ACPI system -- distributions loading
all the modules in a PC. We want suspend/resume to work for those too.
So, I'm just going for a simpler example.

> > - device-A is supplier of device-B
> >
> > Without this patch:
> > 1. device-A is added.
> > 2. device-B is added.
> > 3. dpm_list is now [device-A, device-B].
> > 4. driver-A defers probe of device-A.
> > 5. deferred probe of device-A is reattempted
>
> I think this is misleading: in my case driver-A did not defer the probe
> of device-A, and driver-A never returned -EPROBE_DEFER.
> Probing was merely paused, due to fw_devlink_pause();

What I said above. fw_devlink_pause() just defers the probe for the
device -- that's how it pauses and resumes probing. For example,
device link can defer the probe for a device without ever getting to
the driver too.

> > 6. device-A is moved to end of dpm_list.
> > 6. dpm_list is now [device-B, device-A].
> > 7. driver-B is loaded and probes device-B.
> > 8. dpm_list stays as [device-B, device-A].
> >
> > Suspend (which goes in the reverse order of dpm_list) fails because
> > device-A (supplier) is suspended before device-B (consumer).
> >
> > With this patch:
> > 1. device-A is added.
> > 2. device-B is added.
> > 3. dpm_list is now [device-A, device-B].
> > 4. driver-A defers probe of device-A.
> > 5. deferred probe of device-A is reattempted later.
> > 6. dpm_list is now [device-B, device-A].
> > 7. driver-B is loaded and probes device-B.
> > 8. dpm_list is now [device-A, device-B].
> >
> > Suspend works because device-B (consumer) is suspended before device-A
> > (supplier).
> >
> > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This fixes wake-up by GPIO key on r8a7740/armadillo and sh73a0/kzm9g.
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Thanks!

>
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> >                  * probe makes that very unsafe.
> >                  */
> >                 device_pm_move_to_tail(dev);
> > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > +                * I'm worried if it'll have some unintended consequeneces. */
>
> Works fine for me with the call to device_pm_move_to_tail() removed, too
> (at least on the two boards that showed the issue before).

Yes, it feels right to remove this, but I just wanted to get a few
more opinions.


-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 17:01       ` Saravana Kannan
@ 2020-06-25 17:03         ` Rafael J. Wysocki
  2020-06-25 17:08           ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-25 17:03 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Geert Uytterhoeven,
	Cc: Android Kernel, Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 7:01 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > > their mails are bouncing.
> > >
> > > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > Under the following conditions:
> > > > > - driver A is built in and can probe device-A
> > > > > - driver B is a module and can probe device-B
> > > > > - device-A is supplier of device-B
> > > > >
> > > > > Without this patch:
> > > > > 1. device-A is added.
> > > > > 2. device-B is added.
> > > > > 3. dpm_list is now [device-A, device-B].
> > > > > 4. driver-A defers probe of device-A.
> > > > > 5. deferred probe of device-A is reattempted
> > > > > 6. device-A is moved to end of dpm_list.
> > > > > 6. dpm_list is now [device-B, device-A].
> > > > > 7. driver-B is loaded and probes device-B.
> > > > > 8. dpm_list stays as [device-B, device-A].
> > > > >
> > > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > > device-A (supplier) is suspended before device-B (consumer).
> > > > >
> > > > > With this patch:
> > > > > 1. device-A is added.
> > > > > 2. device-B is added.
> > > > > 3. dpm_list is now [device-A, device-B].
> > > > > 4. driver-A defers probe of device-A.
> > > > > 5. deferred probe of device-A is reattempted later.
> > > > > 6. dpm_list is now [device-B, device-A].
> > > > > 7. driver-B is loaded and probes device-B.
> > > > > 8. dpm_list is now [device-A, device-B].
> > > > >
> > > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > > (supplier).
> > > > >
> > > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > ---
> > > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > > >  1 file changed, 16 insertions(+)
> > > > >
> > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > > index 9a1d940342ac..52b2148c7983 100644
> > > > > --- a/drivers/base/dd.c
> > > > > +++ b/drivers/base/dd.c
> > > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > > >                  * probe makes that very unsafe.
> > > > >                  */
> > > > >                 device_pm_move_to_tail(dev);
> > > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > > >
> > > > Yes, this needs to go away if you make the other change.
> > > >
> > > > >
> > > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > > >                 bus_probe_device(dev);
> > > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > > >                 goto re_probe;
> > > > >         }
> > > > >
> > > > > +       /*
> > > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > > +        * order) list) as they are registered with the driver core. But the
> > > > > +        * order the devices are added doesn't necessarily match the real
> > > > > +        * dependency order.
> > > > > +        *
> > > > > +        * The successful probe order is a much better signal. If a device just
> > > > > +        * probed successfully, then we know for sure that all the devices that
> > > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > > +        * they'll automatically get ordered correctly.
> > > > > +        */
> > > > > +       device_pm_move_to_tail(dev);
> > > >
> > > > But it would be good to somehow limit this to the devices affected by
> > > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > > many times in the actual majority of cases.
> > >
> > > Yes, lots of unnecessary reordering, but doing it only for deferred
> > > probes IS the problem. In the example I gave, the consumer is never
> > > deferred probe because the supplier happens to finish probing before
> > > the consumer probe is even attempted.
> >
> > But why would the supplier be moved to the end of dpm_list without
> > moving the consumer along with it?
>
> There is no device link between the supplier/consumer in this case.

So this is the real problem, isn't it?

> Sadly there are plenty of cases where device links aren't present to
> capture supplier/consumer dependencies.

And so that's why you want to add a ton of overhead to driver probing
in all of the cases in which that is not an issue?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 17:03         ` Rafael J. Wysocki
@ 2020-06-25 17:08           ` Saravana Kannan
  2020-06-25 17:46             ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25 17:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 10:03 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jun 25, 2020 at 7:01 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > > > their mails are bouncing.
> > > >
> > > > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > Under the following conditions:
> > > > > > - driver A is built in and can probe device-A
> > > > > > - driver B is a module and can probe device-B
> > > > > > - device-A is supplier of device-B
> > > > > >
> > > > > > Without this patch:
> > > > > > 1. device-A is added.
> > > > > > 2. device-B is added.
> > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > 4. driver-A defers probe of device-A.
> > > > > > 5. deferred probe of device-A is reattempted
> > > > > > 6. device-A is moved to end of dpm_list.
> > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > 8. dpm_list stays as [device-B, device-A].
> > > > > >
> > > > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > > > device-A (supplier) is suspended before device-B (consumer).
> > > > > >
> > > > > > With this patch:
> > > > > > 1. device-A is added.
> > > > > > 2. device-B is added.
> > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > 4. driver-A defers probe of device-A.
> > > > > > 5. deferred probe of device-A is reattempted later.
> > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > 8. dpm_list is now [device-A, device-B].
> > > > > >
> > > > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > > > (supplier).
> > > > > >
> > > > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > ---
> > > > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > > > >  1 file changed, 16 insertions(+)
> > > > > >
> > > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > > > index 9a1d940342ac..52b2148c7983 100644
> > > > > > --- a/drivers/base/dd.c
> > > > > > +++ b/drivers/base/dd.c
> > > > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > > > >                  * probe makes that very unsafe.
> > > > > >                  */
> > > > > >                 device_pm_move_to_tail(dev);
> > > > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > > > >
> > > > > Yes, this needs to go away if you make the other change.
> > > > >
> > > > > >
> > > > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > > > >                 bus_probe_device(dev);
> > > > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > > > >                 goto re_probe;
> > > > > >         }
> > > > > >
> > > > > > +       /*
> > > > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > > > +        * order) list) as they are registered with the driver core. But the
> > > > > > +        * order the devices are added doesn't necessarily match the real
> > > > > > +        * dependency order.
> > > > > > +        *
> > > > > > +        * The successful probe order is a much better signal. If a device just
> > > > > > +        * probed successfully, then we know for sure that all the devices that
> > > > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > > > +        * they'll automatically get ordered correctly.
> > > > > > +        */
> > > > > > +       device_pm_move_to_tail(dev);
> > > > >
> > > > > But it would be good to somehow limit this to the devices affected by
> > > > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > > > many times in the actual majority of cases.
> > > >
> > > > Yes, lots of unnecessary reordering, but doing it only for deferred
> > > > probes IS the problem. In the example I gave, the consumer is never
> > > > deferred probe because the supplier happens to finish probing before
> > > > the consumer probe is even attempted.
> > >
> > > But why would the supplier be moved to the end of dpm_list without
> > > moving the consumer along with it?
> >
> > There is no device link between the supplier/consumer in this case.
>
> So this is the real problem, isn't it?
>
> > Sadly there are plenty of cases where device links aren't present to
> > capture supplier/consumer dependencies.
>
> And so that's why you want to add a ton of overhead to driver probing
> in all of the cases in which that is not an issue?

Well, until all/most of the frameworks add device links or
fw_devlink=on by default, it doesn't hurt to have suspend/resume work
in more platforms.

What about the option of not adding to dpm_list until a device is
probed? Is it DOA? Or can it be made to work?

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 17:08           ` Saravana Kannan
@ 2020-06-25 17:46             ` Rafael J. Wysocki
  2020-06-25 17:51               ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-25 17:46 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Geert Uytterhoeven,
	Cc: Android Kernel, Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 25, 2020 at 10:03 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Jun 25, 2020 at 7:01 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > > > > their mails are bouncing.
> > > > >
> > > > > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > Under the following conditions:
> > > > > > > - driver A is built in and can probe device-A
> > > > > > > - driver B is a module and can probe device-B
> > > > > > > - device-A is supplier of device-B
> > > > > > >
> > > > > > > Without this patch:
> > > > > > > 1. device-A is added.
> > > > > > > 2. device-B is added.
> > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > 5. deferred probe of device-A is reattempted
> > > > > > > 6. device-A is moved to end of dpm_list.
> > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > 8. dpm_list stays as [device-B, device-A].
> > > > > > >
> > > > > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > > > > device-A (supplier) is suspended before device-B (consumer).
> > > > > > >
> > > > > > > With this patch:
> > > > > > > 1. device-A is added.
> > > > > > > 2. device-B is added.
> > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > 5. deferred probe of device-A is reattempted later.
> > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > 8. dpm_list is now [device-A, device-B].
> > > > > > >
> > > > > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > > > > (supplier).
> > > > > > >
> > > > > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > ---
> > > > > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > > > > >  1 file changed, 16 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > > > > index 9a1d940342ac..52b2148c7983 100644
> > > > > > > --- a/drivers/base/dd.c
> > > > > > > +++ b/drivers/base/dd.c
> > > > > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > > > > >                  * probe makes that very unsafe.
> > > > > > >                  */
> > > > > > >                 device_pm_move_to_tail(dev);
> > > > > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > > > > >
> > > > > > Yes, this needs to go away if you make the other change.
> > > > > >
> > > > > > >
> > > > > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > > > > >                 bus_probe_device(dev);
> > > > > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > > > > >                 goto re_probe;
> > > > > > >         }
> > > > > > >
> > > > > > > +       /*
> > > > > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > > > > +        * order) list) as they are registered with the driver core. But the
> > > > > > > +        * order the devices are added doesn't necessarily match the real
> > > > > > > +        * dependency order.
> > > > > > > +        *
> > > > > > > +        * The successful probe order is a much better signal. If a device just
> > > > > > > +        * probed successfully, then we know for sure that all the devices that
> > > > > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > > > > +        * they'll automatically get ordered correctly.
> > > > > > > +        */
> > > > > > > +       device_pm_move_to_tail(dev);
> > > > > >
> > > > > > But it would be good to somehow limit this to the devices affected by
> > > > > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > > > > many times in the actual majority of cases.
> > > > >
> > > > > Yes, lots of unnecessary reordering, but doing it only for deferred
> > > > > probes IS the problem. In the example I gave, the consumer is never
> > > > > deferred probe because the supplier happens to finish probing before
> > > > > the consumer probe is even attempted.
> > > >
> > > > But why would the supplier be moved to the end of dpm_list without
> > > > moving the consumer along with it?
> > >
> > > There is no device link between the supplier/consumer in this case.
> >
> > So this is the real problem, isn't it?
> >
> > > Sadly there are plenty of cases where device links aren't present to
> > > capture supplier/consumer dependencies.
> >
> > And so that's why you want to add a ton of overhead to driver probing
> > in all of the cases in which that is not an issue?
>
> Well, until all/most of the frameworks add device links or
> fw_devlink=on by default, it doesn't hurt to have suspend/resume work
> in more platforms.

In the presence of deferred probing, that is.

Note that deferred probing gets in the way here and so the problem is
related to it.

> What about the option of not adding to dpm_list until a device is
> probed? Is it DOA?

Yes, it is, I'm afraid.  There are devices without drivers. :-)

> Or can it be made to work?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 17:46             ` Rafael J. Wysocki
@ 2020-06-25 17:51               ` Saravana Kannan
  2020-06-26 11:27                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-25 17:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jun 25, 2020 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jun 25, 2020 at 10:03 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 7:01 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > > > > > their mails are bouncing.
> > > > > >
> > > > > > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > >
> > > > > > > > Under the following conditions:
> > > > > > > > - driver A is built in and can probe device-A
> > > > > > > > - driver B is a module and can probe device-B
> > > > > > > > - device-A is supplier of device-B
> > > > > > > >
> > > > > > > > Without this patch:
> > > > > > > > 1. device-A is added.
> > > > > > > > 2. device-B is added.
> > > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > > 5. deferred probe of device-A is reattempted
> > > > > > > > 6. device-A is moved to end of dpm_list.
> > > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > > 8. dpm_list stays as [device-B, device-A].
> > > > > > > >
> > > > > > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > > > > > device-A (supplier) is suspended before device-B (consumer).
> > > > > > > >
> > > > > > > > With this patch:
> > > > > > > > 1. device-A is added.
> > > > > > > > 2. device-B is added.
> > > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > > 5. deferred probe of device-A is reattempted later.
> > > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > > 8. dpm_list is now [device-A, device-B].
> > > > > > > >
> > > > > > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > > > > > (supplier).
> > > > > > > >
> > > > > > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > > > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > ---
> > > > > > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > > > > > >  1 file changed, 16 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > > > > > index 9a1d940342ac..52b2148c7983 100644
> > > > > > > > --- a/drivers/base/dd.c
> > > > > > > > +++ b/drivers/base/dd.c
> > > > > > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > > > > > >                  * probe makes that very unsafe.
> > > > > > > >                  */
> > > > > > > >                 device_pm_move_to_tail(dev);
> > > > > > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > > > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > > > > > >
> > > > > > > Yes, this needs to go away if you make the other change.
> > > > > > >
> > > > > > > >
> > > > > > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > > > > > >                 bus_probe_device(dev);
> > > > > > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > > > > > >                 goto re_probe;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > +       /*
> > > > > > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > > > > > +        * order) list) as they are registered with the driver core. But the
> > > > > > > > +        * order the devices are added doesn't necessarily match the real
> > > > > > > > +        * dependency order.
> > > > > > > > +        *
> > > > > > > > +        * The successful probe order is a much better signal. If a device just
> > > > > > > > +        * probed successfully, then we know for sure that all the devices that
> > > > > > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > > > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > > > > > +        * they'll automatically get ordered correctly.
> > > > > > > > +        */
> > > > > > > > +       device_pm_move_to_tail(dev);
> > > > > > >
> > > > > > > But it would be good to somehow limit this to the devices affected by
> > > > > > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > > > > > many times in the actual majority of cases.
> > > > > >
> > > > > > Yes, lots of unnecessary reordering, but doing it only for deferred
> > > > > > probes IS the problem. In the example I gave, the consumer is never
> > > > > > deferred probe because the supplier happens to finish probing before
> > > > > > the consumer probe is even attempted.
> > > > >
> > > > > But why would the supplier be moved to the end of dpm_list without
> > > > > moving the consumer along with it?
> > > >
> > > > There is no device link between the supplier/consumer in this case.
> > >
> > > So this is the real problem, isn't it?
> > >
> > > > Sadly there are plenty of cases where device links aren't present to
> > > > capture supplier/consumer dependencies.
> > >
> > > And so that's why you want to add a ton of overhead to driver probing
> > > in all of the cases in which that is not an issue?
> >
> > Well, until all/most of the frameworks add device links or
> > fw_devlink=on by default, it doesn't hurt to have suspend/resume work
> > in more platforms.
>
> In the presence of deferred probing, that is.
>
> Note that deferred probing gets in the way here and so the problem is
> related to it.

I mean, we officially support deferred probing. Shouldn't we fix it so
that it doesn't break suspend/resume? Also, it's pretty easy to have
cases where one module probes multiple device instances and loading it
in one order would break dpm_list order for one device and loading it
in another order would break it for another device. And there would be
no "proper" order to load modules (because module order != device
order).

> > What about the option of not adding to dpm_list until a device is
> > probed? Is it DOA?
>
> Yes, it is, I'm afraid.  There are devices without drivers. :-)

Devices that still suspend/resume without drivers?! I didn't know that
was possible.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-25 17:51               ` Saravana Kannan
@ 2020-06-26 11:27                 ` Rafael J. Wysocki
  2020-06-26 20:34                   ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-26 11:27 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Geert Uytterhoeven,
	Cc: Android Kernel, Rafael J. Wysocki, Linux Kernel Mailing List

On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Jun 25, 2020 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Thu, Jun 25, 2020 at 10:03 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 25, 2020 at 7:01 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Thu, Jun 25, 2020 at 9:58 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 25, 2020 at 6:49 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > Dropping Feng Kan <fkan@apm.com> and Toan Le <toanle@apm.com> because
> > > > > > > their mails are bouncing.
> > > > > > >
> > > > > > > On Thu, Jun 25, 2020 at 8:19 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Thu, Jun 25, 2020 at 5:24 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > >
> > > > > > > > > Under the following conditions:
> > > > > > > > > - driver A is built in and can probe device-A
> > > > > > > > > - driver B is a module and can probe device-B
> > > > > > > > > - device-A is supplier of device-B
> > > > > > > > >
> > > > > > > > > Without this patch:
> > > > > > > > > 1. device-A is added.
> > > > > > > > > 2. device-B is added.
> > > > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > > > 5. deferred probe of device-A is reattempted
> > > > > > > > > 6. device-A is moved to end of dpm_list.
> > > > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > > > 8. dpm_list stays as [device-B, device-A].
> > > > > > > > >
> > > > > > > > > Suspend (which goes in the reverse order of dpm_list) fails because
> > > > > > > > > device-A (supplier) is suspended before device-B (consumer).
> > > > > > > > >
> > > > > > > > > With this patch:
> > > > > > > > > 1. device-A is added.
> > > > > > > > > 2. device-B is added.
> > > > > > > > > 3. dpm_list is now [device-A, device-B].
> > > > > > > > > 4. driver-A defers probe of device-A.
> > > > > > > > > 5. deferred probe of device-A is reattempted later.
> > > > > > > > > 6. dpm_list is now [device-B, device-A].
> > > > > > > > > 7. driver-B is loaded and probes device-B.
> > > > > > > > > 8. dpm_list is now [device-A, device-B].
> > > > > > > > >
> > > > > > > > > Suspend works because device-B (consumer) is suspended before device-A
> > > > > > > > > (supplier).
> > > > > > > > >
> > > > > > > > > Fixes: 494fd7b7ad10 ("PM / core: fix deferred probe breaking suspend resume order")
> > > > > > > > > Fixes: 716a7a259690 ("driver core: fw_devlink: Add support for batching fwnode parsing")
> > > > > > > > > Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> > > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/base/dd.c | 16 ++++++++++++++++
> > > > > > > > >  1 file changed, 16 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > > > > > > > index 9a1d940342ac..52b2148c7983 100644
> > > > > > > > > --- a/drivers/base/dd.c
> > > > > > > > > +++ b/drivers/base/dd.c
> > > > > > > > > @@ -109,6 +109,8 @@ static void deferred_probe_work_func(struct work_struct *work)
> > > > > > > > >                  * probe makes that very unsafe.
> > > > > > > > >                  */
> > > > > > > > >                 device_pm_move_to_tail(dev);
> > > > > > > > > +               /* Greg/Rafael: SHOULD I DELETE THIS? ^^ I think I should, but
> > > > > > > > > +                * I'm worried if it'll have some unintended consequeneces. */
> > > > > > > >
> > > > > > > > Yes, this needs to go away if you make the other change.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >                 dev_dbg(dev, "Retrying from deferred list\n");
> > > > > > > > >                 bus_probe_device(dev);
> > > > > > > > > @@ -557,6 +559,20 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > > > > > > > >                 goto re_probe;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > +       /*
> > > > > > > > > +        * The devices are added to the dpm_list (resume/suspend (reverse
> > > > > > > > > +        * order) list) as they are registered with the driver core. But the
> > > > > > > > > +        * order the devices are added doesn't necessarily match the real
> > > > > > > > > +        * dependency order.
> > > > > > > > > +        *
> > > > > > > > > +        * The successful probe order is a much better signal. If a device just
> > > > > > > > > +        * probed successfully, then we know for sure that all the devices that
> > > > > > > > > +        * probed before it don't depend on the device. So, we can safely move
> > > > > > > > > +        * the device to the end of the dpm_list. As more devices probe,
> > > > > > > > > +        * they'll automatically get ordered correctly.
> > > > > > > > > +        */
> > > > > > > > > +       device_pm_move_to_tail(dev);
> > > > > > > >
> > > > > > > > But it would be good to somehow limit this to the devices affected by
> > > > > > > > deferred probing or we'll end up reordering dpm_list unnecessarily for
> > > > > > > > many times in the actual majority of cases.
> > > > > > >
> > > > > > > Yes, lots of unnecessary reordering, but doing it only for deferred
> > > > > > > probes IS the problem. In the example I gave, the consumer is never
> > > > > > > deferred probe because the supplier happens to finish probing before
> > > > > > > the consumer probe is even attempted.
> > > > > >
> > > > > > But why would the supplier be moved to the end of dpm_list without
> > > > > > moving the consumer along with it?
> > > > >
> > > > > There is no device link between the supplier/consumer in this case.
> > > >
> > > > So this is the real problem, isn't it?
> > > >
> > > > > Sadly there are plenty of cases where device links aren't present to
> > > > > capture supplier/consumer dependencies.
> > > >
> > > > And so that's why you want to add a ton of overhead to driver probing
> > > > in all of the cases in which that is not an issue?
> > >
> > > Well, until all/most of the frameworks add device links or
> > > fw_devlink=on by default, it doesn't hurt to have suspend/resume work
> > > in more platforms.
> >
> > In the presence of deferred probing, that is.
> >
> > Note that deferred probing gets in the way here and so the problem is
> > related to it.
>
> I mean, we officially support deferred probing. Shouldn't we fix it so
> that it doesn't break suspend/resume?

Yes, we should fix deferred probing.

> Also, it's pretty easy to have
> cases where one module probes multiple device instances and loading it
> in one order would break dpm_list order for one device and loading it
> in another order would break it for another device. And there would be
> no "proper" order to load modules (because module order != device
> order).

I'm not saying that the current code is perfect.  I'm saying that the
fix as proposed adds too much cost for everybody who may not care IMO.

>
> > > What about the option of not adding to dpm_list until a device is
> > > probed? Is it DOA?
> >
> > Yes, it is, I'm afraid.  There are devices without drivers. :-)
>
> Devices that still suspend/resume without drivers?! I didn't know that
> was possible.

There are "class" devices, "type" devices and devices that simply have
no drivers, but the bus type code may still want to touch them during
system-wide suspend/resume.

Modules may still not be loaded when the system is suspended for the
first time etc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-26 11:27                 ` Rafael J. Wysocki
@ 2020-06-26 20:34                   ` Saravana Kannan
  2020-06-26 20:53                     ` Geert Uytterhoeven
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-26 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > Note that deferred probing gets in the way here and so the problem is
> > > related to it.
> >
> > I mean, we officially support deferred probing. Shouldn't we fix it so
> > that it doesn't break suspend/resume?
>
> Yes, we should fix deferred probing.
>
> > Also, it's pretty easy to have
> > cases where one module probes multiple device instances and loading it
> > in one order would break dpm_list order for one device and loading it
> > in another order would break it for another device. And there would be
> > no "proper" order to load modules (because module order != device
> > order).
>
> I'm not saying that the current code is perfect.  I'm saying that the
> fix as proposed adds too much cost for everybody who may not care IMO.

Ok, how about I don't do this reordering until we see the first
deferred probe request? Will that work for you? In that case, systems
with no deferred probing will not incur any reordering cost. Or if
reordering starts only towards the end, all the previous probes won't
incur reordering cost.

I'm open to other suggestions too.

>
> > >
> > > Yes, it is, I'm afraid.  There are devices without drivers. :-)
> >
> > Devices that still suspend/resume without drivers?! I didn't know that
> > was possible.
>
> There are "class" devices, "type" devices and devices that simply have
> no drivers, but the bus type code may still want to touch them during
> system-wide suspend/resume.
>
> Modules may still not be loaded when the system is suspended for the
> first time etc.

Thanks for the context.


-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-26 20:34                   ` Saravana Kannan
@ 2020-06-26 20:53                     ` Geert Uytterhoeven
  2020-06-30 13:50                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-26 20:53 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

Hi Saravana,

On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > Note that deferred probing gets in the way here and so the problem is
> > > > related to it.
> > >
> > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > that it doesn't break suspend/resume?
> >
> > Yes, we should fix deferred probing.

Please take into account that breakage is an actual regression.

> > > Also, it's pretty easy to have
> > > cases where one module probes multiple device instances and loading it
> > > in one order would break dpm_list order for one device and loading it
> > > in another order would break it for another device. And there would be
> > > no "proper" order to load modules (because module order != device
> > > order).
> >
> > I'm not saying that the current code is perfect.  I'm saying that the
> > fix as proposed adds too much cost for everybody who may not care IMO.
>
> Ok, how about I don't do this reordering until we see the first
> deferred probe request? Will that work for you? In that case, systems
> with no deferred probing will not incur any reordering cost. Or if
> reordering starts only towards the end, all the previous probes won't
> incur reordering cost.

That first deferred probe request is more or less as of the first probe,
since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
adding all top level devices"), at least on DT systems.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-26 20:53                     ` Geert Uytterhoeven
@ 2020-06-30 13:50                       ` Rafael J. Wysocki
  2020-06-30 15:38                         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-30 13:50 UTC (permalink / raw)
  To: Geert Uytterhoeven, Greg Kroah-Hartman
  Cc: Saravana Kannan, Rafael J. Wysocki, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > related to it.
> > > >
> > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > that it doesn't break suspend/resume?
> > >
> > > Yes, we should fix deferred probing.
>
> Please take into account that breakage is an actual regression.
>
> > > > Also, it's pretty easy to have
> > > > cases where one module probes multiple device instances and loading it
> > > > in one order would break dpm_list order for one device and loading it
> > > > in another order would break it for another device. And there would be
> > > > no "proper" order to load modules (because module order != device
> > > > order).
> > >
> > > I'm not saying that the current code is perfect.  I'm saying that the
> > > fix as proposed adds too much cost for everybody who may not care IMO.
> >
> > Ok, how about I don't do this reordering until we see the first
> > deferred probe request? Will that work for you? In that case, systems
> > with no deferred probing will not incur any reordering cost. Or if
> > reordering starts only towards the end, all the previous probes won't
> > incur reordering cost.
>
> That first deferred probe request is more or less as of the first probe,
> since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> adding all top level devices"), at least on DT systems.

The deferred probe reordering of devices to the end of dpm_list
started in 2012, so it is nothing new, and it demonstrably works for
devices where the dependencies are known to the driver core.

That said, in the cases when the dependencies are known to the driver
core, it is also unnecessary to reorder dpm_list in
deferred_probe_work_func(), because the right ordering of it is going
to be determined elsewhere.

Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
suspend resume order") is not the source of the problem here, because
the problem would have still been there without it, due to the
device_pm_move_last() that was there before, so the Fixes: tag
pointing to that commit is misleading.

Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
batching fwnode parsing") is an optimization and the regression is
present because of it AFAICS, the best way to address it at that point
would be to revert commit 716a7a259690 for 5.8 and maybe do the
optimization more carefully.

Greg, what do you think?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 13:50                       ` Rafael J. Wysocki
@ 2020-06-30 15:38                         ` Greg Kroah-Hartman
  2020-06-30 16:11                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-06-30 15:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Geert Uytterhoeven, Saravana Kannan, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > related to it.
> > > > >
> > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > that it doesn't break suspend/resume?
> > > >
> > > > Yes, we should fix deferred probing.
> >
> > Please take into account that breakage is an actual regression.
> >
> > > > > Also, it's pretty easy to have
> > > > > cases where one module probes multiple device instances and loading it
> > > > > in one order would break dpm_list order for one device and loading it
> > > > > in another order would break it for another device. And there would be
> > > > > no "proper" order to load modules (because module order != device
> > > > > order).
> > > >
> > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > >
> > > Ok, how about I don't do this reordering until we see the first
> > > deferred probe request? Will that work for you? In that case, systems
> > > with no deferred probing will not incur any reordering cost. Or if
> > > reordering starts only towards the end, all the previous probes won't
> > > incur reordering cost.
> >
> > That first deferred probe request is more or less as of the first probe,
> > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > adding all top level devices"), at least on DT systems.
> 
> The deferred probe reordering of devices to the end of dpm_list
> started in 2012, so it is nothing new, and it demonstrably works for
> devices where the dependencies are known to the driver core.
> 
> That said, in the cases when the dependencies are known to the driver
> core, it is also unnecessary to reorder dpm_list in
> deferred_probe_work_func(), because the right ordering of it is going
> to be determined elsewhere.
> 
> Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> suspend resume order") is not the source of the problem here, because
> the problem would have still been there without it, due to the
> device_pm_move_last() that was there before, so the Fixes: tag
> pointing to that commit is misleading.
> 
> Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> batching fwnode parsing") is an optimization and the regression is
> present because of it AFAICS, the best way to address it at that point
> would be to revert commit 716a7a259690 for 5.8 and maybe do the
> optimization more carefully.
> 
> Greg, what do you think?

I've been ignoreing this and letting you all sort it out :)

But if you think that patch should be reverted, I'll not object and will
be glad to to it if this solves the issue.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 15:38                         ` Greg Kroah-Hartman
@ 2020-06-30 16:11                           ` Rafael J. Wysocki
  2020-06-30 17:11                             ` Saravana Kannan
  2020-07-01 11:07                             ` Geert Uytterhoeven
  0 siblings, 2 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-30 16:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Geert Uytterhoeven
  Cc: Rafael J. Wysocki, Saravana Kannan, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> > <geert@linux-m68k.org> wrote:
> > >
> > > Hi Saravana,
> > >
> > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > > related to it.
> > > > > >
> > > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > > that it doesn't break suspend/resume?
> > > > >
> > > > > Yes, we should fix deferred probing.
> > >
> > > Please take into account that breakage is an actual regression.
> > >
> > > > > > Also, it's pretty easy to have
> > > > > > cases where one module probes multiple device instances and loading it
> > > > > > in one order would break dpm_list order for one device and loading it
> > > > > > in another order would break it for another device. And there would be
> > > > > > no "proper" order to load modules (because module order != device
> > > > > > order).
> > > > >
> > > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > > >
> > > > Ok, how about I don't do this reordering until we see the first
> > > > deferred probe request? Will that work for you? In that case, systems
> > > > with no deferred probing will not incur any reordering cost. Or if
> > > > reordering starts only towards the end, all the previous probes won't
> > > > incur reordering cost.
> > >
> > > That first deferred probe request is more or less as of the first probe,
> > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > > adding all top level devices"), at least on DT systems.
> >
> > The deferred probe reordering of devices to the end of dpm_list
> > started in 2012, so it is nothing new, and it demonstrably works for
> > devices where the dependencies are known to the driver core.
> >
> > That said, in the cases when the dependencies are known to the driver
> > core, it is also unnecessary to reorder dpm_list in
> > deferred_probe_work_func(), because the right ordering of it is going
> > to be determined elsewhere.
> >
> > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> > suspend resume order") is not the source of the problem here, because
> > the problem would have still been there without it, due to the
> > device_pm_move_last() that was there before, so the Fixes: tag
> > pointing to that commit is misleading.
> >
> > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> > batching fwnode parsing") is an optimization and the regression is
> > present because of it AFAICS, the best way to address it at that point
> > would be to revert commit 716a7a259690 for 5.8 and maybe do the
> > optimization more carefully.
> >
> > Greg, what do you think?
>
> I've been ignoreing this and letting you all sort it out :)
>
> But if you think that patch should be reverted, I'll not object and will
> be glad to to it if this solves the issue.

Well, if Geert can confirm that reverting commit 716a7a259690 makes
the problem go away, IMO this would be the most reasonable thing to do
at this stage of the cycle without risking that more regressions will
be introduced.

Geert?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 16:11                           ` Rafael J. Wysocki
@ 2020-06-30 17:11                             ` Saravana Kannan
  2020-06-30 17:15                               ` Rafael J. Wysocki
  2020-07-10 13:21                               ` Greg Kroah-Hartman
  2020-07-01 11:07                             ` Geert Uytterhoeven
  1 sibling, 2 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-06-30 17:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Tue, Jun 30, 2020 at 9:11 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > > > related to it.
> > > > > > >
> > > > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > > > that it doesn't break suspend/resume?
> > > > > >
> > > > > > Yes, we should fix deferred probing.
> > > >
> > > > Please take into account that breakage is an actual regression.
> > > >
> > > > > > > Also, it's pretty easy to have
> > > > > > > cases where one module probes multiple device instances and loading it
> > > > > > > in one order would break dpm_list order for one device and loading it
> > > > > > > in another order would break it for another device. And there would be
> > > > > > > no "proper" order to load modules (because module order != device
> > > > > > > order).
> > > > > >
> > > > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > > > >
> > > > > Ok, how about I don't do this reordering until we see the first
> > > > > deferred probe request? Will that work for you? In that case, systems
> > > > > with no deferred probing will not incur any reordering cost. Or if
> > > > > reordering starts only towards the end, all the previous probes won't
> > > > > incur reordering cost.
> > > >
> > > > That first deferred probe request is more or less as of the first probe,
> > > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > > > adding all top level devices"), at least on DT systems.
> > >
> > > The deferred probe reordering of devices to the end of dpm_list
> > > started in 2012, so it is nothing new, and it demonstrably works for
> > > devices where the dependencies are known to the driver core.

Isn't "where the dependencies are known to the driver core" this a big caveat?

> > >
> > > That said, in the cases when the dependencies are known to the driver
> > > core, it is also unnecessary to reorder dpm_list in
> > > deferred_probe_work_func(), because the right ordering of it is going
> > > to be determined elsewhere.

Until driver core knows about 100% of the dependencies, we still need
to do some kind of dpm_list reordering to have correct ordering. Even
with fw_devlink=on, I'd imagine it'd be difficult to achieve 100%
dependency being known to driver core.

> > >
> > > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> > > suspend resume order") is not the source of the problem here, because
> > > the problem would have still been there without it, due to the
> > > device_pm_move_last() that was there before, so the Fixes: tag
> > > pointing to that commit is misleading.
> > >
> > > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> > > batching fwnode parsing") is an optimization and the regression is
> > > present because of it AFAICS, the best way to address it at that point
> > > would be to revert commit 716a7a259690 for 5.8 and maybe do the
> > > optimization more carefully.

No, this patch is not adding any new issues to deferred probe. It just
increases the probability of reproducing the issue. That's exactly why
I wrote the commit text for this patch without the fwnode batch
processing example. Even if you revert the patch, suspend/resume
ordering is broken if deferred probe happens.

> > >
> > > Greg, what do you think?
> >
> > I've been ignoreing this and letting you all sort it out :)
> >
> > But if you think that patch should be reverted, I'll not object and will
> > be glad to to it if this solves the issue.
>
> Well, if Geert can confirm that reverting commit 716a7a259690 makes
> the problem go away, IMO this would be the most reasonable thing to do
> at this stage of the cycle without risking that more regressions will
> be introduced.

I already have a patch to avoid deferred probe during batch fwnode
parsing. I'm trying to do a few more tests before I send it out. So,
it'd be nice if we don't revert it right now and give me some time to
finish testing.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 17:11                             ` Saravana Kannan
@ 2020-06-30 17:15                               ` Rafael J. Wysocki
  2020-07-10 13:21                               ` Greg Kroah-Hartman
  1 sibling, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2020-06-30 17:15 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Geert Uytterhoeven,
	Cc: Android Kernel, Rafael J. Wysocki, Linux Kernel Mailing List

On Tue, Jun 30, 2020 at 7:11 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jun 30, 2020 at 9:11 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> > > > <geert@linux-m68k.org> wrote:
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > > > > related to it.
> > > > > > > >
> > > > > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > > > > that it doesn't break suspend/resume?
> > > > > > >
> > > > > > > Yes, we should fix deferred probing.
> > > > >
> > > > > Please take into account that breakage is an actual regression.
> > > > >
> > > > > > > > Also, it's pretty easy to have
> > > > > > > > cases where one module probes multiple device instances and loading it
> > > > > > > > in one order would break dpm_list order for one device and loading it
> > > > > > > > in another order would break it for another device. And there would be
> > > > > > > > no "proper" order to load modules (because module order != device
> > > > > > > > order).
> > > > > > >
> > > > > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > > > > >
> > > > > > Ok, how about I don't do this reordering until we see the first
> > > > > > deferred probe request? Will that work for you? In that case, systems
> > > > > > with no deferred probing will not incur any reordering cost. Or if
> > > > > > reordering starts only towards the end, all the previous probes won't
> > > > > > incur reordering cost.
> > > > >
> > > > > That first deferred probe request is more or less as of the first probe,
> > > > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > > > > adding all top level devices"), at least on DT systems.
> > > >
> > > > The deferred probe reordering of devices to the end of dpm_list
> > > > started in 2012, so it is nothing new, and it demonstrably works for
> > > > devices where the dependencies are known to the driver core.
>
> Isn't "where the dependencies are known to the driver core" this a big caveat?
>
> > > >
> > > > That said, in the cases when the dependencies are known to the driver
> > > > core, it is also unnecessary to reorder dpm_list in
> > > > deferred_probe_work_func(), because the right ordering of it is going
> > > > to be determined elsewhere.
>
> Until driver core knows about 100% of the dependencies, we still need
> to do some kind of dpm_list reordering to have correct ordering. Even
> with fw_devlink=on, I'd imagine it'd be difficult to achieve 100%
> dependency being known to driver core.
>
> > > >
> > > > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> > > > suspend resume order") is not the source of the problem here, because
> > > > the problem would have still been there without it, due to the
> > > > device_pm_move_last() that was there before, so the Fixes: tag
> > > > pointing to that commit is misleading.
> > > >
> > > > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> > > > batching fwnode parsing") is an optimization and the regression is
> > > > present because of it AFAICS, the best way to address it at that point
> > > > would be to revert commit 716a7a259690 for 5.8 and maybe do the
> > > > optimization more carefully.
>
> No, this patch is not adding any new issues to deferred probe. It just
> increases the probability of reproducing the issue. That's exactly why
> I wrote the commit text for this patch without the fwnode batch
> processing example. Even if you revert the patch, suspend/resume
> ordering is broken if deferred probe happens.
>
> > > >
> > > > Greg, what do you think?
> > >
> > > I've been ignoreing this and letting you all sort it out :)
> > >
> > > But if you think that patch should be reverted, I'll not object and will
> > > be glad to to it if this solves the issue.
> >
> > Well, if Geert can confirm that reverting commit 716a7a259690 makes
> > the problem go away, IMO this would be the most reasonable thing to do
> > at this stage of the cycle without risking that more regressions will
> > be introduced.
>
> I already have a patch to avoid deferred probe during batch fwnode
> parsing. I'm trying to do a few more tests before I send it out. So,
> it'd be nice if we don't revert it right now and give me some time to
> finish testing.

If you have an alternative fix, let's see it before deciding what to do.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 16:11                           ` Rafael J. Wysocki
  2020-06-30 17:11                             ` Saravana Kannan
@ 2020-07-01 11:07                             ` Geert Uytterhoeven
  1 sibling, 0 replies; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-07-01 11:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Saravana Kannan, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

Hi Rafael,

On Tue, Jun 30, 2020 at 6:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> > > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> > > <geert@linux-m68k.org> wrote:
> > > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > > > related to it.
> > > > > > >
> > > > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > > > that it doesn't break suspend/resume?
> > > > > >
> > > > > > Yes, we should fix deferred probing.
> > > >
> > > > Please take into account that breakage is an actual regression.
> > > >
> > > > > > > Also, it's pretty easy to have
> > > > > > > cases where one module probes multiple device instances and loading it
> > > > > > > in one order would break dpm_list order for one device and loading it
> > > > > > > in another order would break it for another device. And there would be
> > > > > > > no "proper" order to load modules (because module order != device
> > > > > > > order).
> > > > > >
> > > > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > > > >
> > > > > Ok, how about I don't do this reordering until we see the first
> > > > > deferred probe request? Will that work for you? In that case, systems
> > > > > with no deferred probing will not incur any reordering cost. Or if
> > > > > reordering starts only towards the end, all the previous probes won't
> > > > > incur reordering cost.
> > > >
> > > > That first deferred probe request is more or less as of the first probe,
> > > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > > > adding all top level devices"), at least on DT systems.
> > >
> > > The deferred probe reordering of devices to the end of dpm_list
> > > started in 2012, so it is nothing new, and it demonstrably works for
> > > devices where the dependencies are known to the driver core.
> > >
> > > That said, in the cases when the dependencies are known to the driver
> > > core, it is also unnecessary to reorder dpm_list in
> > > deferred_probe_work_func(), because the right ordering of it is going
> > > to be determined elsewhere.
> > >
> > > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> > > suspend resume order") is not the source of the problem here, because
> > > the problem would have still been there without it, due to the
> > > device_pm_move_last() that was there before, so the Fixes: tag
> > > pointing to that commit is misleading.
> > >
> > > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> > > batching fwnode parsing") is an optimization and the regression is
> > > present because of it AFAICS, the best way to address it at that point
> > > would be to revert commit 716a7a259690 for 5.8 and maybe do the
> > > optimization more carefully.
> > >
> > > Greg, what do you think?
> >
> > I've been ignoreing this and letting you all sort it out :)
> >
> > But if you think that patch should be reverted, I'll not object and will
> > be glad to to it if this solves the issue.
>
> Well, if Geert can confirm that reverting commit 716a7a259690 makes
> the problem go away, IMO this would be the most reasonable thing to do
> at this stage of the cycle without risking that more regressions will
> be introduced.
>
> Geert?

Reverting commit 716a7a25969003d8 ("driver core: fw_devlink: Add support
for batching fwnode parsing") requires reverting commits
fefcfc968723caf9 ("driver core: Remove check in
driver_deferred_probe_force_trigger()") and 93d2e4322aa74c1a ("of:
platform: Batch fwnode parsing when adding all top level devices"),
too.

While reverting the latter is sufficient to fix the regression for me, I
can confirm that reverting all three fixes the issue, too.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-06-30 17:11                             ` Saravana Kannan
  2020-06-30 17:15                               ` Rafael J. Wysocki
@ 2020-07-10 13:21                               ` Greg Kroah-Hartman
  2020-07-10 20:47                                 ` Saravana Kannan
  1 sibling, 1 reply; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-07-10 13:21 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Tue, Jun 30, 2020 at 10:11:01AM -0700, Saravana Kannan wrote:
> On Tue, Jun 30, 2020 at 9:11 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Jun 30, 2020 at 5:39 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Tue, Jun 30, 2020 at 03:50:58PM +0200, Rafael J. Wysocki wrote:
> > > > On Fri, Jun 26, 2020 at 10:53 PM Geert Uytterhoeven
> > > > <geert@linux-m68k.org> wrote:
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > On Fri, Jun 26, 2020 at 10:34 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > On Fri, Jun 26, 2020 at 4:27 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > On Thu, Jun 25, 2020 at 7:52 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > > On Thu, Jun 25, 2020 at 10:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > > > > Note that deferred probing gets in the way here and so the problem is
> > > > > > > > > related to it.
> > > > > > > >
> > > > > > > > I mean, we officially support deferred probing. Shouldn't we fix it so
> > > > > > > > that it doesn't break suspend/resume?
> > > > > > >
> > > > > > > Yes, we should fix deferred probing.
> > > > >
> > > > > Please take into account that breakage is an actual regression.
> > > > >
> > > > > > > > Also, it's pretty easy to have
> > > > > > > > cases where one module probes multiple device instances and loading it
> > > > > > > > in one order would break dpm_list order for one device and loading it
> > > > > > > > in another order would break it for another device. And there would be
> > > > > > > > no "proper" order to load modules (because module order != device
> > > > > > > > order).
> > > > > > >
> > > > > > > I'm not saying that the current code is perfect.  I'm saying that the
> > > > > > > fix as proposed adds too much cost for everybody who may not care IMO.
> > > > > >
> > > > > > Ok, how about I don't do this reordering until we see the first
> > > > > > deferred probe request? Will that work for you? In that case, systems
> > > > > > with no deferred probing will not incur any reordering cost. Or if
> > > > > > reordering starts only towards the end, all the previous probes won't
> > > > > > incur reordering cost.
> > > > >
> > > > > That first deferred probe request is more or less as of the first probe,
> > > > > since commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing when
> > > > > adding all top level devices"), at least on DT systems.
> > > >
> > > > The deferred probe reordering of devices to the end of dpm_list
> > > > started in 2012, so it is nothing new, and it demonstrably works for
> > > > devices where the dependencies are known to the driver core.
> 
> Isn't "where the dependencies are known to the driver core" this a big caveat?
> 
> > > >
> > > > That said, in the cases when the dependencies are known to the driver
> > > > core, it is also unnecessary to reorder dpm_list in
> > > > deferred_probe_work_func(), because the right ordering of it is going
> > > > to be determined elsewhere.
> 
> Until driver core knows about 100% of the dependencies, we still need
> to do some kind of dpm_list reordering to have correct ordering. Even
> with fw_devlink=on, I'd imagine it'd be difficult to achieve 100%
> dependency being known to driver core.
> 
> > > >
> > > > Also commit 494fd7b7ad10 ("PM / core: fix deferred probe breaking
> > > > suspend resume order") is not the source of the problem here, because
> > > > the problem would have still been there without it, due to the
> > > > device_pm_move_last() that was there before, so the Fixes: tag
> > > > pointing to that commit is misleading.
> > > >
> > > > Now, because 716a7a259690 ("driver core: fw_devlink: Add support for
> > > > batching fwnode parsing") is an optimization and the regression is
> > > > present because of it AFAICS, the best way to address it at that point
> > > > would be to revert commit 716a7a259690 for 5.8 and maybe do the
> > > > optimization more carefully.
> 
> No, this patch is not adding any new issues to deferred probe. It just
> increases the probability of reproducing the issue. That's exactly why
> I wrote the commit text for this patch without the fwnode batch
> processing example. Even if you revert the patch, suspend/resume
> ordering is broken if deferred probe happens.
> 
> > > >
> > > > Greg, what do you think?
> > >
> > > I've been ignoreing this and letting you all sort it out :)
> > >
> > > But if you think that patch should be reverted, I'll not object and will
> > > be glad to to it if this solves the issue.
> >
> > Well, if Geert can confirm that reverting commit 716a7a259690 makes
> > the problem go away, IMO this would be the most reasonable thing to do
> > at this stage of the cycle without risking that more regressions will
> > be introduced.
> 
> I already have a patch to avoid deferred probe during batch fwnode
> parsing. I'm trying to do a few more tests before I send it out. So,
> it'd be nice if we don't revert it right now and give me some time to
> finish testing.

So this series is no longer needed given your other series that I just
took?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe
  2020-07-10 13:21                               ` Greg Kroah-Hartman
@ 2020-07-10 20:47                                 ` Saravana Kannan
  0 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-07-10 20:47 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Rafael J. Wysocki, Geert Uytterhoeven, Cc: Android Kernel,
	Rafael J. Wysocki, Linux Kernel Mailing List

On Fri, Jul 10, 2020 at 6:21 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 30, 2020 at 10:11:01AM -0700, Saravana Kannan wrote:
> > I already have a patch to avoid deferred probe during batch fwnode
> > parsing. I'm trying to do a few more tests before I send it out. So,
> > it'd be nice if we don't revert it right now and give me some time to
> > finish testing.
>
> So this series is no longer needed given your other series that I just
> took?

This series is no longer needed to fix the issue with fw_devlink
optimization that Geert was seeing. The other series you pulled in
takes care of Geert's issue.

But deferred probe can still break suspend/resume ordering (example
mentioned in the commit text). So I think we should fix that (version
X of this patch).

Rafael was concerned about some of the extra work v1 will cause for
cases that work fine today. So, we need to find a compromise where we
can fix the issue and optimize the fix as much as possible.

One optimization we can do is to call device_pm_move_to_tail(dev) in
really_probe() only after deferred probe is triggered (as in the
thread gets to run) for the first time. Until then, really_probe()
wouldn't have to call device_pm_move_to_tail(dev);

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-07-10 20:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25  3:24 [PATCH v1] driver core: Fix suspend/resume order issue with deferred probe Saravana Kannan
2020-06-25  8:57 ` Geert Uytterhoeven
2020-06-25 17:02   ` Saravana Kannan
2020-06-25 15:19 ` Rafael J. Wysocki
2020-06-25 16:48   ` Saravana Kannan
2020-06-25 16:58     ` Rafael J. Wysocki
2020-06-25 17:01       ` Saravana Kannan
2020-06-25 17:03         ` Rafael J. Wysocki
2020-06-25 17:08           ` Saravana Kannan
2020-06-25 17:46             ` Rafael J. Wysocki
2020-06-25 17:51               ` Saravana Kannan
2020-06-26 11:27                 ` Rafael J. Wysocki
2020-06-26 20:34                   ` Saravana Kannan
2020-06-26 20:53                     ` Geert Uytterhoeven
2020-06-30 13:50                       ` Rafael J. Wysocki
2020-06-30 15:38                         ` Greg Kroah-Hartman
2020-06-30 16:11                           ` Rafael J. Wysocki
2020-06-30 17:11                             ` Saravana Kannan
2020-06-30 17:15                               ` Rafael J. Wysocki
2020-07-10 13:21                               ` Greg Kroah-Hartman
2020-07-10 20:47                                 ` Saravana Kannan
2020-07-01 11:07                             ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).