linux-renesas-soc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
       [not found] ` <20200515053500.215929-5-saravanak@google.com>
@ 2020-06-17 12:20   ` Geert Uytterhoeven
  2020-06-17 18:36     ` Saravana Kannan
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-06-17 12:20 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> parsing of the device tree nodes when a lot of devices are added. This
> will significantly cut down parsing time (as much a 1 second on some
> systems). So, use them when adding devices for all the top level device
> tree nodes in a system.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
when adding all top level devices") in v5.8-rc1, and I have bisected a
regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
no longer be woken up from s2ram by a GPIO key. Reverting the commit
fixes the issue.

On these systems, the GPIO/PFC block has its interrupt lines connected
to intermediate interrupt controllers (Renesas INTC), which are in turn
connected to the main interrupt controller (ARM GIC).  The INTC block is
part of a power and clock domain.  Hence if a GPIO is enabled as a
wake-up source, the INTC is part of the wake-up path, and thus must be
kept enabled when entering s2ram.

While this commit has no impact on probe order for me (unlike in Marek's
case), it does have an impact on suspend order:
  - Before this commit:
      1. The keyboard (gpio-keys) is suspended, and calls
         enable_irq_wake() to inform the upstream interrupt controller
         (INTC) that it is part of the wake-up path,
      2. INTC is suspended, and calls device_set_wakeup_path() to inform
         the device core that it must be kept enabled,
      3. The system is woken by pressing a wake-up key.

  - After this commit:
      1. INTC is suspended, and is not aware it is part of the wake-up
         path, so it is disabled by the device core,
      2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
      3. Pressing a wake-up key has no effect, as INTC is disabled, and
         the interrupt does not come through.

It looks like no device links are involved, as both gpio-keys and INTC have
no links.
Do you have a clue?

Thanks!

> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -538,7 +538,9 @@ static int __init of_platform_default_populate_init(void)
>         }
>
>         /* Populate everything else. */
> +       fw_devlink_pause();
>         of_platform_default_populate(NULL, NULL, NULL);
> +       fw_devlink_resume();
>
>         return 0;
>  }

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-17 12:20   ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Geert Uytterhoeven
@ 2020-06-17 18:36     ` Saravana Kannan
  2020-06-18  7:31       ` Geert Uytterhoeven
  0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2020-06-17 18:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Geert,

On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > parsing of the device tree nodes when a lot of devices are added. This
> > will significantly cut down parsing time (as much a 1 second on some
> > systems). So, use them when adding devices for all the top level device
> > tree nodes in a system.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> when adding all top level devices") in v5.8-rc1, and I have bisected a
> regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> no longer be woken up from s2ram by a GPIO key. Reverting the commit
> fixes the issue.
>
> On these systems, the GPIO/PFC block has its interrupt lines connected
> to intermediate interrupt controllers (Renesas INTC), which are in turn
> connected to the main interrupt controller (ARM GIC).  The INTC block is
> part of a power and clock domain.  Hence if a GPIO is enabled as a
> wake-up source, the INTC is part of the wake-up path, and thus must be
> kept enabled when entering s2ram.
>
> While this commit has no impact on probe order for me (unlike in Marek's
> case), it does have an impact on suspend order:
>   - Before this commit:
>       1. The keyboard (gpio-keys) is suspended, and calls
>          enable_irq_wake() to inform the upstream interrupt controller
>          (INTC) that it is part of the wake-up path,
>       2. INTC is suspended, and calls device_set_wakeup_path() to inform
>          the device core that it must be kept enabled,
>       3. The system is woken by pressing a wake-up key.
>
>   - After this commit:
>       1. INTC is suspended, and is not aware it is part of the wake-up
>          path, so it is disabled by the device core,
>       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
>       3. Pressing a wake-up key has no effect, as INTC is disabled, and
>          the interrupt does not come through.
>
> It looks like no device links are involved, as both gpio-keys and INTC have
> no links.
> Do you have a clue?
>
> Thanks!

That patch of mine defers probe on all devices added by the
of_platform_default_populate() call, and then once the call returns,
it immediately triggers a deferred probe.

So all these devices are being probed in parallel in the deferred
probe workqueue while the main "initcall thread" continues down to
further initcalls. It looks like some of the drivers in subsequent
initcalls are assuming that devices in the earlier initcalls always
probe and can't be deferred?

There are two options.
1. Fix these drivers.
2. Add a "flush deferred workqueue" in fw_devlink_resume()

I'd rather we fix the drivers so that they handle deferred probes
correctly. Thoughts?

-Saravana

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-17 18:36     ` Saravana Kannan
@ 2020-06-18  7:31       ` Geert Uytterhoeven
  2020-06-18 23:00         ` Saravana Kannan
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-06-18  7:31 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > parsing of the device tree nodes when a lot of devices are added. This
> > > will significantly cut down parsing time (as much a 1 second on some
> > > systems). So, use them when adding devices for all the top level device
> > > tree nodes in a system.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > fixes the issue.
> >
> > On these systems, the GPIO/PFC block has its interrupt lines connected
> > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > wake-up source, the INTC is part of the wake-up path, and thus must be
> > kept enabled when entering s2ram.
> >
> > While this commit has no impact on probe order for me (unlike in Marek's
> > case), it does have an impact on suspend order:
> >   - Before this commit:
> >       1. The keyboard (gpio-keys) is suspended, and calls
> >          enable_irq_wake() to inform the upstream interrupt controller
> >          (INTC) that it is part of the wake-up path,
> >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> >          the device core that it must be kept enabled,
> >       3. The system is woken by pressing a wake-up key.
> >
> >   - After this commit:
> >       1. INTC is suspended, and is not aware it is part of the wake-up
> >          path, so it is disabled by the device core,
> >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> >          the interrupt does not come through.
> >
> > It looks like no device links are involved, as both gpio-keys and INTC have
> > no links.
> > Do you have a clue?
> >
> > Thanks!
>
> That patch of mine defers probe on all devices added by the
> of_platform_default_populate() call, and then once the call returns,
> it immediately triggers a deferred probe.
>
> So all these devices are being probed in parallel in the deferred
> probe workqueue while the main "initcall thread" continues down to
> further initcalls. It looks like some of the drivers in subsequent
> initcalls are assuming that devices in the earlier initcalls always
> probe and can't be deferred?
>
> There are two options.
> 1. Fix these drivers.
> 2. Add a "flush deferred workqueue" in fw_devlink_resume()
>
> I'd rather we fix the drivers so that they handle deferred probes
> correctly. Thoughts?

While the affected drivers should handle deferred probe fine, none of
the affected drivers is subject to deferred probing: they all probe
successfully on first try (I had added debug prints to
platform_drv_probe() to be sure).
The affected drivers are still probed in the same order (INTC is one of
the earliest drivers probed, gpio-keys is the last). However, during
system suspend, gpio-keys is suspended before INTC, which is wrong, as
gpio-keys uses an interrupt provided by INTC.

Perhaps the "in parallel" is the real culprit, and there is a race
condition somewhere?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-18  7:31       ` Geert Uytterhoeven
@ 2020-06-18 23:00         ` Saravana Kannan
  2020-06-19 12:24           ` Geert Uytterhoeven
  0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2020-06-18 23:00 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Thu, Jun 18, 2020 at 12:32 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > > parsing of the device tree nodes when a lot of devices are added. This
> > > > will significantly cut down parsing time (as much a 1 second on some
> > > > systems). So, use them when adding devices for all the top level device
> > > > tree nodes in a system.
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > > fixes the issue.
> > >
> > > On these systems, the GPIO/PFC block has its interrupt lines connected
> > > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > > wake-up source, the INTC is part of the wake-up path, and thus must be
> > > kept enabled when entering s2ram.
> > >
> > > While this commit has no impact on probe order for me (unlike in Marek's
> > > case), it does have an impact on suspend order:
> > >   - Before this commit:
> > >       1. The keyboard (gpio-keys) is suspended, and calls
> > >          enable_irq_wake() to inform the upstream interrupt controller
> > >          (INTC) that it is part of the wake-up path,
> > >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> > >          the device core that it must be kept enabled,
> > >       3. The system is woken by pressing a wake-up key.
> > >
> > >   - After this commit:
> > >       1. INTC is suspended, and is not aware it is part of the wake-up
> > >          path, so it is disabled by the device core,
> > >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> > >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> > >          the interrupt does not come through.
> > >
> > > It looks like no device links are involved, as both gpio-keys and INTC have
> > > no links.
> > > Do you have a clue?
> > >
> > > Thanks!
> >
> > That patch of mine defers probe on all devices added by the
> > of_platform_default_populate() call, and then once the call returns,
> > it immediately triggers a deferred probe.
> >
> > So all these devices are being probed in parallel in the deferred
> > probe workqueue while the main "initcall thread" continues down to
> > further initcalls. It looks like some of the drivers in subsequent
> > initcalls are assuming that devices in the earlier initcalls always
> > probe and can't be deferred?
> >
> > There are two options.
> > 1. Fix these drivers.
> > 2. Add a "flush deferred workqueue" in fw_devlink_resume()
> >
> > I'd rather we fix the drivers so that they handle deferred probes
> > correctly. Thoughts?
>
> While the affected drivers should handle deferred probe fine, none of
> the affected drivers is subject to deferred probing: they all probe
> successfully on first try (I had added debug prints to
> platform_drv_probe() to be sure).
> The affected drivers are still probed in the same order (INTC is one of
> the earliest drivers probed, gpio-keys is the last).

Thanks, this is useful info. Now I know that my patch isn't somehow
reordering devices that would have probed as soon as
of_platform_default_populate_init() added them.

When you say the "The affected drivers are still probed in the same
order", are you only referring to the devices that would have probed
before of_platform_default_populate_init() returns? Or ALL devices in
the system are probing in the same order?

I assume gpio-keys gets probed in the "normal init thread" and not by
the deferred probe workqueue? I'm guessing this because gpio_keys
driver seems to register during late_initcall() whereas
of_platform_default_populate_init() runs as an arch_initcall_sync().

> However, during
> system suspend, gpio-keys is suspended before INTC, which is wrong, as
> gpio-keys uses an interrupt provided by INTC.
>
> Perhaps the "in parallel" is the real culprit, and there is a race
> condition somewhere?

I tried digging into the gpio_keys driver code to see how it interacts
with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
it seems like a rabbit hole that'd be easier to figure out when you
have the device. Can you check if gpio-keys is probing before INTC in
the "bad" case?

Also, in general, can you see if there's a difference in the probe
order between all the devices in the system? Adding a log to
really_probe() would be better in case non-platform devices are
getting reordered (my change affects all devices that are created from
DT, not just platform devices).

I want to make sure we understand the real issue before we try to fix it.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-18 23:00         ` Saravana Kannan
@ 2020-06-19 12:24           ` Geert Uytterhoeven
  2020-06-19 20:07             ` Saravana Kannan
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-06-19 12:24 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Fri, Jun 19, 2020 at 1:01 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Jun 18, 2020 at 12:32 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > > > parsing of the device tree nodes when a lot of devices are added. This
> > > > > will significantly cut down parsing time (as much a 1 second on some
> > > > > systems). So, use them when adding devices for all the top level device
> > > > > tree nodes in a system.
> > > > >
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >
> > > > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > > > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > > > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > > > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > > > fixes the issue.
> > > >
> > > > On these systems, the GPIO/PFC block has its interrupt lines connected
> > > > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > > > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > > > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > > > wake-up source, the INTC is part of the wake-up path, and thus must be
> > > > kept enabled when entering s2ram.
> > > >
> > > > While this commit has no impact on probe order for me (unlike in Marek's
> > > > case), it does have an impact on suspend order:
> > > >   - Before this commit:
> > > >       1. The keyboard (gpio-keys) is suspended, and calls
> > > >          enable_irq_wake() to inform the upstream interrupt controller
> > > >          (INTC) that it is part of the wake-up path,
> > > >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> > > >          the device core that it must be kept enabled,
> > > >       3. The system is woken by pressing a wake-up key.
> > > >
> > > >   - After this commit:
> > > >       1. INTC is suspended, and is not aware it is part of the wake-up
> > > >          path, so it is disabled by the device core,
> > > >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> > > >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> > > >          the interrupt does not come through.
> > > >
> > > > It looks like no device links are involved, as both gpio-keys and INTC have
> > > > no links.
> > > > Do you have a clue?
> > > >
> > > > Thanks!
> > >
> > > That patch of mine defers probe on all devices added by the
> > > of_platform_default_populate() call, and then once the call returns,
> > > it immediately triggers a deferred probe.
> > >
> > > So all these devices are being probed in parallel in the deferred
> > > probe workqueue while the main "initcall thread" continues down to
> > > further initcalls. It looks like some of the drivers in subsequent
> > > initcalls are assuming that devices in the earlier initcalls always
> > > probe and can't be deferred?
> > >
> > > There are two options.
> > > 1. Fix these drivers.
> > > 2. Add a "flush deferred workqueue" in fw_devlink_resume()
> > >
> > > I'd rather we fix the drivers so that they handle deferred probes
> > > correctly. Thoughts?
> >
> > While the affected drivers should handle deferred probe fine, none of
> > the affected drivers is subject to deferred probing: they all probe
> > successfully on first try (I had added debug prints to
> > platform_drv_probe() to be sure).
> > The affected drivers are still probed in the same order (INTC is one of
> > the earliest drivers probed, gpio-keys is the last).
>
> Thanks, this is useful info. Now I know that my patch isn't somehow
> reordering devices that would have probed as soon as
> of_platform_default_populate_init() added them.
>
> When you say the "The affected drivers are still probed in the same
> order", are you only referring to the devices that would have probed
> before of_platform_default_populate_init() returns? Or ALL devices in
> the system are probing in the same order?

I was referring to all platform devices (based on a debug print added to
platform_drv_probe()). See more below.

> I assume gpio-keys gets probed in the "normal init thread" and not by
> the deferred probe workqueue? I'm guessing this because gpio_keys
> driver seems to register during late_initcall() whereas
> of_platform_default_populate_init() runs as an arch_initcall_sync().

After adding a WARN(1, ...) to gpio_keys_probe(), the backtrace shows it
is called directly from do_one_initcall(), in both the good and the bad
case.

> > However, during
> > system suspend, gpio-keys is suspended before INTC, which is wrong, as
> > gpio-keys uses an interrupt provided by INTC.
> >
> > Perhaps the "in parallel" is the real culprit, and there is a race
> > condition somewhere?
>
> I tried digging into the gpio_keys driver code to see how it interacts
> with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
> it seems like a rabbit hole that'd be easier to figure out when you
> have the device. Can you check if gpio-keys is probing before INTC in
> the "bad" case?

It is not, gpio-keys is always probed very late.

Hence for testing, I moved gpio-keys initialization just before INTC, so
it is probed before INTC. Then gpio-keys is deferred, as expected, and
reprobes successfully later.
Interestingly, that fixes my wake-up issue, too?!?

> Also, in general, can you see if there's a difference in the probe
> order between all the devices in the system? Adding a log to
> really_probe() would be better in case non-platform devices are
> getting reordered (my change affects all devices that are created from
> DT, not just platform devices).
>
> I want to make sure we understand the real issue before we try to fix it.

Enabling all debug prints in really_probe(), comparing the output
before/after the bad commit, and filtering out all noise, I get:

     bus: 'platform': really_probe: probing driver reg-dummy with
device reg-dummy
B    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900000.interrupt-controller
     bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900004.interrupt-controller
A   +Workqueue: events deferred_probe_work_func
     bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900008.interrupt-controller
C    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e690000c.interrupt-controller
     bus: 'platform': really_probe: probing driver sh-pfc with device
e6050000.pin-controller
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-1p8v
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-3p3v
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-vmmc-sdhi0
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-vmmc-sdhi2
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6820000.i2c
     bus: 'i2c': really_probe: probing driver as3711 with device 0-0040
     bus: 'platform': really_probe: probing driver as3711-regulator
with device as3711-regulator
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6822000.i2c
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6826000.i2c
     bus: 'i2c': really_probe: probing driver pcf857x with device 2-0020
     bus: 'platform': really_probe: probing driver sh_cmt with device
e6138000.timer
     bus: 'platform': really_probe: probing driver armv7-pmu with device pmu
     bus: 'platform': really_probe: probing driver simple-pm-bus with
device fec10000.bus
     bus: 'platform': really_probe: probing driver as3711-backlight
with device as3711-backlight
     bus: 'platform': really_probe: probing driver sh-sci with device
e6c80000.serial
     bus: 'platform': really_probe: probing driver smsc911x with
device 10000000.ethernet
     bus: 'i2c': really_probe: probing driver st1232-ts with device 1-0055
     bus: 'i2c': really_probe: probing driver adxl34x with device 0-001d
     bus: 'i2c': really_probe: probing driver rtc-rs5c372 with device 0-0032
     bus: 'platform': really_probe: probing driver rmobile_reset with
device e6180000.system-controller
     bus: 'platform': really_probe: probing driver cpufreq-dt with
device cpufreq-dt
     bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
device ee100000.sd
     bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
device ee140000.sd
     bus: 'platform': really_probe: probing driver sh_mmcif with
device e6bd0000.mmc
     bus: 'platform': really_probe: probing driver leds-gpio with device leds
     bus: 'i2c': really_probe: probing driver ak8975 with device 0-000c
     bus: 'platform': really_probe: probing driver snd-soc-dummy with
device snd-soc-dummy
     bus: 'i2c': really_probe: probing driver ak4642-codec with device 0-0012
     bus: 'platform': really_probe: probing driver asoc-simple-card
with device sound
     bus: 'platform': really_probe: probing driver fsi-pcm-audio with
device ec230000.sound
     bus: 'platform': really_probe: probing driver asoc-simple-card
with device sound
     bus: 'platform': really_probe: probing driver gpio-keys with
device keyboard"
     bus: 'mmc': really_probe: probing driver mmcblk with device mmc2:0001

So all devices are probed in the exact same order.
A: Note the addition of the message "Workqueue: events
   deferred_probe_work_func", which might give a clue?
B,C: "e6900000.interrupt-controller" and "e6900008.interrupt-controller"
     are the two devices that are suspended later in the wrong order.
     One of them is probed before A, one after, so A may be a red herring?

I'm still not much wiser, though....

BTW, r8a7740/armadillo is single CPU, while sh73a0/kzm9g is dual-CPU.
So both UP and SMP are affected.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-19 12:24           ` Geert Uytterhoeven
@ 2020-06-19 20:07             ` Saravana Kannan
  2020-06-20  2:32               ` Saravana Kannan
  0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2020-06-19 20:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Snipped a bunch of old context.

On Fri, Jun 19, 2020 at 5:24 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> On Fri, Jun 19, 2020 at 1:01 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > When you say the "The affected drivers are still probed in the same
> > order", are you only referring to the devices that would have probed
> > before of_platform_default_populate_init() returns? Or ALL devices in
> > the system are probing in the same order?
>
> I was referring to all platform devices (based on a debug print added to
> platform_drv_probe()). See more below.
>
> > I assume gpio-keys gets probed in the "normal init thread" and not by
> > the deferred probe workqueue? I'm guessing this because gpio_keys
> > driver seems to register during late_initcall() whereas
> > of_platform_default_populate_init() runs as an arch_initcall_sync().
>
> After adding a WARN(1, ...) to gpio_keys_probe(), the backtrace shows it
> is called directly from do_one_initcall(), in both the good and the bad
> case.
>
> > > However, during
> > > system suspend, gpio-keys is suspended before INTC, which is wrong, as
> > > gpio-keys uses an interrupt provided by INTC.
> > >
> > > Perhaps the "in parallel" is the real culprit, and there is a race
> > > condition somewhere?
> >
> > I tried digging into the gpio_keys driver code to see how it interacts
> > with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
> > it seems like a rabbit hole that'd be easier to figure out when you
> > have the device. Can you check if gpio-keys is probing before INTC in
> > the "bad" case?
>
> It is not, gpio-keys is always probed very late.
>
> Hence for testing, I moved gpio-keys initialization just before INTC, so
> it is probed before INTC. Then gpio-keys is deferred, as expected, and
> reprobes successfully later.
> Interestingly, that fixes my wake-up issue, too?!?
>
> > Also, in general, can you see if there's a difference in the probe
> > order between all the devices in the system? Adding a log to
> > really_probe() would be better in case non-platform devices are
> > getting reordered (my change affects all devices that are created from
> > DT, not just platform devices).
> >
> > I want to make sure we understand the real issue before we try to fix it.
>
> Enabling all debug prints in really_probe(), comparing the output
> before/after the bad commit, and filtering out all noise, I get:
>
>      bus: 'platform': really_probe: probing driver reg-dummy with
> device reg-dummy
> B    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900000.interrupt-controller
>      bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900004.interrupt-controller
> A   +Workqueue: events deferred_probe_work_func
>      bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900008.interrupt-controller
> C    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e690000c.interrupt-controller
>      bus: 'platform': really_probe: probing driver sh-pfc with device
> e6050000.pin-controller
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-1p8v
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-3p3v
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-vmmc-sdhi0
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-vmmc-sdhi2
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6820000.i2c
>      bus: 'i2c': really_probe: probing driver as3711 with device 0-0040
>      bus: 'platform': really_probe: probing driver as3711-regulator
> with device as3711-regulator
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6822000.i2c
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6826000.i2c
>      bus: 'i2c': really_probe: probing driver pcf857x with device 2-0020
>      bus: 'platform': really_probe: probing driver sh_cmt with device
> e6138000.timer
>      bus: 'platform': really_probe: probing driver armv7-pmu with device pmu
>      bus: 'platform': really_probe: probing driver simple-pm-bus with
> device fec10000.bus
>      bus: 'platform': really_probe: probing driver as3711-backlight
> with device as3711-backlight
>      bus: 'platform': really_probe: probing driver sh-sci with device
> e6c80000.serial
>      bus: 'platform': really_probe: probing driver smsc911x with
> device 10000000.ethernet
>      bus: 'i2c': really_probe: probing driver st1232-ts with device 1-0055
>      bus: 'i2c': really_probe: probing driver adxl34x with device 0-001d
>      bus: 'i2c': really_probe: probing driver rtc-rs5c372 with device 0-0032
>      bus: 'platform': really_probe: probing driver rmobile_reset with
> device e6180000.system-controller
>      bus: 'platform': really_probe: probing driver cpufreq-dt with
> device cpufreq-dt
>      bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
> device ee100000.sd
>      bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
> device ee140000.sd
>      bus: 'platform': really_probe: probing driver sh_mmcif with
> device e6bd0000.mmc
>      bus: 'platform': really_probe: probing driver leds-gpio with device leds
>      bus: 'i2c': really_probe: probing driver ak8975 with device 0-000c
>      bus: 'platform': really_probe: probing driver snd-soc-dummy with
> device snd-soc-dummy
>      bus: 'i2c': really_probe: probing driver ak4642-codec with device 0-0012
>      bus: 'platform': really_probe: probing driver asoc-simple-card
> with device sound
>      bus: 'platform': really_probe: probing driver fsi-pcm-audio with
> device ec230000.sound
>      bus: 'platform': really_probe: probing driver asoc-simple-card
> with device sound
>      bus: 'platform': really_probe: probing driver gpio-keys with
> device keyboard"
>      bus: 'mmc': really_probe: probing driver mmcblk with device mmc2:0001
>
> So all devices are probed in the exact same order.
> A: Note the addition of the message "Workqueue: events
>    deferred_probe_work_func", which might give a clue?
> B,C: "e6900000.interrupt-controller" and "e6900008.interrupt-controller"
>      are the two devices that are suspended later in the wrong order.
>      One of them is probed before A, one after, so A may be a red herring?
>

Thanks! Your logs and the tip above helped!

*face palm*/Sigh... (for what is happening)

In the absence of other info (like device links, parent-child
relationship, etc) the device_add() order is used as an implicit order
for suspend (in reverse) and resume.
device_add() -> device_pm_add() -> list_add_tail(..., &dpm_list)

But the deferred probe code reorders the deferred devices:
deferred_probe_work_func() -> device_pm_move_to_tail()

So, this is what is happening:
1. of_platform_default_populate() adds all the top level devices in DT order.
2. So dpm_list is: ... INTC, ..., gpio-keys
3. Only devices with drivers already registered by this time get added
to the deferred probe list (because that only happens when a driver
matches)
4. INTC is one of them.
5. So when deferred_probe_work_func() gets to INTC, it moves it to the
end of the dpm_list
6. So dpm_list is now: ..., gpio-keys, ... INTC

Things fail.

Obviously, my patch is causing this unintended behavior and I could
try to fix it. But I'm not really sure if the actual bug is in my
patch...

Deferred probe is supposed to allow a driver to defer probe without
worrying something else is going to get screwed up. Also, order of
devices in DT isn't supposed to affect functionality. Both of these
are not true with the current behavior.

I think instead of deferred_probe_work_func() moving the device to the
end of the dpm_list, I think the device probing successfully is what
should move it to the end of the dpm_list. That way, the dpm_list is
actually ordered by when the devices become functional and not the
random order in DT or random probe order which can get pretty
convoluted with multiple deferred probes. This feels right and will
make suspend/resume more robust against DT ordering -- but I'm not
sure what other wide ranging impact this has for other platforms.

Greg/Rafael, do you agree with the paragraph above? If yes, I can send
a patch out that orders dpm_list by probe success order.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-19 20:07             ` Saravana Kannan
@ 2020-06-20  2:32               ` Saravana Kannan
  2020-06-22 15:49                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 9+ messages in thread
From: Saravana Kannan @ 2020-06-20  2:32 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
>
> I think instead of deferred_probe_work_func() moving the device to the
> end of the dpm_list, I think the device probing successfully is what
> should move it to the end of the dpm_list. That way, the dpm_list is
> actually ordered by when the devices become functional and not the
> random order in DT or random probe order which can get pretty
> convoluted with multiple deferred probes. This feels right and will
> make suspend/resume more robust against DT ordering -- but I'm not
> sure what other wide ranging impact this has for other platforms.

Geert,

If you want to play around with a potential fix to test my hypothesis,
I think it's just adding this one line to driver_bound():
============
klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
device_links_driver_bound(dev);
+device_pm_move_to_tail(dev);

device_pm_check_callbacks(dev);
============

-Saravana

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-20  2:32               ` Saravana Kannan
@ 2020-06-22 15:49                 ` Geert Uytterhoeven
  2020-06-24 23:22                   ` Saravana Kannan
  0 siblings, 1 reply; 9+ messages in thread
From: Geert Uytterhoeven @ 2020-06-22 15:49 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Sat, Jun 20, 2020 at 4:33 AM Saravana Kannan <saravanak@google.com> wrote:
> On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
> > I think instead of deferred_probe_work_func() moving the device to the
> > end of the dpm_list, I think the device probing successfully is what
> > should move it to the end of the dpm_list. That way, the dpm_list is
> > actually ordered by when the devices become functional and not the
> > random order in DT or random probe order which can get pretty
> > convoluted with multiple deferred probes. This feels right and will
> > make suspend/resume more robust against DT ordering -- but I'm not
> > sure what other wide ranging impact this has for other platforms.
>
> If you want to play around with a potential fix to test my hypothesis,
> I think it's just adding this one line to driver_bound():
> ============
> klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> device_links_driver_bound(dev);
> +device_pm_move_to_tail(dev);
>
> device_pm_check_callbacks(dev);
> ============

Thanks, that seems to fix the issue for me, on both affected systems!
Note that this has quite some impact on the order devices are suspended,
but this seems harmless.

Will try on more systems later...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-22 15:49                 ` Geert Uytterhoeven
@ 2020-06-24 23:22                   ` Saravana Kannan
  0 siblings, 0 replies; 9+ messages in thread
From: Saravana Kannan @ 2020-06-24 23:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Mon, Jun 22, 2020 at 8:49 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Sat, Jun 20, 2020 at 4:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
> > > I think instead of deferred_probe_work_func() moving the device to the
> > > end of the dpm_list, I think the device probing successfully is what
> > > should move it to the end of the dpm_list. That way, the dpm_list is
> > > actually ordered by when the devices become functional and not the
> > > random order in DT or random probe order which can get pretty
> > > convoluted with multiple deferred probes. This feels right and will
> > > make suspend/resume more robust against DT ordering -- but I'm not
> > > sure what other wide ranging impact this has for other platforms.
> >
> > If you want to play around with a potential fix to test my hypothesis,
> > I think it's just adding this one line to driver_bound():
> > ============
> > klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > device_links_driver_bound(dev);
> > +device_pm_move_to_tail(dev);
> >
> > device_pm_check_callbacks(dev);
> > ============
>
> Thanks, that seems to fix the issue for me, on both affected systems!
> Note that this has quite some impact on the order devices are suspended,
> but this seems harmless.
>
> Will try on more systems later...

Thanks for testing. Maybe I should just send that change as a patch
and see what Greg/Rafael have to say to that.

It's a general fix anyway. So, might as well send it out.

-Saravana

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-06-24 23:23 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200515053500.215929-1-saravanak@google.com>
     [not found] ` <20200515053500.215929-5-saravanak@google.com>
2020-06-17 12:20   ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Geert Uytterhoeven
2020-06-17 18:36     ` Saravana Kannan
2020-06-18  7:31       ` Geert Uytterhoeven
2020-06-18 23:00         ` Saravana Kannan
2020-06-19 12:24           ` Geert Uytterhoeven
2020-06-19 20:07             ` Saravana Kannan
2020-06-20  2:32               ` Saravana Kannan
2020-06-22 15:49                 ` Geert Uytterhoeven
2020-06-24 23:22                   ` Saravana Kannan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).