All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] PM Domains: Ensure the provider is resumed first
@ 2020-11-16 16:17 Jon Hunter
  2020-11-19 10:15 ` Ulf Hansson
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Hunter @ 2020-11-16 16:17 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J . Wysocki, Kevin Hilman, linux-tegra, linux-pm

Hi all,

We recently ran into a problem on Tegra186 where it was failing to
resume from suspend. It turned out that a driver, the Tegra ACONNECT
(drivers/bus/tegra-aconnect.c), was being resumed before the PM domain
provider, the BPMP (drivers/firmware/tegra/bpmp.c), and the Tegra
ACONNECT was trying to enable the PM domain before the provider had been
resumed.

According to commit 4d23a5e84806 it states that 'genpd powers on the PM
domain unconditionally in the system PM resume "noirq" phase'. However,
what I don't see is anything that guarantees that the provider is
resumed before any device that requires power domains. Unless there is
something that I am missing?

Now by default the ACONNECT is resumed during the noirq phase, but I
have tried making it and its child devices, suspend/resume in the normal
system phase but this does not seem to make a difference. So I am
looking for a bit of guidance on how best to fix this.

Thanks
Jon




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] PM Domains: Ensure the provider is resumed first
  2020-11-16 16:17 [RFC] PM Domains: Ensure the provider is resumed first Jon Hunter
@ 2020-11-19 10:15 ` Ulf Hansson
  2020-11-23 13:09   ` Jon Hunter
  0 siblings, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2020-11-19 10:15 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Rafael J . Wysocki, Kevin Hilman, linux-tegra, Linux PM

On Mon, 16 Nov 2020 at 17:17, Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi all,
>
> We recently ran into a problem on Tegra186 where it was failing to
> resume from suspend. It turned out that a driver, the Tegra ACONNECT
> (drivers/bus/tegra-aconnect.c), was being resumed before the PM domain
> provider, the BPMP (drivers/firmware/tegra/bpmp.c), and the Tegra
> ACONNECT was trying to enable the PM domain before the provider had been
> resumed.
>
> According to commit 4d23a5e84806 it states that 'genpd powers on the PM
> domain unconditionally in the system PM resume "noirq" phase'. However,
> what I don't see is anything that guarantees that the provider is
> resumed before any device that requires power domains. Unless there is
> something that I am missing?

The genpd provider's ->power_on() callback should be invoked as soon
as an attached device gets resumed via the ->resume_noirq() callback
(genpd_resume_noirq). Have you verified that this is working as
expected for you?

Note that, if there is no device attached to the genpd, the
->power_on() callback may not be invoked - unless there is a child
domain being powered on.

From the genpd provider driver point of view - why do you need to
implement system suspend/resume callbacks at all? Do you have some
additional operations to run, besides those executed from the
->power_on|off() callbacks?

>
> Now by default the ACONNECT is resumed during the noirq phase, but I
> have tried making it and its child devices, suspend/resume in the normal
> system phase but this does not seem to make a difference. So I am
> looking for a bit of guidance on how best to fix this.

The order of suspend/resume of devices are based upon the order the
devices get stored in the "dpm list" - unless there are child/parents
relationships or device-links that the PM core also takes into
account.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] PM Domains: Ensure the provider is resumed first
  2020-11-19 10:15 ` Ulf Hansson
@ 2020-11-23 13:09   ` Jon Hunter
  2020-11-23 14:31     ` Ulf Hansson
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Hunter @ 2020-11-23 13:09 UTC (permalink / raw)
  To: Ulf Hansson; +Cc: Rafael J . Wysocki, Kevin Hilman, linux-tegra, Linux PM

Hi Ulf,

On 19/11/2020 10:15, Ulf Hansson wrote:
> On Mon, 16 Nov 2020 at 17:17, Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>> Hi all,
>>
>> We recently ran into a problem on Tegra186 where it was failing to
>> resume from suspend. It turned out that a driver, the Tegra ACONNECT
>> (drivers/bus/tegra-aconnect.c), was being resumed before the PM domain
>> provider, the BPMP (drivers/firmware/tegra/bpmp.c), and the Tegra
>> ACONNECT was trying to enable the PM domain before the provider had been
>> resumed.
>>
>> According to commit 4d23a5e84806 it states that 'genpd powers on the PM
>> domain unconditionally in the system PM resume "noirq" phase'. However,
>> what I don't see is anything that guarantees that the provider is
>> resumed before any device that requires power domains. Unless there is
>> something that I am missing?
> 
> The genpd provider's ->power_on() callback should be invoked as soon
> as an attached device gets resumed via the ->resume_noirq() callback
> (genpd_resume_noirq). Have you verified that this is working as
> expected for you?

Yes this is working as expected. The problem is that the ->power_on
callback for a device is occurring before the provider itself has been
resumed.

> Note that, if there is no device attached to the genpd, the
> ->power_on() callback may not be invoked - unless there is a child
> domain being powered on.
> 
> From the genpd provider driver point of view - why do you need to
> implement system suspend/resume callbacks at all? Do you have some
> additional operations to run, besides those executed from the
> ->power_on|off() callbacks?

The provider in this case is an embedded controller, the BPMP, and it
needs to be resumed [0] prior to calling the provider callbacks. I am
wondering if any other providers have this requirement?

Thanks
Jon

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/tegra/bpmp.c#n797

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] PM Domains: Ensure the provider is resumed first
  2020-11-23 13:09   ` Jon Hunter
@ 2020-11-23 14:31     ` Ulf Hansson
  2020-11-23 14:57       ` Jon Hunter
  0 siblings, 1 reply; 6+ messages in thread
From: Ulf Hansson @ 2020-11-23 14:31 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Rafael J . Wysocki, Kevin Hilman, linux-tegra, Linux PM, Saravana Kannan

+ Saravana

On Mon, 23 Nov 2020 at 14:09, Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Ulf,
>
> On 19/11/2020 10:15, Ulf Hansson wrote:
> > On Mon, 16 Nov 2020 at 17:17, Jon Hunter <jonathanh@nvidia.com> wrote:
> >>
> >> Hi all,
> >>
> >> We recently ran into a problem on Tegra186 where it was failing to
> >> resume from suspend. It turned out that a driver, the Tegra ACONNECT
> >> (drivers/bus/tegra-aconnect.c), was being resumed before the PM domain
> >> provider, the BPMP (drivers/firmware/tegra/bpmp.c), and the Tegra
> >> ACONNECT was trying to enable the PM domain before the provider had been
> >> resumed.
> >>
> >> According to commit 4d23a5e84806 it states that 'genpd powers on the PM
> >> domain unconditionally in the system PM resume "noirq" phase'. However,
> >> what I don't see is anything that guarantees that the provider is
> >> resumed before any device that requires power domains. Unless there is
> >> something that I am missing?
> >
> > The genpd provider's ->power_on() callback should be invoked as soon
> > as an attached device gets resumed via the ->resume_noirq() callback
> > (genpd_resume_noirq). Have you verified that this is working as
> > expected for you?
>
> Yes this is working as expected. The problem is that the ->power_on
> callback for a device is occurring before the provider itself has been
> resumed.

I see.

>
> > Note that, if there is no device attached to the genpd, the
> > ->power_on() callback may not be invoked - unless there is a child
> > domain being powered on.
> >
> > From the genpd provider driver point of view - why do you need to
> > implement system suspend/resume callbacks at all? Do you have some
> > additional operations to run, besides those executed from the
> > ->power_on|off() callbacks?
>
> The provider in this case is an embedded controller, the BPMP, and it
> needs to be resumed [0] prior to calling the provider callbacks. I am
> wondering if any other providers have this requirement?

It seems like it should be a rather common requirement for a genpd
provider - at least for those providers that need to run some
additional operations at system suspend/resume.

I guess the reason for this problem is that the order of how the
devices end up in the dpm_list, doesn't fit well for your case.
Normally, a provider should be registered prior and the consumer, as
that would probably lead to that the provider becomes resumed first.

In any case, to make sure the order becomes correct, a device link
should be created between the genpd domain provider and the consumer
device. As a matter of fact, this is done "automagically" during boot
for DT based platforms, see of_link_property() in
drivers/of/property.c.

Currently these device links are created with DL_FLAG_SYNC_STATE_ONLY,
unless the "fw_devlink" kernel command line specifies a different
option (on == DL_FLAG_AUTOPROBE_CONSUMER). I would try to play with
that and see how that turns out.

>
> Thanks
> Jon
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/tegra/bpmp.c#n797

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] PM Domains: Ensure the provider is resumed first
  2020-11-23 14:31     ` Ulf Hansson
@ 2020-11-23 14:57       ` Jon Hunter
  2020-11-23 20:23         ` Saravana Kannan
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Hunter @ 2020-11-23 14:57 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Rafael J . Wysocki, Kevin Hilman, linux-tegra, Linux PM, Saravana Kannan


On 23/11/2020 14:31, Ulf Hansson wrote:

...

>>> Note that, if there is no device attached to the genpd, the
>>> ->power_on() callback may not be invoked - unless there is a child
>>> domain being powered on.
>>>
>>> From the genpd provider driver point of view - why do you need to
>>> implement system suspend/resume callbacks at all? Do you have some
>>> additional operations to run, besides those executed from the
>>> ->power_on|off() callbacks?
>>
>> The provider in this case is an embedded controller, the BPMP, and it
>> needs to be resumed [0] prior to calling the provider callbacks. I am
>> wondering if any other providers have this requirement?
> 
> It seems like it should be a rather common requirement for a genpd
> provider - at least for those providers that need to run some
> additional operations at system suspend/resume.
> 
> I guess the reason for this problem is that the order of how the
> devices end up in the dpm_list, doesn't fit well for your case.
> Normally, a provider should be registered prior and the consumer, as
> that would probably lead to that the provider becomes resumed first.

Yes it does appear that way. The BPMP (genpd provider) is probed before
the ACONNECT (consumer) but still the order in which they end up in is
not what we want.

> In any case, to make sure the order becomes correct, a device link
> should be created between the genpd domain provider and the consumer
> device. As a matter of fact, this is done "automagically" during boot
> for DT based platforms, see of_link_property() in
> drivers/of/property.c.
> 
> Currently these device links are created with DL_FLAG_SYNC_STATE_ONLY,
> unless the "fw_devlink" kernel command line specifies a different
> option (on == DL_FLAG_AUTOPROBE_CONSUMER). I would try to play with
> that and see how that turns out.

OK, good to know. I will take a look at that. Thanks for the inputs.

Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] PM Domains: Ensure the provider is resumed first
  2020-11-23 14:57       ` Jon Hunter
@ 2020-11-23 20:23         ` Saravana Kannan
  0 siblings, 0 replies; 6+ messages in thread
From: Saravana Kannan @ 2020-11-23 20:23 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Ulf Hansson, Rafael J . Wysocki, Kevin Hilman, linux-tegra, Linux PM

On Mon, Nov 23, 2020 at 6:57 AM Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 23/11/2020 14:31, Ulf Hansson wrote:
>
> ...
>
> >>> Note that, if there is no device attached to the genpd, the
> >>> ->power_on() callback may not be invoked - unless there is a child
> >>> domain being powered on.
> >>>
> >>> From the genpd provider driver point of view - why do you need to
> >>> implement system suspend/resume callbacks at all? Do you have some
> >>> additional operations to run, besides those executed from the
> >>> ->power_on|off() callbacks?
> >>
> >> The provider in this case is an embedded controller, the BPMP, and it
> >> needs to be resumed [0] prior to calling the provider callbacks. I am
> >> wondering if any other providers have this requirement?
> >
> > It seems like it should be a rather common requirement for a genpd
> > provider - at least for those providers that need to run some
> > additional operations at system suspend/resume.
> >
> > I guess the reason for this problem is that the order of how the
> > devices end up in the dpm_list, doesn't fit well for your case.
> > Normally, a provider should be registered prior and the consumer, as
> > that would probably lead to that the provider becomes resumed first.
>
> Yes it does appear that way. The BPMP (genpd provider) is probed before
> the ACONNECT (consumer) but still the order in which they end up in is
> not what we want.
>
> > In any case, to make sure the order becomes correct, a device link
> > should be created between the genpd domain provider and the consumer
> > device. As a matter of fact, this is done "automagically" during boot
> > for DT based platforms, see of_link_property() in
> > drivers/of/property.c.

Ulf, thanks for adding me and pointing people to fw_devlink.

> >
> > Currently these device links are created with DL_FLAG_SYNC_STATE_ONLY,
> > unless the "fw_devlink" kernel command line specifies a different
> > option (on == DL_FLAG_AUTOPROBE_CONSUMER). I would try to play with
> > that and see how that turns out.
>
> OK, good to know. I will take a look at that. Thanks for the inputs.

Jon,

Let me know if you run into issues with fw_devlink. I can try to help
you address them. If you hit issues, it's going to be one of these two
cases below.

Known issues with fw_devlink:
1. It doesn't like cycles in dependencies in DT (it does the right
thing for some of them, but not all).
2. If a DT device node has a driver that "probes" it without using
driver core/struct device, it blocks the probing of all its consumers.

(1) is on my TODO list of things to fix
(2) needs the driver to be fixed to not do this.

-Saravana

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-11-23 20:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16 16:17 [RFC] PM Domains: Ensure the provider is resumed first Jon Hunter
2020-11-19 10:15 ` Ulf Hansson
2020-11-23 13:09   ` Jon Hunter
2020-11-23 14:31     ` Ulf Hansson
2020-11-23 14:57       ` Jon Hunter
2020-11-23 20:23         ` Saravana Kannan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.