All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ulf Hansson <ulf.hansson@linaro.org>
To: Stephan Gerhold <stephan@gerhold.net>
Cc: Stephan Gerhold <stephan.gerhold@kernkonzept.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Andy Gross <agross@kernel.org>,
	Bjorn Andersson <andersson@kernel.org>,
	Konrad Dybcio <konrad.dybcio@linaro.org>,
	Ilia Lin <ilia.lin@kernel.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Rob Herring <robh+dt@kernel.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Conor Dooley <conor+dt@kernel.org>,
	linux-pm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-kernel@vger.kernel.org, devicetree@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/4] cpufreq: qcom-nvmem: Enable virtual power domain devices
Date: Fri, 29 Sep 2023 15:14:07 +0200	[thread overview]
Message-ID: <CAPDyKFppdXe1AZo1jm2Bc_ZR18hw5Bmh1x+2P7Obhb_rJ2gc4Q@mail.gmail.com> (raw)
In-Reply-To: <ZQGqfMigCFZP_HLA@gerhold.net>

On Wed, 13 Sept 2023 at 14:26, Stephan Gerhold <stephan@gerhold.net> wrote:
>
> On Wed, Sep 13, 2023 at 12:56:16PM +0200, Ulf Hansson wrote:
> > On Tue, 12 Sept 2023 at 11:40, Stephan Gerhold
> > <stephan.gerhold@kernkonzept.com> wrote:
> > >
> > > The genpd core ignores performance state votes from devices that are
> > > runtime suspended as of commit 5937c3ce2122 ("PM: domains: Drop/restore
> > > performance state votes for devices at runtime PM").
> >
> > I think you are referring to the wrong commit above. Please have a
> > look at commit 3c5a272202c2 ("PM: domains: Improve runtime PM
> > performance state handling"), instead.
> >
> > I also suggest rephrasing the above into saying that the performance
> > state vote for a device is cached rather than carried out, if
> > pm_runtime_suspended() returns true for it.
> >
> > Another relevant information in the commit message would be to add
> > that during device-attach (genpd_dev_pm_attach_by_id()), calls
> > pm_runtime_enable() the device.
> >
>
> Thanks, I will try to clarify this a bit! I was actually looking at that
> commit originally but decided to reference the commit that "started the
> change", since the this commit is marked as fix of the one I referenced.
> But I think you're right, it would be more clear to reference "PM:
> domains: Improve runtime PM performance state handling" directly.
>
> > > However, at the
> > > moment nothing ever enables the virtual devices created in
> > > qcom-cpufreq-nvmem for the cpufreq power domain scaling, so they are
> > > permanently runtime-suspended.
> > >
> > > Fix this by enabling the devices after attaching them and use
> > > dev_pm_syscore_device() to ensure the power domain also stays on when
> > > going to suspend. Since it supplies the CPU we can never turn it off
> > > from Linux. There are other mechanisms to turn it off when needed,
> > > usually in the RPM firmware or the cpuidle path.
> > >
> > > Without this fix performance states votes are silently ignored, and the
> > > CPU/CPR voltage is never adjusted. This has been broken since 5.14 but
> > > for some reason no one noticed this on QCS404 so far.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: 1cb8339ca225 ("cpufreq: qcom: Add support for qcs404 on nvmem driver")
> > > Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
> > > ---
> > >  drivers/cpufreq/qcom-cpufreq-nvmem.c | 21 ++++++++++++++++++++-
> > >  1 file changed, 20 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cpufreq/qcom-cpufreq-nvmem.c b/drivers/cpufreq/qcom-cpufreq-nvmem.c
> > > index 84d7033e5efe..17d6ab14c909 100644
> > > --- a/drivers/cpufreq/qcom-cpufreq-nvmem.c
> > > +++ b/drivers/cpufreq/qcom-cpufreq-nvmem.c
> > > @@ -25,6 +25,7 @@
> > >  #include <linux/platform_device.h>
> > >  #include <linux/pm_domain.h>
> > >  #include <linux/pm_opp.h>
> > > +#include <linux/pm_runtime.h>
> > >  #include <linux/slab.h>
> > >  #include <linux/soc/qcom/smem.h>
> > >
> > > @@ -280,6 +281,7 @@ static int qcom_cpufreq_probe(struct platform_device *pdev)
> > >         }
> > >
> > >         for_each_possible_cpu(cpu) {
> > > +               struct device **virt_devs = NULL;
> > >                 struct dev_pm_opp_config config = {
> > >                         .supported_hw = NULL,
> > >                 };
> > > @@ -300,7 +302,7 @@ static int qcom_cpufreq_probe(struct platform_device *pdev)
> > >
> > >                 if (drv->data->genpd_names) {
> > >                         config.genpd_names = drv->data->genpd_names;
> > > -                       config.virt_devs = NULL;
> > > +                       config.virt_devs = &virt_devs;
> > >                 }
> > >
> > >                 if (config.supported_hw || config.genpd_names) {
> > > @@ -311,6 +313,23 @@ static int qcom_cpufreq_probe(struct platform_device *pdev)
> > >                                 goto free_opp;
> > >                         }
> > >                 }
> > > +
> > > +               if (virt_devs) {
> > > +                       const char * const *name = config.genpd_names;
> > > +                       int i;
> > > +
> > > +                       for (i = 0; *name; i++, name++) {
> > > +                               ret = pm_runtime_resume_and_get(virt_devs[i]);
> > > +                               if (ret) {
> > > +                                       dev_err(cpu_dev, "failed to resume %s: %d\n",
> > > +                                               *name, ret);
> > > +                                       goto free_opp;
> > > +                               }
> >
> > Shouldn't we restore the usage count at ->remove() too?
> >
> > > +
> > > +                               /* Keep CPU power domain always-on */
> > > +                               dev_pm_syscore_device(virt_devs[i], true);
> >
> > Is this really correct? cpufreq is suspended/resumed by the PM core
> > during system wide suspend/resume. See dpm_suspend|resume(). Isn't
> > that sufficient?
> >
> > Moreover, it looks like the cpr genpd provider supports genpd's
> > ->power_on|off() callbacks. Is there something wrong with this, that I
> > am missing?
> >
>
> I think this question is a quite fundamental one. To explain this
> properly I will need to delve a bit into the implementation details of
> the two different GENPD providers that are applicable here:
>
> Fundamentally, we are describing the main power supply for the CPU here.
> Consider a simple regulator with adjustable voltage. From the Linux
> point of view this regulator should be marked as "regulator-always-on".
> If we would turn off this regulator, the CPU would be immediately dead
> without proper shutdown done by firmware or hardware.
>
> Representing the regulator as power domain does not change much, except
> that we now have abstract "performance states" instead of actual voltages.
> However, for power domains there is currently no generic mechanism like
> "regulator-always-on" in the DT, only drivers can specify
> GENPD_FLAG_ALWAYS_ON.

We have relied on genpd providers to act on their compatible strings
to make the correct configuration. If that isn't sufficient, I don't
see why we couldn't add a new DT property corresponding to
GENPD_FLAG_ALWAYS_ON.

>
> The special situation on MSM8909 is that there are two possible setups
> for the CPU power supply depending on the PMIC that is used (see
> "[PATCH 4/4] cpufreq: qcom-nvmem: Add MSM8909"): CPR or RPMPD. Both are
> GENPD providers so in theory we can just have either
>
>   cpu@0 { power-domains = <&cpr>; }; // or
>   cpu@0 { power-domains = <&rpmpd MSM8909_VDDCX_AO>; };
>
> in the DT, without handling this specifically on the cpufreq side.

Looks like it would be nice to get a patch for the MSM8909 DTS too, as
part of the series, to get a better picture of how this is going to be
used. Would that be possible for you to provide?

>
> The two GENPD providers behave quite differently though:
>
>  - CPR: CPR is not really a power domain itself. It's more like a monitor
>    on a power supply line coming from some other regulator. CPR provides
>    suggestions how to adjust the voltage for best power/stability.
>
>    The GENPD .power_off() disables the CPR state machine and forwards
>    this to the regulator with regulator_disable(). On QCS404 the
>    regulator is marked regulator-always-on, so it will never be disabled
>    from Linux. The SAW/SPM hardware component on Qualcomm SoCs will
>    usually disable the regulator during deep cpuidle states.

Parts of this sound a bit odd to me. The CPR/CPUfreq shouldn't really
need to vote for the CPU's power-rail(s) from a powered-on/off (CPU
idle states) point of view, but only from a performance (voltage
level) point of view.

If the enable/disable voting on the regulator really has an impact on
some platforms, it sounds like it could prevent deeper CPU idle states
too. That's probably not what we want, right?

I also had a look at the existing CPR genpd provider's probe
function/path (cpr_probe()) - and it turns out there is no call to
regulator_enable(). Whatever that means to us.

>
>  - RPMPD: This is the generic driver for all the SoC power domains
>    managed by the RPM firmware. It's not CPU-specific. However, as
>    special feature each power domain is exposed twice in Linux, e.g.
>    "MSM8909_VDDCX" and "MSM8909_VDDCX_AO". The _AO ("active-only")
>    variant tells the RPM firmware that the performance/enable vote only
>    applies when the CPU is active (not in deep cpuidle state).
>
>    The GENPD .power_off() drops all performance state votes and also
>    releases the "enable" vote for the power domain.
>
> Now, imagine what happens during system wide suspend/resume:
>
>  - CPR: The CPR state machine gets disabled. The voltage stays as-is.
>      - With "regulator-always-on": The CPU keeps running until WFI.
>      - Without: I would expect the CPU is dead immediately(?)

As I indicated above, I am starting to feel that this is a bit upside
down. CPR/CPUfreq should vote on voltages to scale performance, but
not for cpu idle states.

Perhaps what is missing is a synchronization point or a notification,
to inform the CPR driver that its state machine (registers) needs to
be saved/restored, when the power-rails get turned on/off. In fact, we
have a couple mechanisms at hand to support this.

>
>  - RPMPD: The performance/enable vote is dropped. The power domain might
>    go to minimal voltage or even turn off completely. However, the CPU
>    actually needs to keep running at the same frequency until WFI!
>    Worst case, the CPU is dead immediately when the power domain votes
>    get dropped.

Since RPMPD is managing the voting for both performance and low power
states for different kinds of devices, this certainly gets a bit more
complicated.

On the other hand, the CPUfreq driver should really only vote for
performance states for the CPUs and not for low power states. The
latter is a job for cpuidle and other consumers of the RPMPD to
manage, I think.

So, while thinking of this, I just realized that it may not always be
a good idea for genpd to cache a performance state request, for an
attached device and for which pm_runtime_suspended() returns true for
it. As this is the default behaviour in genpd, I am thinking that we
need a way to make that behaviour configurable for an attached device.
What do you think about that?

>
> In case of RPMPD, the votes must remain even during system wide suspend.
> The special _AO variant of the power domain tells the firmware to
> release the votes once the CPU has been shut down cleanly. It will also
> restore them once the CPU wakes up (long before the resume handlers run).
>
> My conclusion was that in both cases we want to keep the "power domain"
> enabled, since the CPU must keep running for a short while even after
> the system suspend handlers have been called.

It looks to me that the system wide suspend/resume case isn't really
much different from the runtime suspend/resume case.

It's not CPUfreq's role (by calling pm_runtime_resume_and_get(), for
example) to prevent the RPMPD from entering a low power state.

>
> Does this help with understanding the problem? It's a bit complicated. :D

Yes, I think so, thanks!

Although, I am afraid my response made this even more complicated. :-)

>
> Thanks!
> Stephan
>
> PS: This is essentially just another manifestation of a discussion we
> had a few times already over the years about where to enable power
> domains used by cpufreq, e.g. [1, 2, 3, 4]. Apparently I already
> mentioned back in 2021 already that QCS404 is broken [5] (I forgot
> about that :')).

Right, I recall these discussions now, thanks for the pointers.

Let's try to get to the bottom of this and figure out a proper solution.

>
> [1]: https://lore.kernel.org/linux-pm/YLi5N06Qs+gYHgYg@gerhold.net/
> [2]: https://lore.kernel.org/linux-pm/20200826093328.88268-1-stephan@gerhold.net/
> [3]: https://lore.kernel.org/linux-pm/20200730080146.25185-1-stephan@gerhold.net/
> [4]: https://lore.kernel.org/linux-arm-msm/20200426123140.GA190483@gerhold.net/
> [5]: https://lore.kernel.org/linux-pm/YLoTl7MfMfq2g10h@gerhold.net/

Kind regards
Uffe

  reply	other threads:[~2023-09-29 13:14 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-12  9:40 [PATCH 0/4] cpufreq: Add basic cpufreq scaling for Qualcomm MSM8909 Stephan Gerhold
2023-09-12  9:40 ` [PATCH 1/4] cpufreq: qcom-nvmem: Enable virtual power domain devices Stephan Gerhold
2023-09-13 10:56   ` Ulf Hansson
2023-09-13 12:26     ` Stephan Gerhold
2023-09-29 13:14       ` Ulf Hansson [this message]
2023-09-29 17:01         ` Stephan Gerhold
2023-10-12 11:33           ` Ulf Hansson
2023-10-12 18:43             ` Stephan Gerhold
2023-10-16 14:47               ` Ulf Hansson
2023-10-17 20:54                 ` Stephan Gerhold
2023-10-17 21:50                   ` Ulf Hansson
2023-10-18  7:13                     ` Stephan Gerhold
2023-11-28 12:41                 ` [PATCH 1/4] cpufreq: qcom-nvmem: Handling multiple power domains Stephan Gerhold
2023-12-04 10:59                   ` Ulf Hansson
2023-09-12  9:40 ` [PATCH 2/4] cpufreq: dt: platdev: Add MSM8909 to blocklist Stephan Gerhold
2023-09-12  9:53   ` Konrad Dybcio
2023-09-28  6:52   ` Viresh Kumar
2023-09-12  9:40 ` [PATCH 3/4] dt-bindings: cpufreq: qcom-nvmem: Document MSM8909 Stephan Gerhold
2023-09-12 18:29   ` Rob Herring
2023-09-28  6:54     ` Viresh Kumar
2023-09-12  9:40 ` [PATCH 4/4] cpufreq: qcom-nvmem: Add MSM8909 Stephan Gerhold
2023-09-12  9:59   ` Konrad Dybcio
2023-09-12 11:08     ` Stephan Gerhold

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPDyKFppdXe1AZo1jm2Bc_ZR18hw5Bmh1x+2P7Obhb_rJ2gc4Q@mail.gmail.com \
    --to=ulf.hansson@linaro.org \
    --cc=agross@kernel.org \
    --cc=andersson@kernel.org \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=ilia.lin@kernel.org \
    --cc=konrad.dybcio@linaro.org \
    --cc=krzysztof.kozlowski+dt@linaro.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=stephan.gerhold@kernkonzept.com \
    --cc=stephan@gerhold.net \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.