* [RFD] Voltage dependencies for clocks (DVFS)
@ 2018-06-18 20:44 grahamr
  2018-07-02  5:13 ` Michael Turquette
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: grahamr @ 2018-06-18 20:44 UTC (permalink / raw)
  To: linux-clk, linux-pm, sboyd
  Cc: mturquette, dianders, ulf.hansson, viresh.kumar, Taniya Das,
	Rajendra Nayak, Amit Nischal

Hi All,

A number of vendors, including Qualcomm, support sophisticated digital 
voltage rail management based on the state (rate and enable) of each 
clock in the system.  In a Qualcomm SOC we require the ability to assert 
a vote on both a regulator enable and voltage corner (essentially a 
quantified voltage level) when clocks are enabled or have their rate 
set.  Importantly, if a clock is disabled, it must release its required 
voltage vote.
In general clients of the clock framework are not aware of the mapping 
between their performance level (clock rate) and voltage requirement, 
because it comes as part of the clock controller design and not the core 
design, and also because the clock controller elements themselves have 
voltage requirements that must be included in the relationship (i.e. 
PLLs need minimum voltage levels to operate at a given output 
frequency).

The solution we have deployed for many years is to associate a regulator 
and voltage level with each clock rate, and manage voting for the 
associated regulator in the clock framework during state changes.  There 
are two issues with this solution that have been raised by maintainers:

1. The use of the regulator framework to track voltage corners has been 
deemed invalid.  We are in the process of changing our implementation to 
use genpd, which is being modified to support aggregation, and is viewed 
as acceptable in principle.
2. In some SOCs, control over voltages may itself require enabling or 
configuring a clock (think an I2C regulator for example).  In such a 
design, the regulator->clock and clock->regulator dependencies 
introduce a potential deadlock.
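The per-clock voting scheme described above can be sketched as a small
user-space model (plain C; the names, corner values, and max-aggregation
rule are illustrative assumptions, not the actual Qualcomm implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: each clock votes a voltage corner on a shared rail while
 * enabled, and releases its vote entirely when disabled. The rail runs
 * at the max of all active votes. */
struct vclk {
	bool enabled;
	int corner;	/* corner required at the clock's current rate */
};

static int rail_corner(const struct vclk *clks, int n)
{
	int i, max = 0;

	for (i = 0; i < n; i++)
		if (clks[i].enabled && clks[i].corner > max)
			max = clks[i].corner;	/* aggregate active votes */
	return max;	/* 0 means the rail vote is fully released */
}
```

The key property is the last one in the requirement: a disabled clock
contributes nothing, so a rail with no enabled clocks carries no vote.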

The second concern above is the more challenging.  A proposal to use OPP 
to manage core DVFS has been made, but suffers from three major 
problems: first, it requires clients to use OPP for setting their rate, 
but the clock framework for other control (i.e. enable or disable). Not 
only is this confusing and (I would claim) ugly, it would also be very 
easy for clients to accidentally call clk_set_rate directly instead of 
going via OPP, resulting in invalid DVFS operating points.  The second 
problem is that OPP does not allow for removing a voltage vote entirely 
when the clock is disabled - which is a fundamental requirement for the 
design to be power optimal.  The third problem is that semantically 
using OPP for requesting specific functional frequencies (i.e. for a  
serial bus) is an abuse of that framework.  It requires the clients to 
find the "performance level" that matches the specific frequency they 
require, then request that performance level - when what they really 
want is to "set rate" on the clock to a specific, known, frequency.

We have avoided the deadlock concern in Qualcomm SOCs by never having a 
clock dependency to control a voltage rail.  This is admittedly not a 
good general solution (although it does appear to be perfectly 
acceptable for current SOCs that have such clock/voltage requirements).

So the question to everybody is how to manage clock-driven DVFS in a 
safe and acceptable fashion?  Would an integrated solution in the clock 
framework with a requirement that SOCs using it not have a 
regulator->clock path be acceptable (it works for us at least)?

Thanks,
Graham

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-06-18 20:44 [RFD] Voltage dependencies for clocks (DVFS) grahamr
@ 2018-07-02  5:13 ` Michael Turquette
  2018-07-04  6:55 ` Viresh Kumar
  2018-07-05  8:19 ` Peter De Schrijver
  2 siblings, 0 replies; 34+ messages in thread
From: Michael Turquette @ 2018-07-02  5:13 UTC (permalink / raw)
  To: grahamr, linux-clk, linux-pm, sboyd
  Cc: dianders, ulf.hansson, viresh.kumar, Taniya Das, Rajendra Nayak,
	Amit Nischal

Hi Graham,

Thanks for bringing up this issue. It has been near and dear to my heart
for a decade. Apologies in advance for the long-winded replies and
occasional memes.

Quoting grahamr@codeaurora.org (2018-06-18 13:44:39)
> Hi All,
>
> A number of vendors, including Qualcomm, support sophisticated digital
> voltage rail management based on the state (rate and enable) of each
> clock in the system.  In a Qualcomm SOC we require the ability to assert
> a vote on both a regulator enable and voltage corner (essentially a
> quantified voltage level) when clocks are enabled or have their rate
> set.

Who votes on the voltage control hardware? Is it the device that
consumes the clk and voltage resources, or does the clk hardware vote on
the regulator hardware on behalf of the consuming device?

Are there details of the QCOM-specific DVFS/DCVS scheme and sequencing
somewhere in a public doc? (if not, ping me off list with the name of
the doc and I'll look it up in createpoint/chipcode)

> Importantly, if a clock is disabled, it must release its required
> voltage vote.

OK, so a clock votes on a regulator ... still would be good to
understand this in more detail.

> In general clients of the clock framework are not aware of the mapping
> between their performance level (clock rate) and voltage requirement,
> because it comes as part of the clock controller design and not the core
> design,

Not caring about voltage seems to be common. Client devices know what
performance (or clock rate) that they want and voltage is just a
function of the requested performance level/clock rate.

The OPP library was designed with this in mind and this reality is
reflected in the fact that there are dev_pm_opp_find_freq_* functions,
but no dev_pm_opp_find_volt_* functions. Consumer drivers give zero
shits about voltage.

> and also because the clock controller elements themselves have
> voltage requirements that must be included in the relationship (i.e.
> PLLs need minimum voltage levels to operate at a given output
> frequency).

Expanding on my missive above, a clock controller can be considered Just
Another Device with performance states that have a corresponding voltage
requirement.

Put another way, if a certain voltage is needed for correct clock signal
generation, then the clock generator hardware is the consumer device
that consumes this voltage.

>
> The solution we have deployed for many years is to associate a regulator
> and voltage level with each clock rate, and manage voting for the
> associated regulator in the clock framework during state changes.  There


Hmm, so the entry point for touching the voltage control hardware is the
clock framework? Sounds perilous.

Does it not make sense on QCOM hardware to track voltage requirements
based on the *consumer* device requirements? The scheme you're
describing is more like attributing the voltage requirements *to the
clock*. In my experiences, this likely does not model reality.

Are you not setting the voltage level for the power that is distributed
to the transistors associated with the consumer device? This would imply
that the voltage level is a property of the consumer device, no? Of
course the voltage and clock rate are a pair (tuple), but the ownership
of this property is the consumer device, right?

In your out-of-tree solutions, what happens if two distinct consumer
devices share a clock? Does the regulator only get voted on once (by the
clock), or twice (by the two consumer devices)?

> are two issues with this solution that have been raised by maintainers:
>
> 1. The use of the regulator framework to track voltage corners has been
> deemed invalid.  We are in the process of changing our implementation to


Can you provide a link to the thread explaining that? Would be helpful
to me.

> use genpd, which is being modified to support aggregation, and is viewed
> as acceptable in principle.

Yeah, now that genpd has a concept of performance[0][1], this makes
sense. However, that raises the question of whether the genpd should be
a superset of a {clk,regulator} tuple.

Why not use regulators to model the voltage hardware and then group
clk+regulator inside of a genpd for a "performance domain" or
"performance island"?

And Linux power domains can be nested, so you could possibly have a
"performance domain" genpd that is made of a clock paired with a
"voltage domain" genpd instead of using the regulator framework. For the
rest of this email I'll refer to "regulators", but you can just
substitute that with "voltage domain genpd" and it still works.

I should probably make it clear here that I have always felt that the
clock framework in Linux is a framework for managing some of the lowest
level of hardware primitives. I don't consider the clk framework to be a
good point of entry for a higher level DVFS/DCVS solution.

I've always imagined that a DVFS framework would come along and glue the
clk and regulator and scpi and opp and p-state and wtf frameworks
together into something coherent. Perhaps performance-aware genpd is the
hero we deserve?

> 2. In some SOCs, control over voltages may itself require enabling or
> configuring a clock (think an I2C regulator for example).  In such a
> design, the regulator->clock and clock->regulator dependencies
> introduce a potential deadlock.

This is true if the clock framework is the entry point for DVFS
transitions. Ie:

clk_set_rate(target_clk)
-> regulator_set_voltage
   -> clk_prepare_enable(i2c_clk)
   -> clk_set_rate(i2c_clk)

But if regulators as a primitive are peers of clocks, and not children,
and if we group them all up under a performance-aware genpd we end up
with something like:


(increasing performance)
dev_pm_genpd_set_performance_state
-> regulator_set_voltage
   -> clk_prepare_enable(i2c_clk)
   -> clk_set_rate(i2c_clk)
-> clk_set_rate(target_clk)

(decreasing performance)
dev_pm_genpd_set_performance_state
-> clk_set_rate(target_clk)
-> regulator_set_voltage
   -> clk_prepare_enable(i2c_clk)
   -> clk_set_rate(i2c_clk)
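The ordering constraint in the two sequences above (raise voltage before
the clock when increasing performance, lower it after the clock when
decreasing) can be modeled as a toy user-space sketch; the names and the
rate-to-voltage mapping are invented, and this is not kernel code:

```c
#include <assert.h>

static unsigned long cur_rate, cur_uv;

/* Made-up mapping from clock rate to minimum supply voltage. */
static unsigned long required_uv(unsigned long rate)
{
	return rate / 1000;
}

/* Mirrors the sequences above: when increasing performance the voltage
 * goes up first, then the clock; when decreasing, the clock goes down
 * first, then the voltage. The supply is therefore never below the
 * requirement of the currently programmed rate. */
static void set_performance(unsigned long new_rate)
{
	if (required_uv(new_rate) > cur_uv) {
		cur_uv = required_uv(new_rate);	/* regulator_set_voltage */
		cur_rate = new_rate;		/* clk_set_rate */
	} else {
		cur_rate = new_rate;		/* clk_set_rate */
		cur_uv = required_uv(new_rate);	/* regulator_set_voltage */
	}
}
```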

>
> The second concern above is the more challenging.  A proposal to use OPP
> to manage core DVFS has been made, but suffers from three major
> problems: first, it requires clients to use OPP for setting their rate,
> but the clock framework for other control (i.e. enable or disable). Not
> only is this confusing and (I would claim) ugly, it would also be very
> easy for clients to accidentally call clk_set_rate directly instead of
> going via OPP, resulting in invalid DVFS operating points.  The second


Yes, yes, it would be interesting to have a framework that handled both
performance and idle ... oh wait genpd!

> problem is that OPP does not allow for removing a voltage vote entirely
> when the clock is disabled - which is a fundamental requirement for the
> design to be power optimal.


I'll take your word on that. Not sure I follow, but I'm fine with OPP
not being the lynchpin of this whole thing. If the performance-aware
genpd implementation mandates OPP then it is broken and needs fixing
(I haven't had time to look).

> The third problem is that semantically
> using OPP for requesting specific functional frequencies (i.e. for a
> serial bus) is an abuse of that framework.  It requires the clients to
> find the "performance level" that matches the specific frequency they
> require, then request that performance level - when what they really
> want is to "set rate" on the clock to a specific, known, frequency.

Agreed, OPP library is pretty limited by design and doesn't fit all
cases. However I do feel that the clock framework should be used for
clock-only use cases (e.g. audio codec) and some dvfs framework should
be used for other use cases that may perform dvfs (such as using some
sort of genpd, per-device pm_qos and maybe runtime pm interface).
Performance-aware genpd is worth serious consideration here.

The devil is in the details, as always. Consider an audio codec that
just needs to configure a clock, and the clock generator hardware itself
has a voltage requirement for a given frequency or hardware
configuration, but the audio codec has no such voltage requirement (e.g.
external audio chip that consumes clk signal via clkin pin).


In this case I would expect the clk provider driver to use whatever
performance framework is required to achieve its clk rate, but the audio
codec driver could just use the clk consumer apis like normal without
consideration for DVFS or voltage.
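That split can be sketched as a toy model: the provider's set_rate path
satisfies the clock generator's own voltage requirement internally,
while the codec-facing call remains a plain rate request (user-space C,
with all names and thresholds invented):

```c
#include <assert.h>

/* Voltage vote owned by the clock provider, invisible to consumers. */
static int pll_corner;

/* Provider-internal set_rate: the PLL's own voltage need for the
 * requested output rate is handled here (threshold is made up). */
static int provider_set_rate(unsigned long rate)
{
	pll_corner = rate > 100000000 ? 2 : 1;
	return 0;
}

/* Consumer-facing call: the codec driver asks only for a rate and
 * never learns about voltage. */
static int model_clk_set_rate(unsigned long rate)
{
	return provider_set_rate(rate);
}
```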

>
> We have avoided the deadlock concern in Qualcomm SOCs by never having a
> clock dependency to control a voltage rail.  This is admittedly not a
> good general solution (although it does appear to be perfectly
> acceptable for current SOCs that have such clock/voltage requirements).
>

> So the question to everybody is how to manage clock-driven DVFS in a
> safe and acceptable fashion?  Would an integrated solution in the clock
> framework with a requirement that SOCs using it not have a
> regulator->clock path be acceptable (it works for us at least)?

I'll repeat myself: my current frame of mind is that the clk framework
manages low level primitives and is not a good entry point for higher
level operations such as DVFS. I believe that the DVFS solution should
be a higher level framework that makes calls into the clock framework,
and possibly the regulator framework or whatever.

I'm open to changing my mind on that once we've gathered some more
requirements from other SoC vendors and whatnot. But for now I don't
think adding voltage control to the clk framework is great.

Regarding the locking, if the clock framework locking is somehow broken
for your use case then maybe we need to rewrite the clock framework
locking scheme (ugh) but everything is on the table. Then again if we
consider the clk framework as a peer/sibling to the regulator framework,
and encapsulate the whole sequence in a higher level abstraction (like
genpd), then I wonder if your locking woes disappear?

Regards,
Mike

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=42f6284ae602469762ee721ec31ddfc6170e00bc
[1] https://lkml.org/lkml/2017/10/11/66


>
> Thanks,
> Graham


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-06-18 20:44 [RFD] Voltage dependencies for clocks (DVFS) grahamr
  2018-07-02  5:13 ` Michael Turquette
@ 2018-07-04  6:55 ` Viresh Kumar
  2018-07-04 12:50   ` Ulf Hansson
  2018-07-05  8:19 ` Peter De Schrijver
  2 siblings, 1 reply; 34+ messages in thread
From: Viresh Kumar @ 2018-07-04  6:55 UTC (permalink / raw)
  To: grahamr
  Cc: linux-clk, linux-pm, sboyd, mturquette, dianders, ulf.hansson,
	Taniya Das, Rajendra Nayak, Amit Nischal, vincent.guittot,
	amit.kucheria

On 18-06-18, 13:44, grahamr@codeaurora.org wrote:
> The second concern above is the more challenging.  A proposal to use OPP to
> manage core DVFS has been made, but suffers from three major problems:

A bit of history first on how things evolved. The OPP core was
initially designed to be a data provider, which can be used by other
framework/drivers like cpufreq, etc.

Sometime back we realized that we can get rid of some code that is
duplicated across drivers but is essentially doing the same thing,
programming clock and regulators. We looked at the best place to put
that functionality and OPP core was the place we chose.

> first, it requires clients to use OPP for setting their rate, but the clock
> framework for other control (i.e. enable or disable). Not only is this
> confusing and (I would claim) ugly, it would also be very easy for clients
> to accidentally call clk_set_rate directly instead of going via OPP,
> resulting in invalid DVFS operating points.

I agree that this is a problem. It would be more correct and convenient
to have things controlled by a single entity.

> The second problem is that OPP
> does not allow for removing a voltage vote entirely when the clock is
> disabled - which is a fundamental requirement for the design to be power
> optimal.

This should be done by the entity which takes care of
enabling/disabling the resources, and that's the drivers for now. So
yeah, it is the same problem as the first one, I would say.

> The third problem is that semantically using OPP for requesting
> specific functional frequencies (i.e. for a  serial bus) is an abuse of that
> framework.  It requires the clients to find the "performance level" that
> matches the specific frequency they require, then request that performance
> level - when what they really want is to "set rate" on the clock to a
> specific, known, frequency.

This is the prototype of that routine:

int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);

All we need is the target frequency and not an OPP.

Maybe we can solve the problems you raised with the OPP core like
this:

- On a call to dev_pm_opp_set_rate(), we check the currently cached
  frequency value. If it is unknown or 0, we call clk_enable() as
  well..

- When the drivers want things to shut down, they call
  dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
  and call clk_disable().

Will that be acceptable to everyone ?
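A toy user-space model of the proposed semantics (names invented; the
real dev_pm_opp_set_rate() operates on a struct device and its OPP
table):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposal: a cached rate of 0 means the resources
 * are off; a transition away from 0 enables the clock, a transition to
 * 0 drops the regulator vote and disables the clock. */
static unsigned long cached_rate;
static bool clk_enabled, regulator_voted;

static int model_opp_set_rate(unsigned long target)
{
	if (target == 0) {
		regulator_voted = false;	/* drop regulator vote */
		clk_enabled = false;		/* clk_disable() */
	} else {
		if (cached_rate == 0)
			clk_enabled = true;	/* clk_enable() */
		regulator_voted = true;		/* vote required corner */
	}
	cached_rate = target;
	return 0;
}
```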

-- 
viresh


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-04  6:55 ` Viresh Kumar
@ 2018-07-04 12:50   ` Ulf Hansson
  2018-07-04 12:54     ` Rafael J. Wysocki
  2018-07-20 17:12     ` Stephen Boyd
  0 siblings, 2 replies; 34+ messages in thread
From: Ulf Hansson @ 2018-07-04 12:50 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: grahamr, linux-clk, Linux PM, Stephen Boyd, Mike Turquette,
	Doug Anderson, Taniya Das, Rajendra Nayak, Amit Nischal,
	Vincent Guittot, Amit Kucheria

[...]

>> The third problem is that semantically using OPP for requesting
>> specific functional frequencies (i.e. for a  serial bus) is an abuse of that
>> framework.  It requires the clients to find the "performance level" that
>> matches the specific frequency they require, then request that performance
>> level - when what they really want is to "set rate" on the clock to a
>> specific, known, frequency.
>
> This is the prototype of that routine:
>
> int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
>
> All we need is the target frequency and not an OPP.
>
> Maybe we can solve the problems you raised with the OPP core like
> this:
>
> - On a call to dev_pm_opp_set_rate(), we check the currently cached
>   frequency value. If it is unknown or 0, we call clk_enable() as
>   well..
>
> - When the drivers want things to shut down, they call
>   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
>   and call clk_disable().
>
> Will that be acceptable to everyone ?

For a cross SoC driver, the device may not be hooked up to an OPP
table. Then it needs to manage the clock anyway. I guess we can do
that for those cases that need it.

Another option is to use the ->runtime_suspend|resume() callbacks at
the genpd level. In principle, when the device enters runtime suspend,
genpd can decide to drop the votes for the device (if some exist). At
runtime resume, genpd should then restore the votes the device had
before it became runtime suspended (or the updated vote, if it changed
in-between).

In that way, the driver only needs to call dev_pm_opp_set_rate() when
it calls clk_set_rate().
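That idea can be sketched as a toy state machine (user-space C with
invented names; real genpd would do this per device from its
->runtime_suspend|resume() callbacks):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: suspend drops the active vote but remembers the request;
 * resume restores whatever was requested last, including a request
 * that was updated while suspended. */
static int requested_state;	/* what the driver last asked for */
static int active_state;	/* what is currently voted in hardware */
static bool suspended;

static void model_set_performance_state(int state)
{
	requested_state = state;
	if (!suspended)
		active_state = state;
}

static void model_runtime_suspend(void)
{
	suspended = true;
	active_state = 0;	/* drop the vote while idle */
}

static void model_runtime_resume(void)
{
	suspended = false;
	active_state = requested_state;	/* restore the latest request */
}
```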

Kind regards
Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-04 12:50   ` Ulf Hansson
@ 2018-07-04 12:54     ` Rafael J. Wysocki
  2018-07-04 12:58       ` Ulf Hansson
  2018-07-20 17:12     ` Stephen Boyd
  1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2018-07-04 12:54 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Stephen Boyd,
	Mike Turquette, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

On Wed, Jul 4, 2018 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>> The third problem is that semantically using OPP for requesting
>>> specific functional frequencies (i.e. for a  serial bus) is an abuse of that
>>> framework.  It requires the clients to find the "performance level" that
>>> matches the specific frequency they require, then request that performance
>>> level - when what they really want is to "set rate" on the clock to a
>>> specific, known, frequency.
>>
>> This is the prototype of that routine:
>>
>> int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
>>
>> All we need is the target frequency and not an OPP.
>>
>> Maybe we can solve the problems you raised with the OPP core like
>> this:
>>
>> - On a call to dev_pm_opp_set_rate(), we check the currently cached
>>   frequency value. If it is unknown or 0, we call clk_enable() as
>>   well..
>>
>> - When the drivers want things to shut down, they call
>>   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
>>   and call clk_disable().
>>
>> Will that be acceptable to everyone ?
>
> For a cross SoC driver, the device may not be hooked up to an OPP
> table. Then it needs to manage the clock anyways. I guess we can do
> that for those cases that needs it.
>
> Another option is to use the ->runtime_suspend|resume() callbacks at
> the genpd level. In principle when the device enters runtime suspend,
> genpd can decide to drop the votes for the device (if some exist). At
> runtime resume, genpd then should re-store the votes for the device it
> had before it became runtime suspended (or if the vote became updated
> in-between).
>
> In that way, the driver only needs to call dev_pm_opp_set_rate() when
> it calls clk_set_rate().

You'd need that for system-wide PM too I guess.

Thanks,
Rafael


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-04 12:54     ` Rafael J. Wysocki
@ 2018-07-04 12:58       ` Ulf Hansson
  0 siblings, 0 replies; 34+ messages in thread
From: Ulf Hansson @ 2018-07-04 12:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Stephen Boyd,
	Mike Turquette, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

On 4 July 2018 at 14:54, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Jul 4, 2018 at 2:50 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> [...]
>>
>>>> The third problem is that semantically using OPP for requesting
>>>> specific functional frequencies (i.e. for a  serial bus) is an abuse of that
>>>> framework.  It requires the clients to find the "performance level" that
>>>> matches the specific frequency they require, then request that performance
>>>> level - when what they really want is to "set rate" on the clock to a
>>>> specific, known, frequency.
>>>
>>> This is the prototype of that routine:
>>>
>>> int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
>>>
>>> All we need is the target frequency and not an OPP.
>>>
>>> Maybe we can solve the problems you raised with the OPP core like
>>> this:
>>>
>>> - On a call to dev_pm_opp_set_rate(), we check the currently cached
>>>   frequency value. If it is unknown or 0, we call clk_enable() as
>>>   well..
>>>
>>> - When the drivers want things to shut down, they call
>>>   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
>>>   and call clk_disable().
>>>
>>> Will that be acceptable to everyone ?
>>
>> For a cross SoC driver, the device may not be hooked up to an OPP
>> table. Then it needs to manage the clock anyways. I guess we can do
>> that for those cases that needs it.
>>
>> Another option is to use the ->runtime_suspend|resume() callbacks at
>> the genpd level. In principle when the device enters runtime suspend,
>> genpd can decide to drop the votes for the device (if some exist). At
>> runtime resume, genpd then should re-store the votes for the device it
>> had before it became runtime suspended (or if the vote became updated
>> in-between).
>>
>> In that way, the driver only needs to call dev_pm_opp_set_rate() when
>> it calls clk_set_rate().
>
> You'd need that for system-wide PM too I guess.

Yep, but I don't think it's that critical, as I think the PM domain
will be powered off anyway in that scenario. Of course, unless it
stays powered on because of some wakeup settings, but I doubt that.

Kind regards
Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-06-18 20:44 [RFD] Voltage dependencies for clocks (DVFS) grahamr
  2018-07-02  5:13 ` Michael Turquette
  2018-07-04  6:55 ` Viresh Kumar
@ 2018-07-05  8:19 ` Peter De Schrijver
  2 siblings, 0 replies; 34+ messages in thread
From: Peter De Schrijver @ 2018-07-05  8:19 UTC (permalink / raw)
  To: grahamr
  Cc: linux-clk, linux-pm, sboyd, mturquette, dianders, ulf.hansson,
	viresh.kumar, Taniya Das, Rajendra Nayak, Amit Nischal

On Mon, Jun 18, 2018 at 01:44:39PM -0700, grahamr@codeaurora.org wrote:
> Hi All,
> 
> A number of vendors, including Qualcomm, support sophisticated
> digital voltage rail management based on the state (rate and enable)
> of each clock in the system.  In a Qualcomm SOC we require the
> ability to assert a vote on both a regulator enable and voltage
> corner (essentially a quantified voltage level) when clocks are
> enabled or have their rate set.  Importantly, if a clock is
> disabled, it must release its required voltage vote.
> In general clients of the clock framework are not aware of the
> mapping between their performance level (clock rate) and voltage
> requirement, because it comes as part of the clock controller design
> and not the core design, and also because the clock controller
> elements themselves have voltage requirements that must be included
> in the relationship (i.e. PLLs need minimum voltage levels to
> operate at a given output frequency).
> 
> The solution we have deployed for many years is to associate a
> regulator and voltage level with each clock rate, and manage voting
> for the associated regulator in the clock framework during state
> changes.  There are two issues with this solution that have been
> raised by maintainers:
> 
> 1. The use of the regulator framework to track voltage corners has
> been deemed invalid.  We are in the process of changing our
> implementation to use genpd, which is being modified to support
> aggregation, and is viewed as acceptable in principle.
> 2. In some SOCs, control over voltages may itself require enabling
> or configuring a clock (think an I2C regulator for example).  In
> such a design, the regulator->clock and clock->regulator
> dependencies introduces a potential deadlock.
> 
> The second concern above is the more challenging.  A proposal to use
> OPP to manage core DVFS has been made, but suffers from three major
> problems: first, it requires clients to use OPP for setting their
> rate, but the clock framework for other control (i.e. enable or
> disable). Not only is this confusing and (I would claim) ugly, it
> would also be very easy for clients to accidentally call
> clk_set_rate directly instead of going via OPP, resulting in invalid
> DVFS operating points.  The second problem is that OPP does not
> allow for removing a voltage vote entirely when the clock is
> disabled - which is a fundamental requirement for the design to be
> power optimal.  The third problem is that semantically using OPP for
> requesting specific functional frequencies (i.e. for a  serial bus)
> is an abuse of that framework.  It requires the clients to find the
> "performance level" that matches the specific frequency they
> require, then request that performance level - when what they really
> want is to "set rate" on the clock to a specific, known, frequency.
> 
> We have avoided the deadlock concern in Qualcomm SOCs by never
> having a clock dependency to control a voltage rail.  This is
> admittedly not a good general solution (although it does appear to
> be perfectly acceptable for current SOCs that have such
> clock/voltage requirements).
> 
> So the question to everybody is how to manage clock-driven DVFS in a
> safe and acceptable fashion?  Would an integrated solution in the
> clock framework with a requirement that SOCs using it not have a
> regulator->clock path be acceptable (it works for us at least)?

The way we solved this in Tegra is to have the DVFS layer use notifiers
for clock rate changes, and to call directly into the DVFS layer from
the prepare and unprepare clock methods to indicate when a voltage
request can be dropped. On Tegra210 we also have temperature-dependent
OPPs. For this reason we don't use the OPP layer for rails where we
support temperature dependency.

Peter.


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-04 12:50   ` Ulf Hansson
  2018-07-04 12:54     ` Rafael J. Wysocki
@ 2018-07-20 17:12     ` Stephen Boyd
  2018-07-20 17:56       ` Michael Turquette
                         ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Stephen Boyd @ 2018-07-20 17:12 UTC (permalink / raw)
  To: Ulf Hansson, Viresh Kumar
  Cc: grahamr, linux-clk, Linux PM, Mike Turquette, Doug Anderson,
	Taniya Das, Rajendra Nayak, Amit Nischal, Vincent Guittot,
	Amit Kucheria

Quoting Ulf Hansson (2018-07-04 05:50:40)
> [...]
>
> >> The third problem is that semantically using OPP for requesting
> >> specific functional frequencies (i.e. for a  serial bus) is an abuse of that
> >> framework.  It requires the clients to find the "performance level" that
> >> matches the specific frequency they require, then request that performance
> >> level - when what they really want is to "set rate" on the clock to a
> >> specific, known, frequency.
> >
> > This is the prototype of that routine:
> >
> > int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
> >
> > All we need is the target frequency and not an OPP.
> >
> > Maybe we can solve the problems you raised with the OPP core like
> > this:
> >
> > - On a call to dev_pm_opp_set_rate(), we check the currently cached
> >   frequency value. If it is unknown or 0, we call clk_enable() as
> >   well..
> >
> > - When the drivers want things to shut down, they call
> >   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
> >   and call clk_disable().
> >
> > Will that be acceptable to everyone ?
>
> For a cross SoC driver, the device may not be hooked up to an OPP
> table. Then it needs to manage the clock anyways. I guess we can do
> that for those cases that needs it.

Right. Something like the USB dwc3 driver will need to be changed and
we'll see how it looks there.

Forcing OPP as the entry point to DVFS on every driver doesn't seem like
a great idea. Some drivers just want to do clk API operations like they
always have but now we need to throw in voltage scaling along with those
clk operations. OPP is a convenient way to make sure drivers can't mess
up the order of operations, agreed, but is that really that hard to get
right? Converting all those drivers to the OPP APIs makes for
contortions and I believe Graham doesn't think we should contort drivers
that way unless they're using devfreq for their clk frequency and
voltage scaling.

Plus, we're adding new and exciting clk APIs around the clk_set_rate()
path. We have min/max variants of clk_set_rate() and we have a way to
set a rate on a clk and make sure nobody else can change it with
clk_set_rate_exclusive(). We would need to proxy all of those clk APIs
through the OPP layer to cater to all the clk clients out there. All of
that to avoid drivers ordering things incorrectly, when we already have
drivers handling the order of operations correctly for powering on
regulators and asserting resets at the right times. I just don't see
what's wrong with drivers doing everything here themselves.

For one thing, a driver should be able to figure out what the
performance state requirement is for a particular frequency. I'd like to
see an API that a driver can pass something like a (device, genpd, clk,
frequency) tuple and get back the performance state required for that
device's clk frequency within that genpd by querying OPP tables. If we
had this API, then SoC vendors could design OPP tables for their on-SoC
devices that describe the set of max frequencies a device can operate at
for a specific performance state and driver authors would be able to
query that information and manually set genpd performance states when
they change clk frequencies. In Qualcomm designs this would be their
"fmax" tables that map a max frequency to a voltage corner. If someone
wanted to fine tune that table and make it into a full frequency plan
OPP table for use by devfreq, then they could add more entries for all
the validated frequencies and voltage corners that are acceptable and
tested and this API would still work. We'll need this sort of table
regardless because we can't expect devices to search for an exact
frequency in an OPP table when they can support hundreds of different
frequencies, like in display or audio situations.

Put succinctly, OPP needs to support a range of frequencies in addition
to the set of frequencies it can support today and it needs to allow
drivers to query that information.

>

> Another option is to use the ->runtime_suspend|resume() callbacks at
> the genpd level. In principle when the device enters runtime suspend,
> genpd can decide to drop the votes for the device (if some exist). At
> runtime resume, genpd then should re-store the votes for the device it
> had before it became runtime suspended (or if the vote became updated
> in-between).

I'd prefer to see the genpd layer do it instead of OPP layer controlling
the on/off state. Calling dev_pm_opp_set_rate(dev, 0) just looks very
weird and conflates clk frequency with power domain state which isn't
exactly what we want. We want it to be related to the frequency _and_
the enabled state. Plus it's like saying clk_set_rate(clk, 0) should
turn the clk off, which we've never done before.

It would also be good to handle the on/off state in genpd because the
genpd could have a governor attached to it that blocks on/off requests
from drivers if the governor decides that the device is clk gating on
and off quickly but it's not beneficial to change the voltage during the
quick gating scenario because we spend more time changing voltage than
clk gating. Connecting this all to genpd allows us to use the device
power state as a signal to adjust the voltage, but leaves us flexibility
to override it if it makes sense policy wise.

And please remember the other point here that many devices may all be
sharing the same power domain and therefore the voltage that the domain
runs at depends on many different clk frequencies for different devices
and if those clks are enabled or not, aggregated into a single voltage
requirement for the power domain. The runtime suspend state might be
part of that aggregation equation, so that if a device's clk is off and
it's been suspended for some time the voltage requirement for that clk
frequency could be dropped, but if the device is resumed and the clk is
still off then the voltage requirement can be ignored. We only need to
make sure the power domain is at a high enough voltage when the clk is
actually enabled though, so tying the voltage requirement drop to
runtime PM of the device is not exactly correct.

It may be that a device is composed of multiple power domains and those
power domains each have their own clks with voltage requirements too.
In this scenario, connecting the voltage requirement of the genpd to the
runtime PM state of the device doesn't make sense at all. We'll get into
a situation where half the device is "off" because a power domain and
the clk for that power domain are off but we can't drop the voltage
requirement because the overall device isn't runtime suspended. It's
like the single genpd per device problem all over again. Let's not do
that.

So I'd rather see drivers calling some genpd API directly to indicate
that the clk should turn off and the voltage domain constraint has gone
away. We might want to transform OPP into a genpd of its own and then
connect that OPP genpd to a clk domain genpd for some clk
on/off/frequency part and a power domain for the voltage/corner scaling
part. If we did that, then everything is a genpd that gets all
aggregated in the genpd layer and drivers that only use OPP would be OK
with their runtime PM state of their device affecting the OPP genpd's
on/off state of clks and voltage requirements. Drivers that want finer
control would be able to use some new genpd API to turn on and off the
genpd when they want to.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-20 17:12     ` Stephen Boyd
@ 2018-07-20 17:56       ` Michael Turquette
  2018-07-24 23:13         ` Stephen Boyd
  2018-07-23  8:26       ` Peter De Schrijver
  2018-07-31 10:35       ` Ulf Hansson
  2 siblings, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-07-20 17:56 UTC (permalink / raw)
  To: Stephen Boyd, Ulf Hansson, Viresh Kumar
  Cc: grahamr, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Stephen Boyd (2018-07-20 10:12:29)
> Quoting Ulf Hansson (2018-07-04 05:50:40)
> > [...]
> >

> > >> The third problem is that semantically using OPP for requesting
> > >> specific functional frequencies (i.e. for a serial bus) is an abuse of that
> > >> framework.  It requires the clients to find the "performance level" that
> > >> matches the specific frequency they require, then request that performance
> > >> level - when what they really want is to "set rate" on the clock to a
> > >> specific, known, frequency.
> > >
> > > This is the prototype of that routine:
> > >
> > > int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
> > >
> > > All we need is the target frequency and not an OPP.
> > >
> > > Maybe we can solve the problems you raised with the OPP core like
> > > this:
> > >
> > > - On a call to dev_pm_opp_set_rate(), we check the currently cached
> > >   frequency value. If it is unknown or 0, we call clk_enable() as
> > >   well..
> > >
> > > - When the drivers want things to shut down, they call
> > >   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
> > >   and call clk_disable().
> > >
> > > Will that be acceptable to everyone ?
> >

> > For a cross SoC driver, the device may not be hooked up to an OPP
> > table. Then it needs to manage the clock anyways. I guess we can do
> > that for those cases that need it.
>

> Right. Something like the USB dwc3 driver will need to be changed and
> we'll see how it looks there.
>

> Forcing OPP as the entry point to DVFS on every driver doesn't seem like
> a great idea. Some drivers just want to do clk API operations like they
> always have but now we need to throw in voltage scaling along with those
> clk operations. OPP is a convenient way to make sure drivers can't mess
> up the order of operations, agreed, but is that really that hard to get
> right? Converting all those drivers to the OPP APIs makes for
> contortions and I believe Graham doesn't think we should contort drivers
> that way unless they're using devfreq for their clk frequency and
> voltage scaling.
>

> Plus, we're adding new and exciting clk APIs around the clk_set_rate()
> path. We have min/max variants of clk_set_rate() and we have a way to
> set a rate on a clk and make sure nobody else can change it with
> clk_set_rate_exclusive(). We would need to proxy all of those clk APIs
> through the OPP layer to cater to all the clk clients out there. All of
> that to avoid drivers ordering things incorrectly, when we already have
> drivers handling the order of operations correctly for powering on
> regulators and asserting resets at the right times. I just don't see
> what's wrong with drivers doing everything here themselves.
>

> For one thing, a driver should be able to figure out what the
> performance state requirement is for a particular frequency. I'd like to
> see an API that a driver can pass something like a (device, genpd, clk,
> frequency) tuple and get back the performance state required for that
> device's clk frequency within that genpd by querying OPP tables. If we
> had this API, then SoC vendors could design OPP tables for their on-SoC
> devices that describe the set of max frequencies a device can operate at
> for a specific performance state and driver authors would be able to
> query that information and manually set genpd performance states when
> they change clk frequencies. In Qualcomm designs this would be their

I want to make sure I understand your proposal here. You want consumer
drivers to call the clk APIs to manage clks, and you also want them to
use runtime/genpd apis to manage the power rails (with synchronicity
provided by a new api to set the right genpd/power rail state based on
clk freq). Do I have that right?

Sounds OK to me, but why not fold the clk_set_rate part into the genpd
as well? It's possible to have a genpd call clk_enable/clk_disable (see
include/linux/pm_clock.h, drivers/base/power/clock_ops.c and its
various users under arch/arm/mach-* and drivers/*). Gating clocks within
a genpd callback feels analogous to setting clock rate.

Regards,
Mike

> "fmax" tables that map a max frequency to a voltage corner. If someone
> wanted to fine tune that table and make it into a full frequency plan
> OPP table for use by devfreq, then they could add more entries for all
> the validated frequencies and voltage corners that are acceptable and
> tested and this API would still work. We'll need this sort of table
> regardless because we can't expect devices to search for an exact
> frequency in an OPP table when they can support hundreds of different
> frequencies, like in display or audio situations.
>

> Put succinctly, OPP needs to support a range of frequencies in addition
> to the set of frequencies it can support today and it needs to allow
> drivers to query that information.
>

> >

> > Another option is to use the ->runtime_suspend|resume() callbacks at
> > the genpd level. In principle when the device enters runtime suspend,
> > genpd can decide to drop the votes for the device (if some exist). At
> > runtime resume, genpd then should re-store the votes for the device it
> > had before it became runtime suspended (or if the vote became updated
> > in-between).
>

> I'd prefer to see the genpd layer do it instead of OPP layer controlling
> the on/off state. Calling dev_pm_opp_set_rate(dev, 0) just looks very
> weird and conflates clk frequency with power domain state which isn't
> exactly what we want. We want it to be related to the frequency _and_
> the enabled state. Plus it's like saying clk_set_rate(clk, 0) should
> turn the clk off, which we've never done before.
>

> It would also be good to handle the on/off state in genpd because the
> genpd could have a governor attached to it that blocks on/off requests
> from drivers if the governor decides that the device is clk gating on
> and off quickly but it's not beneficial to change the voltage during the
> quick gating scenario because we spend more time changing voltage than
> clk gating. Connecting this all to genpd allows us to use the device
> power state as a signal to adjust the voltage, but leaves us flexibility
> to override it if it makes sense policy wise.
>

> And please remember the other point here that many devices may all be
> sharing the same power domain and therefore the voltage that the domain
> runs at depends on many different clk frequencies for different devices
> and if those clks are enabled or not, aggregated into a single voltage
> requirement for the power domain. The runtime suspend state might be
> part of that aggregation equation, so that if a device's clk is off and
> it's been suspended for some time the voltage requirement for that clk
> frequency could be dropped, but if the device is resumed and the clk is
> still off then the voltage requirement can be ignored. We only need to
> make sure the power domain is at a high enough voltage when the clk is
> actually enabled though, so tying the voltage requirement drop to
> runtime PM of the device is not exactly correct.
>

> It may be that a device is composed of multiple power domains and those
> power domains each have their own clks with voltage requirements too.
> In this scenario, connecting the voltage requirement of the genpd to the
> runtime PM state of the device doesn't make sense at all. We'll get into
> a situation where half the device is "off" because a power domain and
> the clk for that power domain are off but we can't drop the voltage
> requirement because the overall device isn't runtime suspended. It's
> like the single genpd per device problem all over again. Let's not do
> that.
>

> So I'd rather see drivers calling some genpd API directly to indicate
> that the clk should turn off and the voltage domain constraint has gone
> away. We might want to transform OPP into a genpd of its own and then
> connect that OPP genpd to a clk domain genpd for some clk
> on/off/frequency part and a power domain for the voltage/corner scaling
> part. If we did that, then everything is a genpd that gets all
> aggregated in the genpd layer and drivers that only use OPP would be OK
> with their runtime PM state of their device affecting the OPP genpd's
> on/off state of clks and voltage requirements. Drivers that want finer
> control would be able to use some new genpd API to turn on and off the
> genpd when they want to.
>


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-20 17:12     ` Stephen Boyd
  2018-07-20 17:56       ` Michael Turquette
@ 2018-07-23  8:26       ` Peter De Schrijver
  2018-07-24 23:04         ` Stephen Boyd
  2018-07-31 10:35       ` Ulf Hansson
  2 siblings, 1 reply; 34+ messages in thread
From: Peter De Schrijver @ 2018-07-23  8:26 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Ulf Hansson, Viresh Kumar, grahamr, linux-clk, Linux PM,
	Mike Turquette, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> Quoting Ulf Hansson (2018-07-04 05:50:40)
> > [...]
> > 
> > >> The third problem is that semantically using OPP for requesting
> > >> specific functional frequencies (i.e. for a serial bus) is an abuse of that
> > >> framework.  It requires the clients to find the "performance level" that
> > >> matches the specific frequency they require, then request that performance
> > >> level - when what they really want is to "set rate" on the clock to a
> > >> specific, known, frequency.
> > >
> > > This is the prototype of that routine:
> > >
> > > int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq);
> > >
> > > All we need is the target frequency and not an OPP.
> > >
> > > Maybe we can solve the problems you raised with the OPP core like
> > > this:
> > >
> > > - On a call to dev_pm_opp_set_rate(), we check the currently cached
> > >   frequency value. If it is unknown or 0, we call clk_enable() as
> > >   well..
> > >
> > > - When the drivers want things to shut down, they call
> > >   dev_pm_opp_set_rate(dev, 0) and on this we drop regulator vote, etc,
> > >   and call clk_disable().
> > >
> > > Will that be acceptable to everyone ?
> > 
> > For a cross SoC driver, the device may not be hooked up to an OPP
> > table. Then it needs to manage the clock anyways. I guess we can do
> > that for those cases that need it.
> 
> Right. Something like the USB dwc3 driver will need to be changed and
> we'll see how it looks there.
> 
> Forcing OPP as the entry point to DVFS on every driver doesn't seem like
> a great idea. Some drivers just want to do clk API operations like they
> always have but now we need to throw in voltage scaling along with those
> clk operations. OPP is a convenient way to make sure drivers can't mess
> up the order of operations, agreed, but is that really that hard to get
> right? Converting all those drivers to the OPP APIs makes for
> contortions and I believe Graham doesn't think we should contort drivers
> that way unless they're using devfreq for their clk frequency and
> voltage scaling.
> 
> Plus, we're adding new and exciting clk APIs around the clk_set_rate()
> path. We have min/max variants of clk_set_rate() and we have a way to
> set a rate on a clk and make sure nobody else can change it with
> clk_set_rate_exclusive(). We would need to proxy all of those clk APIs
> through the OPP layer to cater to all the clk clients out there. All of
> that to avoid drivers ordering things incorrectly, when we already have
> drivers handling the order of operations correctly for powering on
> regulators and asserting resets at the right times. I just don't see
> what's wrong with drivers doing everything here themselves.
> 
> For one thing, a driver should be able to figure out what the
> performance state requirement is for a particular frequency. I'd like to
> see an API that a driver can pass something like a (device, genpd, clk,
> frequency) tuple and get back the performance state required for that
> device's clk frequency within that genpd by querying OPP tables. If we
> had this API, then SoC vendors could design OPP tables for their on-SoC
> devices that describe the set of max frequencies a device can operate at
> for a specific performance state and driver authors would be able to
> query that information and manually set genpd performance states when
> they change clk frequencies. In Qualcomm designs this would be their
> "fmax" tables that map a max frequency to a voltage corner. If someone
> wanted to fine tune that table and make it into a full frequency plan
> OPP table for use by devfreq, then they could add more entries for all
> the validated frequencies and voltage corners that are acceptable and
> tested and this API would still work. We'll need this sort of table
> regardless because we can't expect devices to search for an exact
> frequency in an OPP table when they can support hundreds of different
> frequencies, like in display or audio situations.
> 

Various reasons why I think the driver is not the right place to handle
the V/f relationship:

1) The V/f relationship is temperature dependent. So the voltage may have
   to be adjusted when the temperature changes. I don't think we should
   make every driver handle this on its own.

2) Not every device with V/f requirements has its own powerdomain. On Tegra
   for example we have 2 voltage rails: core and CPU. (and a 3rd one for GPU
   since Tegra124). So all peripherals (except GPU) share the same voltage
   rail and they are grouped in several domains, one of which cannot be
   powergated. So genpd domains do not align with the V/f curves of the
   peripherals themselves.

Peter.


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-23  8:26       ` Peter De Schrijver
@ 2018-07-24 23:04         ` Stephen Boyd
  2018-07-25  5:44           ` Michael Turquette
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Boyd @ 2018-07-24 23:04 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Ulf Hansson, Viresh Kumar, grahamr, linux-clk, Linux PM,
	Mike Turquette, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Peter De Schrijver (2018-07-23 01:26:41)
> On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> >

> > For one thing, a driver should be able to figure out what the
> > performance state requirement is for a particular frequency. I'd like to
> > see an API that a driver can pass something like a (device, genpd, clk,
> > frequency) tuple and get back the performance state required for that
> > device's clk frequency within that genpd by querying OPP tables. If we
> > had this API, then SoC vendors could design OPP tables for their on-SoC
> > devices that describe the set of max frequencies a device can operate at
> > for a specific performance state and driver authors would be able to
> > query that information and manually set genpd performance states when
> > they change clk frequencies. In Qualcomm designs this would be their
> > "fmax" tables that map a max frequency to a voltage corner. If someone
> > wanted to fine tune that table and make it into a full frequency plan
> > OPP table for use by devfreq, then they could add more entries for all
> > the validated frequencies and voltage corners that are acceptable and
> > tested and this API would still work. We'll need this sort of table
> > regardless because we can't expect devices to search for an exact
> > frequency in an OPP table when they can support hundreds of different
> > frequencies, like in display or audio situations.
> >

>

> Various reasons why I think the driver is not the right place to handle
> the V/f relationship:
> =

> 1) The V/f relationship is temperature dependent. So the voltage may have
>    to be adjusted when the temperature changes. I don't think we should
>    make every driver handle this on its own.

This is AVS? Should be fine to plumb that into some sort of voltage
domain that gets temperature feedback and then adjusts the voltage based
on that? This is basically the same as Qualcomm's "voltage corners" by
the way, just that the voltage is adjusted outside of the Linux kernel
by another processor when the temperature changes.

>

> 2) Not every device with V/f requirements has its own powerdomain. On Tegra
>    for example we have 2 voltage rails: core and CPU. (and a 3rd one for GPU
>    since Tegra124). So all peripherals (except GPU) share the same voltage
>    rail and they are grouped in several domains, one of which cannot be
>    powergated. So genpd domains do not align with the V/f curves of the
>    peripherals themselves.
>


I'm fairly certain this is true on most SoCs today. There is a main
powerdomain for non-CPU things, and then some sort of CPU powerdomain or
domains for CPU things. Each device in those domains needs to request a
certain performance state on the voltage domain they're in (the "core"
powerdomain in your example) and then that genpd will aggregate those
requests with a max operation to pick the highest state required from
all devices attached to the genpd for the voltage domain.

How does power gating or not power gating the domain matter for this?


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-20 17:56       ` Michael Turquette
@ 2018-07-24 23:13         ` Stephen Boyd
  2018-07-25  5:51           ` Michael Turquette
  0 siblings, 1 reply; 34+ messages in thread
From: Stephen Boyd @ 2018-07-24 23:13 UTC (permalink / raw)
  To: Michael Turquette, Ulf Hansson, Viresh Kumar
  Cc: grahamr, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Michael Turquette (2018-07-20 10:56:12)
> Quoting Stephen Boyd (2018-07-20 10:12:29)
> >

> > For one thing, a driver should be able to figure out what the
> > performance state requirement is for a particular frequency. I'd like to
> > see an API that a driver can pass something like a (device, genpd, clk,
> > frequency) tuple and get back the performance state required for that
> > device's clk frequency within that genpd by querying OPP tables. If we
> > had this API, then SoC vendors could design OPP tables for their on-SoC
> > devices that describe the set of max frequencies a device can operate at
> > for a specific performance state and driver authors would be able to
> > query that information and manually set genpd performance states when
> > they change clk frequencies. In Qualcomm designs this would be their
>

> I want to make sure I understand your proposal here. You want consumer
> > drivers to call the clk APIs to manage clks, and you also want them to
> use runtime/genpd apis to manage the power rails (with synchronicity
> provided by a new api to set the right genpd/power rail state based on
> clk freq). Do I have that right?

Yes. That is one option that I'd like to explore. We definitely have a
case where drivers don't set any rate at all, they just set a power rail
state with the performance state APIs. But it also seems complicated to
do it all through the OPP APIs, so maybe duplicating a bunch of stuff in
each driver will show whether that is good or bad.

>

> Sounds OK to me, but why not fold the clk_set_rate part into the genpd
> as well? It's possible to have a genpd call clk_enable/clk_disable (see
> include/linux/pm_clock.h, drivers/base/power/clock_ops.c and its
> various users under arch/arm/mach-* and drivers/*). Gating clocks within
> a genpd callback feels analogous to setting clock rate.
>


I suppose it should be possible to make a genpd for the clk rate and clk
enable/disable state and put that behind the genpd performance state API
so that it mirrors the OPP set_rate API. Then it is analogous to setting
a rate and having a state of 0 mean the voltage requirement is gone. Of
course, then we've replaced the word 'rate' with 'state' and I'm all
happy now, which seems hypocritical! I semi-alluded to this idea in the
second half of my email where I said we can turn OPP into a genpd that
accepts performance state requests via the genpd performance state API,
so I'm happy to look at how this would work too. It would be nice for
drivers to just request a performance state and have that fall into a
genpd for the clk rate and enable/disable state and also go affect the
performance state of a voltage domain genpd. Everything could be done on
the backend in genpd then and drivers wouldn't do much with clks
directly.


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-24 23:04         ` Stephen Boyd
@ 2018-07-25  5:44           ` Michael Turquette
  2018-07-25 11:27             ` Peter De Schrijver
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-07-25  5:44 UTC (permalink / raw)
  To: Peter De Schrijver, Stephen Boyd
  Cc: Ulf Hansson, Viresh Kumar, grahamr, linux-clk, Linux PM,
	Doug Anderson, Taniya Das, Rajendra Nayak, Amit Nischal,
	Vincent Guittot, Amit Kucheria

Quoting Stephen Boyd (2018-07-24 16:04:37)
> Quoting Peter De Schrijver (2018-07-23 01:26:41)
> > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> > >

> > > For one thing, a driver should be able to figure out what the
> > > performance state requirement is for a particular frequency. I'd like to
> > > see an API that a driver can pass something like a (device, genpd, clk,
> > > frequency) tuple and get back the performance state required for that
> > > device's clk frequency within that genpd by querying OPP tables. If we
> > > had this API, then SoC vendors could design OPP tables for their on-SoC
> > > devices that describe the set of max frequencies a device can operate at
> > > for a specific performance state and driver authors would be able to
> > > query that information and manually set genpd performance states when
> > > they change clk frequencies. In Qualcomm designs this would be their
> > > "fmax" tables that map a max frequency to a voltage corner. If someone
> > > wanted to fine tune that table and make it into a full frequency plan
> > > OPP table for use by devfreq, then they could add more entries for all
> > > the validated frequencies and voltage corners that are acceptable and
> > > tested and this API would still work. We'll need this sort of table
> > > regardless because we can't expect devices to search for an exact
> > > frequency in an OPP table when they can support hundreds of different
> > > frequencies, like in display or audio situations.
> > >

> >

> > Various reasons why I think the driver is not the right place to handle
> > the V/f relationship:
> >

> > 1) The V/f relationship is temperature dependent. So the voltage may have
> >    to be adjusted when the temperature changes. I don't think we should
> >    make every driver handle this on its own.
>

> This is AVS? Should be fine to plumb that into some sort of voltage
> domain that gets temperature feedback and then adjusts the voltage based
> on that? This is basically the same as Qualcomm's "voltage corners" by
> the way, just that the voltage is adjusted outside of the Linux kernel
> by another processor when the temperature changes.

Ack to what Stephen said above. Adaptive voltage scaling, corners, body
bias, SMPS modes/efficiency, etc are all just implementation details.

I don't think anyone is suggesting for drivers to take all of the above
into account when setting voltage. I would imagine either a "nominal"
voltage, a voltage "index" or a performance state to be passed from the
driver into the genpd layer.

Peter would that work for you?

>

> >

> > 2) Not every device with V/f requirements has its own powerdomain. On Tegra
> >    for example we have 2 voltage rails: core and CPU. (and a 3rd one for GPU
> >    since Tegra124). So all peripherals (except GPU) share the same voltage
> >    rail and they are grouped in several domains, one of which cannot be
> >    powergated. So genpd domains do not align with the V/f curves of the
> >    peripherals themselves.
> >

>

> I'm fairly certain this is true on most SoCs today. There is a main
> powerdomain for non-CPU things, and then some sort of CPU powerdomain or
> domains for CPU things. Each device in those domains needs to request a
> certain performance state on the voltage domain they're in (the "core"
> powerdomain in your example) and then that genpd will aggregate those
> requests with a max operation to pick the highest state required from
> all devices attached to the genpd for the voltage domain.
>

> How does power gating or not power gating the domain matter for this?
>


I'll go out on a limb and suggest that nested genpds take care of
Peter's concern? In Peter's case there are power islands that can clamp
power during idle. Some devices have these, some do not. For
active/performance power management there are the scalable voltage
rails.

I think that in this (very common) pattern the right thing to do is to
have the performance genpds at the top level of the genpd hierarchy.
These map onto the scalable voltage rails for DVFS.

Nested within these performance genpds/rails are the power islands used
for power clamping during device idle.

Peter would that work for you?

Best regards,
Mike


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-24 23:13         ` Stephen Boyd
@ 2018-07-25  5:51           ` Michael Turquette
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Turquette @ 2018-07-25  5:51 UTC (permalink / raw)
  To: Stephen Boyd, Ulf Hansson, Viresh Kumar
  Cc: grahamr, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Stephen Boyd (2018-07-24 16:13:00)
> Quoting Michael Turquette (2018-07-20 10:56:12)
> > Quoting Stephen Boyd (2018-07-20 10:12:29)
> > >
> > > For one thing, a driver should be able to figure out what the
> > > performance state requirement is for a particular frequency. I'd like to
> > > see an API that a driver can pass something like a (device, genpd, clk,
> > > frequency) tuple and get back the performance state required for that
> > > device's clk frequency within that genpd by querying OPP tables. If we
> > > had this API, then SoC vendors could design OPP tables for their on-SoC
> > > devices that describe the set of max frequencies a device can operate at
> > > for a specific performance state and driver authors would be able to
> > > query that information and manually set genpd performance states when
> > > they change clk frequencies. In Qualcomm designs this would be their
> >
> > I want to make sure I understand your proposal here. You want consumer
> > drivers to call the clk APIs to manage clks, and you also want them to
> > use runtime/genpd apis to manage the power rails (with synchronicity
> > provided by a new api to set the right genpd/power rail state based on
> > clk freq). Do I have that right?
>
> Yes. That is one option that I'd like to explore. We definitely have a
> case where drivers don't set any rate at all, they just set a power rail
> state with the performance state APIs. But it also seems complicated to
> do it all through the OPP APIs, so maybe duplicating a bunch of stuff in
> each driver will show that is good or bad.
>
> >
> > Sounds OK to me, but why not fold the clk_set_rate part into the genpd
> > as well? It's possible to have a genpd call clk_enable/clk_disable (see
> > include/linux/pm_clock.h, drivers/base/power/clock_ops.c and its
> > various users under arch/arm/mach-* and drivers/*). Gating clocks within
> > a genpd callback feels analogous to setting clock rate.
> >
>
> I suppose it should be possible to make a genpd for the clk rate and clk
> enable/disable state and put that behind the genpd performance state API
> so that it mirrors the OPP set_rate API. Then it is analogous to setting
> a rate and having a state of 0 mean the voltage requirement is gone. Of
> course, then we've replaced the word 'rate' with 'state' and I'm all
> happy now, which seems hypocritical! I semi-alluded to this idea in the
> second half of my email where I said we can turn OPP into a genpd that
> accepts performance state requests via the genpd performance state API,
> so I'm happy to look at how this would work too. It would be nice for
> drivers to just request a performance state and have that fall into a
> genpd for the clk rate and enable/disable state and also go affect the
> performance state of a voltage domain genpd. Everything could be done on
> the backend in genpd then and drivers wouldn't do much with clks
> directly.
>


Right, and for SoCs with complex power management hardware, I feel that
having a genpd backend that Knows All The Stuff will be really helpful
to the soc platform driver authors.

Lots of out-of-tree Whole Chip Power Management frameworks have shipped
in devices. I wonder if finally uniting performance and idle power
management in genpd will provide a path forward towards an
upstream-acceptable solution?

I think it would be ideal if we could nest all of the power primitives
within this layer (clocks, regulators, pmic/firmware messaging, avs,
etc) instead of having peer layers/subsystems.

Regards,
Mike

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-25  5:44           ` Michael Turquette
@ 2018-07-25 11:27             ` Peter De Schrijver
  2018-07-25 18:40               ` Michael Turquette
  2018-07-31 11:56               ` Ulf Hansson
  0 siblings, 2 replies; 34+ messages in thread
From: Peter De Schrijver @ 2018-07-25 11:27 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Stephen Boyd, Ulf Hansson, Viresh Kumar, grahamr, linux-clk,
	Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
> Quoting Stephen Boyd (2018-07-24 16:04:37)
> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> > > > 
> > > > For one thing, a driver should be able to figure out what the
> > > > performance state requirement is for a particular frequency. I'd like to
> > > > see an API that a driver can pass something like a (device, genpd, clk,
> > > > frequency) tuple and get back the performance state required for that
> > > > device's clk frequency within that genpd by querying OPP tables. If we
> > > > had this API, then SoC vendors could design OPP tables for their on-SoC
> > > > devices that describe the set of max frequencies a device can operate at
> > > > for a specific performance state and driver authors would be able to
> > > > query that information and manually set genpd performance states when
> > > > they change clk frequencies. In Qualcomm designs this would be their
> > > > "fmax" tables that map a max frequency to a voltage corner. If someone
> > > > wanted to fine tune that table and make it into a full frequency plan
> > > > OPP table for use by devfreq, then they could add more entries for all
> > > > the validated frequencies and voltage corners that are acceptable and
> > > > tested and this API would still work. We'll need this sort of table
> > > > regardless because we can't expect devices to search for an exact
> > > > frequency in an OPP table when they can support hundreds of different
> > > > frequencies, like in display or audio situations.
> > > > 
> > > 
> > > Various reasons why I think the driver is not the right place to handle
> > > the V/f relationship:
> > > 
> > > 1) The V/f relationship is temperature dependent. So the voltage may have
> > >    to be adjusted when the temperature changes. I don't think we should
> > >    make every driver handle this on its own.
> > 
> > This is AVS? Should be fine to plumb that into some sort of voltage
> > domain that gets temperature feedback and then adjusts the voltage based

For the core rail, it seems the voltage is indeed just adjusted based on
temperature. For the GPU rail, we have equations which calculate the required
voltage as a function of frequency and temperature. In some cases I think
we just cap the frequency if the temperature would be too high to find
a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
necessarily need to be handled the same way.

> > on that? This is basically the same as Qualcomm's "voltage corners" by
> > the way, just that the voltage is adjusted outside of the Linux kernel
> > by another processor when the temperature changes.
> 
> Ack to what Stephen said above. Adaptive voltage scaling, corners, body
> bias, SMPS modes/efficiency, etc are all just implementation details.
> 
> I don't think anyone is suggesting for drivers to take all of the above
> into account when setting voltage. I would imagine either a "nominal"
> voltage, a voltage "index" or a performance state to be passed from the
> driver into the genpd layer.
> 
> Peter would that work for you?
> 

A voltage index should do I think. The reason for a voltage index is that
we have at least one case where the voltage depends on the mode of the
device. This is the case for the HDMI/DP output serializers (SORx). The
required voltage doesn't only depend on the pixel rate, but also on the
mode (DP or HDMI). One danger is that we must make sure all
drivers of devices sharing a rail use this API to set their voltage
requirement. If not, weird failures will show up.
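
Peter's SORx case can be sketched as a lookup keyed on both the mode and
the pixel rate. A minimal userspace sketch follows; the thresholds, the
extra step for DP, and all names are invented for illustration, not
Tegra's actual tables:

```c
/* Hypothetical sketch: the voltage index for the shared rail depends on
 * both the pixel clock rate and the output mode (DP vs. HDMI), so a plain
 * frequency -> voltage table is not enough. All values are made up. */
#include <assert.h>

enum sor_mode { SOR_MODE_HDMI, SOR_MODE_DP };

/* Pick a voltage index; a higher index means a higher voltage. DP is
 * assumed (for illustration) to need one step more than HDMI at the
 * same pixel rate. */
int sor_voltage_index(enum sor_mode mode, unsigned long pixel_khz)
{
	int idx;

	if (pixel_khz <= 148500)	/* up to 1080p60 */
		idx = 1;
	else if (pixel_khz <= 297000)	/* up to 4k30 */
		idx = 2;
	else
		idx = 3;

	if (mode == SOR_MODE_DP)	/* assumed extra margin for DP */
		idx++;

	return idx;
}
```

A driver would then pass the resulting index into the genpd layer rather
than a raw voltage, which is exactly what makes the shared-rail case
dangerous if any one driver skips the call.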

> > 
> > > 
> > > 2) Not every device with V/f requirements has its own powerdomain. On Tegra
> > >    for example we have 2 voltage rails: core and CPU. (and a 3rd one for GPU
> > >    since Tegra124). So all peripherals (except GPU) share the same voltage
> > >    rail and they are grouped in several domains, one of which cannot be
> > >    powergated. So genpd domains do not align with the V/f curves of the
> > >    peripherals themselves.
> > > 
> > 
> > I'm fairly certain this is true on most SoCs today. There is a main
> > powerdomain for non-CPU things, and then some sort of CPU powerdomain or
> > domains for CPU things. Each device in those domains needs to request a
> > certain performance state on the voltage domain they're in (the "core"
> > powerdomain in your example) and then that genpd will aggregate those
> > requests with a max operation to pick the highest state required from
> > all devices attached to the genpd for the voltage domain.
> > 
> > How does power gating or not power gating the domain matter for this?
> > 
> 
> I'll go out on a limb and suggest that nested genpd's take care of
> Peter's concern? In Peter's case there are power islands that can clamp
> power during idle. Some devices have these, some do not. For
> active/performance power management there are the scalable voltage
> rails.
> 
> > I think that in this (very common) pattern the right thing to do is to
> > have the performance genpds at the top level of the genpd hierarchy.
> These map onto the scalable voltage rails for DVFS.
> 
> Nested within these performance genpds/rails are the power islands used
> for power clamping during device idle.
> 
> Peter would that work for you?
> 

So the voltage genpd would then not have any on/off controls?

Peter.

> Best regards,
> Mike

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-25 11:27             ` Peter De Schrijver
@ 2018-07-25 18:40               ` Michael Turquette
  2018-07-31 11:56               ` Ulf Hansson
  1 sibling, 0 replies; 34+ messages in thread
From: Michael Turquette @ 2018-07-25 18:40 UTC (permalink / raw)
  To: Peter De Schrijver
  Cc: Stephen Boyd, Ulf Hansson, Viresh Kumar, grahamr, linux-clk,
	Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Peter De Schrijver (2018-07-25 04:27:02)
> On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
> > Quoting Stephen Boyd (2018-07-24 16:04:37)
> > > Quoting Peter De Schrijver (2018-07-23 01:26:41)
> > > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> > > > >
> > > > > For one thing, a driver should be able to figure out what the
> > > > > performance state requirement is for a particular frequency. I'd like to
> > > > > see an API that a driver can pass something like a (device, genpd, clk,
> > > > > frequency) tuple and get back the performance state required for that
> > > > > device's clk frequency within that genpd by querying OPP tables. If we
> > > > > had this API, then SoC vendors could design OPP tables for their on-SoC
> > > > > devices that describe the set of max frequencies a device can operate at
> > > > > for a specific performance state and driver authors would be able to
> > > > > query that information and manually set genpd performance states when
> > > > > they change clk frequencies. In Qualcomm designs this would be their
> > > > > "fmax" tables that map a max frequency to a voltage corner. If someone
> > > > > wanted to fine tune that table and make it into a full frequency plan
> > > > > OPP table for use by devfreq, then they could add more entries for all
> > > > > the validated frequencies and voltage corners that are acceptable and
> > > > > tested and this API would still work. We'll need this sort of table
> > > > > regardless because we can't expect devices to search for an exact
> > > > > frequency in an OPP table when they can support hundreds of different
> > > > > frequencies, like in display or audio situations.
> > > > >
> > > >
> > > > Various reasons why I think the driver is not the right place to handle
> > > > the V/f relationship:
> > > >
> > > > 1) The V/f relationship is temperature dependent. So the voltage may have
> > > >    to be adjusted when the temperature changes. I don't think we should
> > > >    make every driver handle this on its own.
> > >
> > > This is AVS? Should be fine to plumb that into some sort of voltage
> > > domain that gets temperature feedback and then adjusts the voltage based
>
> For the core rail, it seems the voltage is indeed just adjusted based on
> temperature. For the GPU rail, we have equations which calculate the required
> voltage as a function of frequency and temperature. In some cases I think
> we just cap the frequency if the temperature would be too high to find
> a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
> necessarily need to be handled the same way.
>
> > > on that? This is basically the same as Qualcomm's "voltage corners" by
> > > the way, just that the voltage is adjusted outside of the Linux kernel
> > > by another processor when the temperature changes.
> >
> > Ack to what Stephen said above. Adaptive voltage scaling, corners, body
> > bias, SMPS modes/efficiency, etc are all just implementation details.
> >
> > I don't think anyone is suggesting for drivers to take all of the above
> > into account when setting voltage. I would imagine either a "nominal"
> > voltage, a voltage "index" or a performance state to be passed from the
> > driver into the genpd layer.
> >
> > Peter would that work for you?
> >
>
> A voltage index should do I think. The reason for a voltage index is that
> we have at least one case, where the voltage depends on the mode of the
> device. This is the case for the HDMI/DP output serializers (SORx). The
> required voltage doesn't only depend on the pixel rate, but also on the
> mode (DP or HDMI). One danger is that we must make sure all
> drivers of devices sharing a rail, use this API to set their voltage
> requirement. If not, weird failures will show up.
>
> > >
> > > >
> > > > 2) Not every device with V/f requirements has its own powerdomain. On Tegra
> > > >    for example we have 2 voltage rails: core and CPU. (and a 3rd one for GPU
> > > >    since Tegra124). So all peripherals (except GPU) share the same voltage
> > > >    rail and they are grouped in several domains, one of which cannot be
> > > >    powergated. So genpd domains do not align with the V/f curves of the
> > > >    peripherals themselves.
> > > >
> > >
> > > I'm fairly certain this is true on most SoCs today. There is a main
> > > powerdomain for non-CPU things, and then some sort of CPU powerdomain or
> > > domains for CPU things. Each device in those domains needs to request a
> > > certain performance state on the voltage domain they're in (the "core"
> > > powerdomain in your example) and then that genpd will aggregate those
> > > requests with a max operation to pick the highest state required from
> > > all devices attached to the genpd for the voltage domain.
> > >
> > > How does power gating or not power gating the domain matter for this?
> > >
> >
> > I'll go out on a limb and suggest that nested genpd's take care of
> > Peter's concern? In Peter's case there are power islands that can clamp
> > power during idle. Some devices have these, some do not. For
> > active/performance power management there are the scalable voltage
> > rails.
> >
> > I think that in this (very common) pattern the right thing to do is is
> > have the performance genpds at the top level of the genpd hierarchy.
> > These map onto the scalable voltage rails for DVFS.
> >
> > Nested within these performance genpds/rails are the power islands used
> > for power clamping during device idle.
> >
> > Peter would that work for you?
> >
>
> So the voltage genpd would then not have any on/off controls?

I don't see why a single genpd could not provide callbacks for both
on/off and performance. No reason it can't do both.

Modeling the power clamping islands as nested genpds will often be a
more accurate model of the SoC. And then turning off the whole top-level
genpd/rail will be useful for the system-wide PM or deeper power
collapse.
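
The shape Mike describes, a rail-level genpd that handles both on/off
and performance, with idle power islands nested underneath, can be
modeled roughly in userspace. This is a sketch under assumptions (names
invented, propagation is max-only; real genpd re-aggregates across all
children when a vote drops), not the kernel implementation:

```c
/* Userspace model of a nested genpd hierarchy: powering on an island
 * forces its parent rail on, powering it off leaves the rail up for
 * other children, and performance votes propagate upward. */
#include <assert.h>
#include <stddef.h>

struct model_genpd {
	struct model_genpd *parent;	/* e.g. island -> voltage rail */
	int on;
	int perf_state;			/* 0 = no requirement */
};

void model_genpd_power_on(struct model_genpd *pd)
{
	if (pd->parent)
		model_genpd_power_on(pd->parent);	/* rail before island */
	pd->on = 1;
}

void model_genpd_power_off(struct model_genpd *pd)
{
	pd->on = 0;	/* parent stays up; other children may need it */
}

void model_genpd_set_perf(struct model_genpd *pd, int state)
{
	pd->perf_state = state;
	if (pd->parent && state > pd->parent->perf_state)
		pd->parent->perf_state = state;	/* propagate upward */
}
```

Turning off the top-level rail for system-wide PM would then just be
model_genpd_power_off() on the parent once every child is down.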

Regards,
Mike

>
> Peter.
>
> > Best regards,
> > Mike

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-20 17:12     ` Stephen Boyd
  2018-07-20 17:56       ` Michael Turquette
  2018-07-23  8:26       ` Peter De Schrijver
@ 2018-07-31 10:35       ` Ulf Hansson
  2018-08-03 21:11         ` Michael Turquette
  2 siblings, 1 reply; 34+ messages in thread
From: Ulf Hansson @ 2018-07-31 10:35 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Mike Turquette,
	Doug Anderson, Taniya Das, Rajendra Nayak, Amit Nischal,
	Vincent Guittot, Amit Kucheria

[...]

>>
>> Another option is to use the ->runtime_suspend|resume() callbacks at
>> the genpd level. In principle when the device enters runtime suspend,
>> genpd can decide to drop the votes for the device (if some exist). At
>> runtime resume, genpd then should re-store the votes for the device it
>> had before it became runtime suspended (or if the vote became updated
>> in-between).
>
> I'd prefer to see the genpd layer do it instead of OPP layer controlling
> the on/off state. Calling dev_pm_opp_set_rate(dev, 0) just looks very
> weird and conflates clk frequency with power domain state which isn't
> exactly what we want. We want it to be related to the frequency _and_
> the enabled state. Plus it's like saying clk_set_rate(clk, 0) should
> turn the clk off, which we've never done before.

Perhaps you misunderstood my suggestion!?

Instead of having the driver to call dev_pm_opp_set_rate(dev, 0) from
their ->runtime_suspend() callbacks, my suggestion is to let genpd,
from its ->runtime_suspend() callback, temporarily drop the requested
performance level for the device. Hence genpd also needs to re-compute
and set a new aggregated performance value for the PM domain.

The purpose was to make it transparent for drivers, as all they need to
do is whatever they did before.
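
The drop-and-restore behavior being proposed can be modeled in a few
lines; the structure and names below are illustrative assumptions, not
genpd's actual bookkeeping:

```c
/* Model of Ulf's suggestion: genpd's ->runtime_suspend() drops the
 * device's performance vote (so the domain can re-aggregate lower) and
 * ->runtime_resume() restores it, transparently to the driver. */
#include <assert.h>

struct model_dev {
	int requested_state;	/* what the driver last asked for */
	int active_state;	/* what currently counts in aggregation */
};

void model_set_performance_state(struct model_dev *dev, int state)
{
	dev->requested_state = state;
	dev->active_state = state;	/* takes effect while active */
}

void model_runtime_suspend(struct model_dev *dev)
{
	dev->active_state = 0;	/* drop the vote, remember the request */
}

void model_runtime_resume(struct model_dev *dev)
{
	dev->active_state = dev->requested_state;	/* restore */
}
```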

>
> It would also be good to handle the on/off state in genpd because the
> genpd could have a governor attached to it that blocks on/off requests
> from drivers if the governor decides that the device is clk gating on
> and off quickly but it's not beneficial to change the voltage during the
> quick gating scenario because we spend more time changing voltage than
> clk gating. Connecting this all to genpd allows us to use the device
> power state as a signal to adjust the voltage, but leaves us flexibility
> to override it if it makes sense policy wise.

Well, genpd already manages the on/off state!?

Of course, I fully agree to the above, which seems also to be a good
reason to why my suggestion makes sense. Or am I missing something
here?

>
> And please remember the other point here that many devices may all be
> sharing the same power domain and therefore the voltage that the domain
> runs at depends on many different clk frequencies for different devices
> and if those clks are enabled or not, aggregated into a single voltage
> requirement for the power domain.

Right, the aggregation in genpd is done per device - not per clock. I
don't understand your concern, but again I may be missing something!?

> The runtime suspend state might be
> part of that aggregation equation, so that if a device's clk is off and
> it's been suspended for some time the voltage requirement for that clk
> frequency could be dropped, but if the device is resumed and the clk is
> still off then the voltage requirement can be ignored. We only need to
> make sure the power domain is at a high enough voltage when the clk is
> actually enabled though, so tying the voltage requirement drop to
> runtime PM of the device is not exactly correct.

I see.

For my understanding; you are saying that there may be use-cases when
one want to keep the device runtime resumed, but without having its
clock ungated, right?

My guess is that this is a rather unusual case, no?

The point is, I think we can simply have the driver to call
dev_pm_opp_set_rate() to cover optimizations for these scenarios.

>
> It may be that a device is composed of multiple power domains and those
> power domains each have their own clks with voltage requirements too.
> In this scenario, connecting the voltage requirement of the genpd to the
> runtime PM state of the device doesn't make sense at all. We'll get into
> a situation where half the device is "off" because a power domain and
> the clk for that power domain are off but we can't drop the voltage
> requirement because the overall device isn't runtime suspended. It's
> like the single genpd per device problem all over again. Let's not do
> that.

I am *not* suggesting to remove the APIs in genpd for setting a
performance level. We can still do that and that should be sufficient
to cover any corner cases, don't you think?

My suggestion was only to allow genpd to drop the votes (temporarily)
for a device that becomes runtime suspended.

>
> So I'd rather see drivers calling some genpd API directly to indicate
> that the clk should turn off and the voltage domain constraint has gone
> away. We might want to transform OPP into a genpd of its own and then
> connect that OPP genpd to a clk domain genpd for some clk
> on/off/frequency part and a power domain for the voltage/corner scaling
> part. If we did that, then everything is a genpd that gets all
> aggregated in the genpd layer and drivers that only use OPP would be OK
> with their runtime PM state of their device affecting the OPP genpd's
> on/off state of clks and voltage requirements. Drivers that want finer
> control would be able to use some new genpd API to turn on and off the
> genpd when they want to.

Sounds like we need a white-board. :-)

Anyway, I am reading the following replies in the thread to hopefully
understand your suggestions a bit better.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-25 11:27             ` Peter De Schrijver
  2018-07-25 18:40               ` Michael Turquette
@ 2018-07-31 11:56               ` Ulf Hansson
  2018-07-31 20:02                 ` grahamr
  2018-08-03 23:05                 ` Michael Turquette
  1 sibling, 2 replies; 34+ messages in thread
From: Ulf Hansson @ 2018-07-31 11:56 UTC (permalink / raw)
  To: Peter De Schrijver, Michael Turquette, Stephen Boyd
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Doug Anderson,
	Taniya Das, Rajendra Nayak, Amit Nischal, Vincent Guittot,
	Amit Kucheria

On 25 July 2018 at 13:27, Peter De Schrijver <pdeschrijver@nvidia.com> wrote:
> On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
>> Quoting Stephen Boyd (2018-07-24 16:04:37)
>> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
>> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
>> > > >
>> > > > For one thing, a driver should be able to figure out what the
>> > > > performance state requirement is for a particular frequency. I'd like to
>> > > > see an API that a driver can pass something like a (device, genpd, clk,
>> > > > frequency) tuple and get back the performance state required for that
>> > > > device's clk frequency within that genpd by querying OPP tables. If we
>> > > > had this API, then SoC vendors could design OPP tables for their on-SoC
>> > > > devices that describe the set of max frequencies a device can operate at
>> > > > for a specific performance state and driver authors would be able to
>> > > > query that information and manually set genpd performance states when
>> > > > they change clk frequencies. In Qualcomm designs this would be their
>> > > > "fmax" tables that map a max frequency to a voltage corner. If someone
>> > > > wanted to fine tune that table and make it into a full frequency plan
>> > > > OPP table for use by devfreq, then they could add more entries for all
>> > > > the validated frequencies and voltage corners that are acceptable and
>> > > > tested and this API would still work. We'll need this sort of table
>> > > > regardless because we can't expect devices to search for an exact
>> > > > frequency in an OPP table when they can support hundreds of different
>> > > > frequencies, like in display or audio situations.
>> > > >
>> > >
>> > > Various reasons why I think the driver is not the right place to handle
>> > > the V/f relationship:
>> > >
>> > > 1) The V/f relationship is temperature dependent. So the voltage may have
>> > >    to be adjusted when the temperature changes. I don't think we should
>> > >    make every driver handle this on its own.
>> >
>> > This is AVS? Should be fine to plumb that into some sort of voltage
>> > domain that gets temperature feedback and then adjusts the voltage based
>
> For the core rail, it seems the voltage is indeed just adjusted based on
> temperature. For the GPU rail, we have equations which calculate the required
> voltage as a function of frequency and temperature. In some cases I think
> we just cap the frequency if the temperature would be too high to find
> a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
> necessarily need to be handled the same way.
>
>> > on that? This is basically the same as Qualcomm's "voltage corners" by
>> > the way, just that the voltage is adjusted outside of the Linux kernel
>> > by another processor when the temperature changes.
>>
>> Ack to what Stephen said above. Adaptive voltage scaling, corners, body
>> bias, SMPS modes/efficiency, etc are all just implementation details.
>>
>> I don't think anyone is suggesting for drivers to take all of the above
>> into account when setting voltage. I would imagine either a "nominal"
>> voltage, a voltage "index" or a performance state to be passed from the
>> driver into the genpd layer.
>>
>> Peter would that work for you?
>>
>
> A voltage index should do I think. The reason for a voltage index is that
> we have at least one case, where the voltage depends on the mode of the
> device. This is the case for the HDMI/DP output serializers (SORx). The
> required voltage doesn't only depend on the pixel rate, but also on the
> mode (DP or HDMI). One danger is that we must make sure all
> drivers of devices sharing a rail, use this API to set their voltage
> requirement. If not, weird failures will show up.

I am trying to understand the proposal, but it seems like I am failing. Sorry.

Genpd already have an API for requesting a performance state for a
device, dev_pm_genpd_set_performance_state() - and genpd aggregates
the votes per PM domain.

The idea so far, is that most votes should come via the OPP layer,
dev_pm_opp_set_rate(), but for special cases users may decide to call
dev_pm_genpd_set_performance_state() directly.

So far so good?
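
Reduced to its core, the per-domain aggregation described above is just
a max over the per-device votes, with 0 meaning "no requirement". A
minimal sketch (illustrative only, not the genpd code):

```c
/* Aggregate per-device performance-state votes for one PM domain by
 * taking the maximum; dropping a vote to 0 lets the domain settle at
 * the highest remaining requirement. */
#include <assert.h>

int domain_aggregate(const int votes[], int ndevs)
{
	int i, state = 0;

	for (i = 0; i < ndevs; i++)
		if (votes[i] > state)
			state = votes[i];

	return state;
}
```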

Following the conversation, it seems like the suggestion is to *not*
go through the OPP layer, but instead always let drivers call
dev_pm_genpd_set_performance_state(). Why is that a benefit and why is
that preferred?

If I understood the main concern raised by Graham/Stephen, was that we
didn't want to make the driver call yet another API to manage OPPs. So
then calling dev_pm_genpd_set_performance_state() is fine, but not
dev_pm_opp_set_rate()?

Moreover, how would the driver know which performance state it shall
request? Sounds like an OPP translation is needed anyway, so even more
things fall into the responsibility of the driver, in case it shall
call dev_pm_genpd_set_performance_state(). It doesn't sound right to
me.

Finally, trying to make use of the ->stop|start() callbacks in genpd
can be useful for SoC specific operations that we want to make
transparent for generic drivers. However, I wonder if dealing with
OPPs from there really makes sense; I need to think more about that.

[...]

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-31 11:56               ` Ulf Hansson
@ 2018-07-31 20:02                 ` grahamr
  2018-08-23 13:20                   ` Ulf Hansson
  2018-09-18 17:25                   ` Kevin Hilman
  2018-08-03 23:05                 ` Michael Turquette
  1 sibling, 2 replies; 34+ messages in thread
From: grahamr @ 2018-07-31 20:02 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Peter De Schrijver, Michael Turquette, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria,
	linux-clk-owner

I have two significant concerns with using a wrapper framework to 
support DVFS.  The first may be Qualcomm-specific, and more of a 
practical concern rather than a principled one.  We (clock driver 
owners) get voltage/frequency operating points from the clock controller 
designers in a single data-set, it is not information that is provided 
to or coming from the clock consumer hardware or software teams.  
Tracking, updating, and debugging frequency/voltage mismatches falls 
completely into the clock team, so moving the responsibility of handling 
the relationship in code to the consumer would be a disaster (SDM845 has 
almost 200 clock domains and over 800 clock branches - not all 
controlled by Linux but a large proportion are).  It would mean all 
clock consumers using any clk_set_rate or clk_enable APIs would need to 
make the appropriate voltage requests as well - and even if they could 
look up the required voltage based on frequency from some central 
location (which I expect would be delivered and maintained by the clock 
driver team) the chance that everyone would do that correctly is, 
frankly, zero.  The types of problems that arise from under-volting a 
clock generator or consumer are very hard to debug (since it can 
essentially result in random behavior).
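
The central lookup being discussed, an "fmax" table mapping each
voltage corner to the highest rate it supports, amounts to finding the
lowest corner whose fmax covers the requested rate. A hedged sketch
(corner names and values are invented, not SDM845 data):

```c
/* fmax-style lookup: each corner has a maximum supported clock rate;
 * return the lowest corner that can sustain the requested rate. */
#include <assert.h>

struct fmax_entry {
	unsigned long fmax_hz;
	int corner;		/* quantized voltage level */
};

static const struct fmax_entry fmax_tbl[] = {
	{ 100000000, 1 },	/* e.g. a low-SVS-like corner */
	{ 200000000, 2 },	/* e.g. SVS */
	{ 400000000, 3 },	/* e.g. nominal */
	{ 600000000, 4 },	/* e.g. turbo */
};

/* Return the required corner for a rate, or -1 if unsupportable.
 * The table must be sorted by ascending fmax. */
int corner_for_rate(unsigned long rate_hz)
{
	unsigned int i;

	for (i = 0; i < sizeof(fmax_tbl) / sizeof(fmax_tbl[0]); i++)
		if (rate_hz <= fmax_tbl[i].fmax_hz)
			return fmax_tbl[i].corner;

	return -1;
}
```

Graham's point is that expecting hundreds of consumers to perform this
lookup (and vote) correctly on every clk_set_rate/clk_enable is not
realistic, which is why the clock or genpd layer would do it centrally.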

My second concern is that we do have voltage requirements coming from 
the clock controller itself, not related to the clock consumer.  The 
clock hardware components themselves (PLLs, muxes, etc) require voltage 
levels that may differ from that of the final consumer of the output 
clock.  We could hack all these requirements into the frequency/voltage 
tuple table that consumers look up, but at that point we have diverged 
fairly dramatically from Stephen's principle that the clock framework 
should not directly require any voltage - fundamentally they do (at 
least ours do).

Our most recent patch that Taniya posted has gone in a direction 
similar to Tegra - instead of having the framework handle it, we use 
prepare and set_rate hooks to implement voltage (corner) voting in the 
qcom drivers via the new genpd.  This is not fully proven yet but is so 
far working well and will likely be our internal solution going forward 
if the framework consensus is to force consumers to manage their 
frequency-based voltage requirements themselves - I just do not see that 
as a practical solution for a complicated SoC with a large, highly 
distributed, clock tree.  That being said I do see potential future 
deadlock race conditions between clk and genpd which concern me - it 
remains a downside of that coupling.

Would there be some way to prevent consumers from directly calling 
clk_set_rate or clk_enable and force them to go via another framework 
for these calls?  It would at least prevent people from using the 
"wrong" interface and bypassing voltage requirements.  That of course 
means having to mirror any of the clk APIs that update clock state into 
genpd/opp, which Stephen finds distasteful (and I agree).

Graham



On 2018-07-31 04:56, Ulf Hansson wrote:
> On 25 July 2018 at 13:27, Peter De Schrijver <pdeschrijver@nvidia.com> 
> wrote:
>> On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
>>> Quoting Stephen Boyd (2018-07-24 16:04:37)
>>> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
>>> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
>>> > > >
>>> > > > For one thing, a driver should be able to figure out what the
>>> > > > performance state requirement is for a particular frequency. I'd like to
>>> > > > see an API that a driver can pass something like a (device, genpd, clk,
>>> > > > frequency) tuple and get back the performance state required for that
>>> > > > device's clk frequency within that genpd by querying OPP tables. If we
>>> > > > had this API, then SoC vendors could design OPP tables for their on-SoC
>>> > > > devices that describe the set of max frequencies a device can operate at
>>> > > > for a specific performance state and driver authors would be able to
>>> > > > query that information and manually set genpd performance states when
>>> > > > they change clk frequencies. In Qualcomm designs this would be their
>>> > > > "fmax" tables that map a max frequency to a voltage corner. If someone
>>> > > > wanted to fine tune that table and make it into a full frequency plan
>>> > > > OPP table for use by devfreq, then they could add more entries for all
>>> > > > the validated frequencies and voltage corners that are acceptable and
>>> > > > tested and this API would still work. We'll need this sort of table
>>> > > > regardless because we can't expect devices to search for an exact
>>> > > > frequency in an OPP table when they can support hundreds of different
>>> > > > frequencies, like in display or audio situations.
>>> > > >
>>> > >
>>> > > Various reasons why I think the driver is not the right place to handle
>>> > > the V/f relationship:
>>> > >
>>> > > 1) The V/f relationship is temperature dependent. So the voltage may have
>>> > >    to be adjusted when the temperature changes. I don't think we should
>>> > >    make every driver handle this on its own.
>>> >
>>> > This is AVS? Should be fine to plumb that into some sort of voltage
>>> > domain that gets temperature feedback and then adjusts the voltage based
>> 
>> For the core rail, it seems the voltage is indeed just adjusted based on
>> temperature. For the GPU rail, we have equations which calculate the required
>> voltage as a function of frequency and temperature. In some cases I think
>> we just cap the frequency if the temperature would be too high to find
>> a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
>> necessarily need to be handled the same way.
>> 
>>> > on that? This is basically the same as Qualcomm's "voltage corners" by
>>> > the way, just that the voltage is adjusted outside of the Linux kernel
>>> > by another processor when the temperature changes.
>>> 
>>> Ack to what Stephen said above. Adaptive voltage scaling, corners, body
>>> bias, SMPS modes/efficiency, etc are all just implementation details.
>>> 
>>> I don't think anyone is suggesting for drivers to take all of the above
>>> into account when setting voltage. I would imagine either a "nominal"
>>> voltage, a voltage "index" or a performance state to be passed from the
>>> driver into the genpd layer.
>>> 
>>> Peter would that work for you?
>>> 
>> 
>> A voltage index should do I think. The reason for a voltage index is that
>> we have at least one case, where the voltage depends on the mode of the
>> device. This is the case for the HDMI/DP output serializers (SORx). The
>> required voltage doesn't only depend on the pixel rate, but also on the
>> mode (DP or HDMI). One danger is that we must make sure all
>> drivers of devices sharing a rail, use this API to set their voltage
>> requirement. If not, weird failures will show up.
> 
> I am trying to understand the proposal, but it seems like I am failing. 
> Sorry.
> 
> Genpd already have an API for requesting a performance state for a
> device, dev_pm_genpd_set_performance_state() - and genpd aggregates
> the votes per PM domain.
> 
> The idea so far, is that most votes should come via the OPP layer,
> dev_pm_opp_set_rate(), but for special cases users may decide to call
> dev_pm_genpd_set_performance_state() directly.
> 
> So far so good?
> 
> Following the conversation, it seems like the suggestion is to *not*
> go through the OPP layer, but instead always let drivers call
> dev_pm_genpd_set_performance_state(). Why is that a benefit and why is
> that preferred?
> 
> If I understood the main concern raised by Graham/Stephen, was that we
> didn't want to make the driver call yet another API to manage OPPs. So
> then calling dev_pm_genpd_set_performance_state() is fine, but not
> dev_pm_opp_set_rate()?
> 
> Moreover, how would the driver know which performance state it shall
> request? Sounds like an OPP translation is needed anyway, so even more
> things fall into the responsibility of the driver, in case it shall
> call dev_pm_genpd_set_performance_state(). It doesn't sound right to
> me.
> 
> Finally, trying to make use of the ->stop|start() callbacks in genpd,
> can be useful for SoC specific operations that we want to make
> transparent for generic drivers. However, I wonder if dealing with
> OPPs from there really makes sense, need to think more about that.
> 
> [...]
> 
> Kind regards
> Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-31 10:35       ` Ulf Hansson
@ 2018-08-03 21:11         ` Michael Turquette
  2018-08-23 11:10           ` Ulf Hansson
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-08-03 21:11 UTC (permalink / raw)
  To: Stephen Boyd, Ulf Hansson
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Doug Anderson,
	Taniya Das, Rajendra Nayak, Amit Nischal, Vincent Guittot,
	Amit Kucheria

Quoting Ulf Hansson (2018-07-31 03:35:50)
> [...]
>
> >>
> >> Another option is to use the ->runtime_suspend|resume() callbacks at
> >> the genpd level. In principle when the device enters runtime suspend,
> >> genpd can decide to drop the votes for the device (if some exist). At
> >> runtime resume, genpd then should re-store the votes for the device it
> >> had before it became runtime suspended (or if the vote became updated
> >> in-between).
> >
> > I'd prefer to see the genpd layer do it instead of OPP layer controlling
> > the on/off state. Calling dev_pm_opp_set_rate(dev, 0) just looks very
> > weird and conflates clk frequency with power domain state which isn't
> > exactly what we want. We want it to be related to the frequency _and_
> > the enabled state. Plus it's like saying clk_set_rate(clk, 0) should
> > turn the clk off, which we've never done before.
>
> Perhaps you misunderstood my suggestion!?
>
> Instead of having the driver to call dev_pm_opp_set_rate(dev, 0) from
> their ->runtime_suspend() callbacks, my suggestion is to let genpd,
> from its ->runtime_suspend() callback, temporarily drop the requested
> performance level for the device. Hence genpd also needs to re-compute
> and set a new aggregated performance value for the PM domain.
>
> The purpose was to make it transparent for drivers, as all they need
> do is whatever they did before.
>
> >
> > It would also be good to handle the on/off state in genpd because the
> > genpd could have a governor attached to it that blocks on/off requests
> > from drivers if the governor decides that the device is clk gating on
> > and off quickly but it's not beneficial to change the voltage during the
> > quick gating scenario because we spend more time changing voltage than
> > clk gating. Connecting this all to genpd allows us to use the device
> > power state as a signal to adjust the voltage, but leaves us flexibility
> > to override it if it makes sense policy wise.
>
> Well, genpd already manages the on/off state!?
>
> Of course, I fully agree to the above, which seems also to be a good
> reason to why my suggestion makes sense. Or am I missing something
> here?
>
> >
> > And please remember the other point here that many devices may all be
> > sharing the same power domain and therefore the voltage that the domain
> > runs at depends on many different clk frequencies for different devices
> > and if those clks are enabled or not, aggregated into a single voltage
> > requirement for the power domain.
>
> Right, the aggregation in genpd is done per device - not per clock. I
> don't understand your concern, but again I may be missing something!?

I believe that this is an argument against using OPP as the performance
selection mechanism for all SoCs. Its data structures, along with the
fact that it seems to have no awareness of device idleness, mean that
it's not really sufficient for more complex SoCs.

>
> > The runtime suspend state might be
> > part of that aggregation equation, so that if a device's clk is off and
> > it's been suspended for some time the voltage requirement for that clk
> > frequency could be dropped, but if the device is resumed and the clk is
> > still off then the voltage requirement can be ignored. We only need to
> > make sure the power domain is at a high enough voltage when the clk is
> > actually enabled though, so tying the voltage requirement drop to
> > runtime PM of the device is not exactly correct.
>
> I see.
>
> For my understanding; you are saying that there may be use-cases when
> one want to keep the device runtime resumed, but without having its
> clock ungated, right?
>
> My guess is that this is a rather unusual case, no?

This is a common case if I understand Stephen's example: Imagine a power
island that scales voltage and may power clamp to zero. Now imagine that
within that power island are a dozen separate functional devices, each
consuming a dedicated functional clock input signal.

It's possible to have the power island turned on so that one device can
function, but the other 11 devices will remain clock gated.

I assume the right way to handle this is to nest a dozen device-level
genpd's within the power island-level genpd? That mirrors the topology I
described above, and allows the lowest level genpds (functional
device-level) to gate and ungate clocks, and the parent genpd (power
island-level) will only be concerned with voltage control.
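One way to picture the one-of-twelve case is that the island's required corner is the maximum over the votes of devices whose functional clock is actually ungated. The following is a toy model with semantics invented for the example, not genpd's real implementation; corner 0 means the island may be power clamped.

```c
#include <assert.h>

/* One functional device inside the power island. */
struct island_dev {
	int clk_enabled;	/* functional clock ungated? */
	int corner_vote;	/* corner required at the device's current rate */
};

/* Aggregate the island's required corner: only devices whose clock is
 * ungated contribute; a fully clock-gated island returns 0 and may be
 * power clamped. */
int island_corner(const struct island_dev *devs, int ndevs)
{
	int corner = 0;

	for (int i = 0; i < ndevs; i++)
		if (devs[i].clk_enabled && devs[i].corner_vote > corner)
			corner = devs[i].corner_vote;
	return corner;
}
```

With twelve devices where only one has its clock ungated, the island stays on at that one device's corner; the other eleven contribute nothing until their clocks are enabled.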

>
> The point is, I think we can simply have the driver to call
> dev_pm_opp_set_rate() to cover optimizations for these scenarios.
>
> >
> > It may be that a device is composed of multiple power domains and those
> > power domains each have their own clks with voltage requirements too.
> > In this scenario, connecting the voltage requirement of the genpd to the
> > runtime PM state of the device doesn't make sense at all. We'll get into
> > a situation where half the device is "off" because a power domain and
> > the clk for that power domain are off but we can't drop the voltage
> > requirement because the overall device isn't runtime suspended. It's
> > like the single genpd per device problem all over again. Let's not do
> > that.
>
> I am *not* suggesting to remove the APIs in genpd for setting a
> performance level. We can still do that and that should be sufficient
> to cover any corner cases, don't you think?
>
> My suggestion was only to allow genpd to drop the votes (temporarily)
> for a device that becomes runtime suspended.
>
> >
> > So I'd rather see drivers calling some genpd API directly to indicate
> > that the clk should turn off and the voltage domain constraint has gone
> > away. We might want to transform OPP into a genpd of its own and then
> > connect that OPP genpd to a clk domain genpd for some clk
> > on/off/frequency part and a power domain for the voltage/corner scaling
> > part. If we did that, then everything is a genpd that gets all
> > aggregated in the genpd layer and drivers that only use OPP would be OK
> > with their runtime PM state of their device affecting the OPP genpd's
> > on/off state of clks and voltage requirements. Drivers that want finer
> > control would be able to use some new genpd API to turn on and off the
> > genpd when they want to.
>
> Sound like we need a white-board. :-)

We really do. Maybe I'll make the trip to Vancouver in September :-)

I don't like the idea to "transform OPP into a genpd of its own", as
suggested by Stephen above. But I think all he needs to solve this issue
is nesting some functional clock-level genpds inside of a parent power
island-level genpd. This is a repeat of the three paragraphs I've
written above.

This approach achieves the same effect as turning opp into a genpd, but
I'm not inclined to try and change the OPP library to solve this
problem.

I think OPP is fine as-is: it makes it easy to handle the very simplest
DVFS cases.  Stephen's requirements (and those of most mobile SoCs) go
beyond what the OPP library can and should offer.

Regards,
Mike

>
> Anyway, I am reading the following replies in the thread to hopefully
> understand your suggestions a bit better.
>
> Kind regards
> Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-31 11:56               ` Ulf Hansson
  2018-07-31 20:02                 ` grahamr
@ 2018-08-03 23:05                 ` Michael Turquette
  2018-08-23 12:13                   ` Ulf Hansson
  1 sibling, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-08-03 23:05 UTC (permalink / raw)
  To: Peter De Schrijver, Stephen Boyd, Ulf Hansson
  Cc: Viresh Kumar, grahamr, linux-clk, Linux PM, Doug Anderson,
	Taniya Das, Rajendra Nayak, Amit Nischal, Vincent Guittot,
	Amit Kucheria

Quoting Ulf Hansson (2018-07-31 04:56:46)
> On 25 July 2018 at 13:27, Peter De Schrijver <pdeschrijver@nvidia.com> wrote:
> > On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
> >> Quoting Stephen Boyd (2018-07-24 16:04:37)
> >> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
> >> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> >> > > >
> >> > > > For one thing, a driver should be able to figure out what the
> >> > > > performance state requirement is for a particular frequency. I'd like to
> >> > > > see an API that a driver can pass something like a (device, genpd, clk,
> >> > > > frequency) tuple and get back the performance state required for that
> >> > > > device's clk frequency within that genpd by querying OPP tables. If we
> >> > > > had this API, then SoC vendors could design OPP tables for their on-SoC
> >> > > > devices that describe the set of max frequencies a device can operate at
> >> > > > for a specific performance state and driver authors would be able to
> >> > > > query that information and manually set genpd performance states when
> >> > > > they change clk frequencies. In Qualcomm designs this would be their
> >> > > > "fmax" tables that map a max frequency to a voltage corner. If someone
> >> > > > wanted to fine tune that table and make it into a full frequency plan
> >> > > > OPP table for use by devfreq, then they could add more entries for all
> >> > > > the validated frequencies and voltage corners that are acceptable and
> >> > > > tested and this API would still work. We'll need this sort of table
> >> > > > regardless because we can't expect devices to search for an exact
> >> > > > frequency in an OPP table when they can support hundreds of different
> >> > > > frequencies, like in display or audio situations.
> >> > > >
> >> > >
> >> > > Various reasons why I think the driver is not the right place to handle
> >> > > the V/f relationship:
> >> > >
> >> > > 1) The V/f relationship is temperature dependent. So the voltage may have
> >> > >    to be adjusted when the temperature changes. I don't think we should
> >> > >    make every driver handle this on its own.
> >> >
> >> > This is AVS? Should be fine to plumb that into some sort of voltage
> >> > domain that gets temperature feedback and then adjusts the voltage based
> >
> > For the core rail, it seems the voltage is indeed just adjusted based on
> > temperature. For the GPU rail, we have equations which calculate the required
> > voltage as a function of frequency and temperature. In some cases I think
> > we just cap the frequency if the temperature would be too high to find
> > a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
> > necessarily need to be handled the same way.
> >
> >> > on that? This is basically the same as Qualcomm's "voltage corners" by
> >> > the way, just that the voltage is adjusted outside of the Linux kernel
> >> > by another processor when the temperature changes.
> >>
> >> Ack to what Stephen said above. Adaptive voltage scaling, corners, body
> >> bias, SMPS modes/efficiency, etc are all just implementation details.
> >>
> >> I don't think anyone is suggesting for drivers to take all of the above
> >> into account when setting voltage. I would imagine either a "nominal"
> >> voltage, a voltage "index" or a performance state to be passed from the
> >> driver into the genpd layer.
> >>
> >> Peter would that work for you?
> >>
> >
> > A voltage index should do I think. The reason for a voltage index is that
> > we have at least one case, where the voltage depends on the mode of the
> > device. This is the case for the HDMI/DP output serializers (SORx). The
> > required voltage doesn't only depend on the pixel rate, but also on the
> > mode (DP or HDMI). One danger is that we must make sure all
> > drivers of devices sharing a rail, use this API to set their voltage
> > requirement. If not, weird failures will show up.
>
> I am trying to understand the proposal, but it seems like I am failing. Sorry.
>
> Genpd already have an API for requesting a performance state for a
> device, dev_pm_genpd_set_performance_state() - and genpd aggregates
> the votes per PM domain.
>
> The idea so far, is that most votes should come via the OPP layer,
> dev_pm_opp_set_rate(), but for special cases users may decide to call
> dev_pm_genpd_set_performance_state() directly.
>
> So far so good?

This almost matches what I had in mind.

>
> Following the conversation, it seems like the suggestion is to *not*
> go through the OPP layer, but instead always let drivers call
> dev_pm_genpd_set_performance_state(). Why is that a benefit and why is
> that preferred?

I think that consumer drivers should not directly invoke the genpd APIs
for performance power management, but instead access them through a
pm_runtime interface, just as they do when managing idle power
management.

This pm_runtime performance interface does not exist yet. There is an
argument that the opp library is this interface, but I'll address that
at the very end of this message.

>
> If I understood the main concern raised by Graham/Stephen, was that we
> didn't want to make the driver call yet another API to manage OPPs. So
> then calling dev_pm_genpd_set_performance_state() is fine, but not
> dev_pm_opp_set_rate()?

Speaking only for myself, I want a single performance interface that
encapsulates clocks, regulators, pm microcontrollers, firmware
interfaces or whatever are required on the SoC to achieve the
performance level desired.

There will likely be a need for some consumer drivers to mix calls to
low-level primitives (clks, regulators) with this new pm_runtime
performance API, but that is the exception to the rule.
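A minimal sketch of what such an encapsulating interface could look like follows; every name here is hypothetical (no such API exists in the kernel today), and the backend is deliberately trivial. The point is only that the consumer requests an abstract level, and an SoC-specific provider decides whether that means a clk and regulator update, a firmware call, or a message to a PM microcontroller.

```c
#include <assert.h>

/* Hypothetical consumer-facing interface: the provider is opaque. */
struct perf_provider {
	int (*set_level)(void *ctx, int level);
	void *ctx;
};

int perf_set_level(struct perf_provider *p, int level)
{
	return p->set_level(p->ctx, level);
}

/* One possible backend: a provider that maps each level to a clock
 * rate and a supply voltage.  Another backend could instead send a
 * mailbox message to a power management microcontroller. */
struct simple_backend {
	unsigned long rate_hz;
	int microvolt;
};

static int simple_set_level(void *ctx, int level)
{
	struct simple_backend *b = ctx;
	static const unsigned long rates[] = { 0, 100000000UL, 400000000UL };
	static const int uv[] = { 0, 800000, 900000 };

	if (level < 0 || level > 2)
		return -1;	/* unsupported level */
	b->rate_hz = rates[level];
	b->microvolt = uv[level];
	return 0;
}
```

A consumer driver would only ever see `perf_set_level()`; the clk/regulator (or firmware) plumbing stays entirely inside the provider.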

>
> Moreover, how would the driver know which performance state it shall
> request? Sounds like an OPP translation is needed anyway, so even more
> things fall into the responsibility of the driver, in case it shall
> call dev_pm_genpd_set_performance_state(). It doesn't sound right to
> me.

Translation implies that consumer drivers are making requests in Hertz
to begin with. I think we should challenge this assumption.  Devicetree
gives us the tools we need to link SoC-specific performance states to
driver-specific use cases and configurations. Clock rates will be the
right way to model this for many devices, but arbitrary perf indexes
will also make sense for others. Translation should happen in DT.

> Finally, trying to make use of the ->stop|start() callbacks in genpd,
> can be useful for SoC specific operations that we want to make
> transparent for generic drivers. However, I wonder if dealing with
> OPPs from there really makes sense, need to think more about that.

To start with, I suggest we follow the principle of simplicity and
separate active and idle power management operations as much as
possible. We can always make it into a tangled mess later ;-)

I mentioned several times that I do not want to require OPP to be used
for controlling performance. I'll try to explain that now:

My main issue with OPP is that it is really *just a library* for dealing
with voltage/frequency tuples. Over time it has added in some generic
logic for setting clock rate and regulator voltage, but this is not a
solution for everyone. It's fine to cover the simple cases, but complex
SoCs will not find this approach useful.

In fact, having dev_pm_opp_set_rate make a call to
dev_pm_genpd_set_performance_state is completely backwards. Some genpd
implementations may make use of the OPP library as a data store for
supported frequency and voltage tuples, but not all of them will. Having
the tuple library tell the genpd layer what to do is an upside down
design.

The changes to add performance awareness to genpd allow us to write
"performance provider" drivers, which is good. But forcing the consumer
api to be the OPP library is severely limiting.

What is missing is a generic, less opinionated consumer API that gives
performance provider drivers (aka genpd callbacks) the flexibility to
support complex hardware and use cases. For example, how does the OPP
library work for a firmware interface that controls performance? Or a
message passing mechanism to a power management microcontroller? In
either case, it might not make sense to model everything in terms of
hertz or volts.

No doubt genpd, the performance provider, can handle the firmware and pm
microcontroller cases mentioned above. But the OPP library, the half of
the interface used by the performance consumer, does not fit well.

Put another way, there exists a set of performance scalable devices.
Within that set is a subset of devices for which it makes sense to model
their performance via clock rates. And within _that_ subset there exists
a subset of devices for whom dev_pm_opp_set_rate will be sufficient for
performance management. That leaves a lot of devices unaccounted for.

Finally, when it comes time to consider chip-wide current limits,
adaptive voltage scaling, thermal considerations, body bias, etc, I
think that the tables provided by the OPP library just won't cut it.
We'll need SoC vendors to write real performance provider drivers and
glue all of this together in that backend. The only way to make that
work is to provide a less opinionated consumer-facing API that does not
assume that everything fits neatly into a table comprised of frequency &
voltage pairs.

Regards,
Mike

>
>
> [...]
>
> Kind regards
> Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-08-03 21:11         ` Michael Turquette
@ 2018-08-23 11:10           ` Ulf Hansson
  0 siblings, 0 replies; 34+ messages in thread
From: Ulf Hansson @ 2018-08-23 11:10 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Stephen Boyd, Viresh Kumar, grahamr, linux-clk, Linux PM,
	Doug Anderson, Taniya Das, Rajendra Nayak, Amit Nischal,
	Vincent Guittot, Amit Kucheria

On 3 August 2018 at 23:11, Michael Turquette <mturquette@baylibre.com> wrote:
> Quoting Ulf Hansson (2018-07-31 03:35:50)
>> [...]
>>
>> >>
>> >> Another option is to use the ->runtime_suspend|resume() callbacks at
>> >> the genpd level. In principle when the device enters runtime suspend,
>> >> genpd can decide to drop the votes for the device (if some exist). At
>> >> runtime resume, genpd then should re-store the votes for the device it
>> >> had before it became runtime suspended (or if the vote became updated
>> >> in-between).
>> >
>> > I'd prefer to see the genpd layer do it instead of OPP layer controlling
>> > the on/off state. Calling dev_pm_opp_set_rate(dev, 0) just looks very
>> > weird and conflates clk frequency with power domain state which isn't
>> > exactly what we want. We want it to be related to the frequency _and_
>> > the enabled state. Plus it's like saying clk_set_rate(clk, 0) should
>> > turn the clk off, which we've never done before.
>>
>> Perhaps you misunderstood my suggestion!?
>>
>> Instead of having the driver to call dev_pm_opp_set_rate(dev, 0) from
>> their ->runtime_suspend() callbacks, my suggestion is to let genpd,
>> from its ->runtime_suspend() callback, temporarily drop the requested
>> performance level for the device. Hence genpd also needs to re-compute
>> and set a new aggregated performance value for the PM domain.
>>
>> The purpose was to make it transparent for drivers, as all they need
>> do is whatever they did before.
>>
>> >
>> > It would also be good to handle the on/off state in genpd because the
>> > genpd could have a governor attached to it that blocks on/off requests
>> > from drivers if the governor decides that the device is clk gating on
>> > and off quickly but it's not beneficial to change the voltage during the
>> > quick gating scenario because we spend more time changing voltage than
>> > clk gating. Connecting this all to genpd allows us to use the device
>> > power state as a signal to adjust the voltage, but leaves us flexibility
>> > to override it if it makes sense policy wise.
>>
>> Well, genpd already manages the on/off state!?
>>
>> Of course, I fully agree to the above, which seems also to be a good
>> reason to why my suggestion makes sense. Or am I missing something
>> here?
>>
>> >
>> > And please remember the other point here that many devices may all be
>> > sharing the same power domain and therefore the voltage that the domain
>> > runs at depends on many different clk frequencies for different devices
>> > and if those clks are enabled or not, aggregated into a single voltage
>> > requirement for the power domain.
>>
>> Right, the aggregation in genpd is done per device - not per clock. I
>> don't understand your concern, but again I may be missing something!?
>
> I believe that this is an argument against using OPP as the performance
> selection mechanism for all SoCs. Its data structures, along with the
> fact that it seems to have no awareness of device idleness, mean that
> it's not really sufficient for more complex SoCs.

Sorry, but this does not make sense to me. :-)

For sure genpd has knowledge about "device idleness", at least for
devices that have been attached to it. This is controlled via genpd's
->runtime_suspend() callback. See genpd_runtime_suspend() and
__genpd_runtime_suspend(), which are invoked per device. What am
I missing here?

>
>>
>> > The runtime suspend state might be
>> > part of that aggregation equation, so that if a device's clk is off and
>> > it's been suspended for some time the voltage requirement for that clk
>> > frequency could be dropped, but if the device is resumed and the clk is
>> > still off then the voltage requirement can be ignored. We only need to
>> > make sure the power domain is at a high enough voltage when the clk is
>> > actually enabled though, so tying the voltage requirement drop to
>> > runtime PM of the device is not exactly correct.
>>
>> I see.
>>
>> For my understanding; you are saying that there may be use-cases when
>> one want to keep the device runtime resumed, but without having its
>> clock ungated, right?
>>
>> My guess is that this is a rather unusual case, no?
>
> This is a common case if I understand Stephen's example: Imagine a power
> island that scales voltage and may power clamp to zero. Now imagine that
> within that power island are a dozen separate functional devices, each
> consuming a dedicated functional clock input signal.

Right.

>
> It's possible to have the power island turned on so that one device can
> function, but the other 11 devices will remain clock gated.

Right.

>
> I assume the right way to handle this is to nest a dozen device-level
> genpd's within the power island-level genpd?

No.

Devices that belong to the same "power island" shall be attached to
the same PM domain (genpd).

Each device within the same PM domain shall be separately managed
through runtime PM, via its corresponding runtime PM callbacks.

The job of genpd is to manage the PM domain, which means keeping it
powered on in case any of its attached devices are in use (runtime
resumed). Only when all devices in the same PM domain have become
unused (runtime suspended) can genpd try to power off the PM domain.
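That rule, combined with the suggestion earlier in the thread that genpd temporarily drop a runtime-suspended device's performance vote, can be modeled as follows. This is toy code expressing the invariant only, not genpd's implementation; all names are invented.

```c
#include <assert.h>

/* One device attached to a PM domain. */
struct pd_dev {
	int suspended;	/* 1 = runtime suspended */
	int perf_vote;	/* performance level requested while active */
};

/* Aggregate the domain state: it may power off only when every attached
 * device is runtime suspended; otherwise its performance level is the
 * maximum vote among resumed devices (a suspended device's vote is
 * temporarily dropped).  Returns 1 if the domain must stay powered. */
int pd_aggregate(const struct pd_dev *devs, int ndevs, int *level)
{
	int on = 0, lvl = 0;

	for (int i = 0; i < ndevs; i++) {
		if (!devs[i].suspended) {
			on = 1;
			if (devs[i].perf_vote > lvl)
				lvl = devs[i].perf_vote;
		}
	}
	*level = lvl;
	return on;
}
```

Note how a suspended device's high vote does not hold the domain at a high level, yet a single resumed device is enough to keep the domain powered on.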

> That mirrors the topology I
> described above, and allows the lowest level genpds (functional
> device-level) to gate and ungate clocks, and the parent genpd (power
> island-level) will only be concerned with voltage control.

Well, if there is a "power island" topology, then that of course
can/shall be modeled by using genpd, but unless I am missing
something, that was not what you described above.

Anyway, let's try to use the right genpd terminology, which is
master and subdomains, rather than parent/child.

>
>>
>> The point is, I think we can simply have the driver to call
>> dev_pm_opp_set_rate() to cover optimizations for these scenarios.
>>
>> >
>> > It may be that a device is composed of multiple power domains and those
>> > power domains each have their own clks with voltage requirements too.
>> > In this scenario, connecting the voltage requirement of the genpd to the
>> > runtime PM state of the device doesn't make sense at all. We'll get into
>> > a situation where half the device is "off" because a power domain and
>> > the clk for that power domain are off but we can't drop the voltage
>> > requirement because the overall device isn't runtime suspended. It's
>> > like the single genpd per device problem all over again. Let's not do
>> > that.
>>
>> I am *not* suggesting to remove the APIs in genpd for setting a
>> performance level. We can still do that and that should be sufficient
>> to cover any corner cases, don't you think?
>>
>> My suggestion was only to allow genpd to drop the votes (temporarily)
>> for a device that becomes runtime suspended.
>>
>> >
>> > So I'd rather see drivers calling some genpd API directly to indicate
>> > that the clk should turn off and the voltage domain constraint has gone
>> > away. We might want to transform OPP into a genpd of its own and then
>> > connect that OPP genpd to a clk domain genpd for some clk
>> > on/off/frequency part and a power domain for the voltage/corner scaling
>> > part. If we did that, then everything is a genpd that gets all
>> > aggregated in the genpd layer and drivers that only use OPP would be OK
>> > with their runtime PM state of their device affecting the OPP genpd's
>> > on/off state of clks and voltage requirements. Drivers that want finer
>> > control would be able to use some new genpd API to turn on and off the
>> > genpd when they want to.
>>
>> Sound like we need a white-board. :-)
>
> We really do. Maybe I'll make the trip to Vancouver in September :-)

Yes, it would be great to meet at Linaro connect! Please come! :-)

Also, I have already asked Rafael to bring up the DVFS as a topic
during our power management track at LPC in Vancouver in November.
Let's see what gets scheduled.

>
> I don't like the idea to "transform OPP into a genpd of its own", as
> suggested by Stephen above. But I think all he needs to solve this issue
> is nesting some functional clock-level genpds inside of a parent power
> island-level genpd. This is a repeat of the three paragraphs I've
> written above.
>
> This approach achieves the same effect as turning opp into a genpd, but
> I'm not inclined to try and change the OPP library to solve this
> problem.
>
> I think OPP is fine as-is: it makes it easy to handle the very simplest
> DVFS cases.  Stephen's requirements (and those of most mobile SoCs) go
> beyond what the OPP library can and should offer.
>

The above seems reasonable.

On the other hand, Graham recently clarified that for QCOM SoCs, the
clock controllers themselves have OPP constraints, not only the clock
consumers. To me this changes the situation quite fundamentally as
well.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-08-03 23:05                 ` Michael Turquette
@ 2018-08-23 12:13                   ` Ulf Hansson
  2018-09-18 22:48                     ` Michael Turquette
  0 siblings, 1 reply; 34+ messages in thread
From: Ulf Hansson @ 2018-08-23 12:13 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Peter De Schrijver, Stephen Boyd, Viresh Kumar, grahamr,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

On 4 August 2018 at 01:05, Michael Turquette <mturquette@baylibre.com> wrote:
> Quoting Ulf Hansson (2018-07-31 04:56:46)
>> On 25 July 2018 at 13:27, Peter De Schrijver <pdeschrijver@nvidia.com> wrote:
>> > On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
>> >> Quoting Stephen Boyd (2018-07-24 16:04:37)
>> >> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
>> >> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
>> >> > > >
>> >> > > > For one thing, a driver should be able to figure out what the
>> >> > > > performance state requirement is for a particular frequency. I'd like to
>> >> > > > see an API that a driver can pass something like a (device, genpd, clk,
>> >> > > > frequency) tuple and get back the performance state required for that
>> >> > > > device's clk frequency within that genpd by querying OPP tables. If we
>> >> > > > had this API, then SoC vendors could design OPP tables for their on-SoC
>> >> > > > devices that describe the set of max frequencies a device can operate at
>> >> > > > for a specific performance state and driver authors would be able to
>> >> > > > query that information and manually set genpd performance states when
>> >> > > > they change clk frequencies. In Qualcomm designs this would be their
>> >> > > > "fmax" tables that map a max frequency to a voltage corner. If someone
>> >> > > > wanted to fine tune that table and make it into a full frequency plan
>> >> > > > OPP table for use by devfreq, then they could add more entries for all
>> >> > > > the validated frequencies and voltage corners that are acceptable and
>> >> > > > tested and this API would still work. We'll need this sort of table
>> >> > > > regardless because we can't expect devices to search for an exact
>> >> > > > frequency in an OPP table when they can support hundreds of different
>> >> > > > frequencies, like in display or audio situations.
>> >> > > >
>> >> > >
>> >> > > Various reasons why I think the driver is not the right place to handle
>> >> > > the V/f relationship:
>> >> > >
>> >> > > 1) The V/f relationship is temperature dependent. So the voltage may have
>> >> > >    to be adjusted when the temperature changes. I don't think we should
>> >> > >    make every driver handle this on its own.
>> >> >
>> >> > This is AVS? Should be fine to plumb that into some sort of voltage
>> >> > domain that gets temperature feedback and then adjusts the voltage based
>> >
>> > For the core rail, it seems the voltage is indeed just adjusted based on
>> > temperature. For the GPU rail, we have equations which calculate the required
>> > voltage as a function of frequency and temperature. In some cases I think
>> > we just cap the frequency if the temperature would be too high to find
>> > a suitable voltage. Fortunately the GPU has its own rail, so it doesn't
>> > necessarily need to be handled the same way.
>> >
>> >> > on that? This is basically the same as Qualcomm's "voltage corners" by
>> >> > the way, just that the voltage is adjusted outside of the Linux kernel
>> >> > by another processor when the temperature changes.
>> >>
>> >> Ack to what Stephen said above. Adaptive voltage scaling, corners, body
>> >> bias, SMPS modes/efficiency, etc are all just implementation details.
>> >>
>> >> I don't think anyone is suggesting for drivers to take all of the above
>> >> into account when setting voltage. I would imagine either a "nominal"
>> >> voltage, a voltage "index" or a performance state to be passed from the
>> >> driver into the genpd layer.
>> >>
>> >> Peter would that work for you?
>> >>
>> >
>> > A voltage index should do I think. The reason for a voltage index is that
>> > we have at least one case, where the voltage depends on the mode of the
>> > device. This is the case for the HDMI/DP output serializers (SORx). The
>> > required voltage doesn't only depend on the pixel rate, but also on the
>> > mode (DP or HDMI). One danger is that we must make sure all
>> > drivers of devices sharing a rail, use this API to set their voltage
>> > requirement. If not, weird failures will show up.
>>
>> I am trying to understand the proposal, but it seems like I am failing. Sorry.
>>
>> Genpd already has an API for requesting a performance state for a
>> device, dev_pm_genpd_set_performance_state() - and genpd aggregates
>> the votes per PM domain.
>>
>> The idea so far is that most votes should come via the OPP layer,
>> dev_pm_opp_set_rate(), but for special cases users may decide to call
>> dev_pm_genpd_set_performance_state() directly.
>>
>> So far so good?
>
> This almost matches what I had in mind.

Ok, a step in the right direction. :-)
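The aggregation behaviour discussed above (genpd collecting per-device performance-state votes and running the domain at the highest one) can be sketched as a toy model. Note this is purely illustrative: `mock_genpd` and its helpers are invented names, not the real kernel implementation behind dev_pm_genpd_set_performance_state().

```c
#include <assert.h>

#define MAX_DEVS 8

/* Toy stand-in for a PM domain tracking one vote per attached device */
struct mock_genpd {
	int dev_state[MAX_DEVS]; /* 0 means "no vote" */
	int ndevs;
};

/* A device casts (or updates) its vote; the domain is then run at the
 * maximum of all outstanding votes, mirroring how genpd re-aggregates
 * on every call. Returns the state programmed for the whole domain. */
static int mock_set_performance_state(struct mock_genpd *pd, int dev, int state)
{
	int i, agg = 0;

	pd->dev_state[dev] = state;
	for (i = 0; i < pd->ndevs; i++)
		if (pd->dev_state[i] > agg)
			agg = pd->dev_state[i];
	return agg;
}
```

The key property the thread relies on: when one device drops its vote (including on runtime suspend), the domain falls back to whatever the remaining devices still require, not to zero.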

>
>>
>> Following the conversation, it seems like the suggestion is to *not*
>> go through the OPP layer, but instead always let drivers call
>> dev_pm_genpd_set_performance_state(). Why is that a benefit and why is
>> that preferred?
>
> I think that consumer drivers should not directly invoke the genpd apis
> for performance power management, but instead access them through a
> pm_runtime interface, just as they do when managing idle power
> management.
>
> This pm_runtime performance interface does not exist yet. There is an
> argument that the opp library is this interface, but I'll address that
> at the very end of this message.
>
>>
>> If I understood correctly, the main concern raised by Graham/Stephen
>> was that we didn't want to make the driver call yet another API to
>> manage OPPs. So then calling dev_pm_genpd_set_performance_state() is
>> fine, but not dev_pm_opp_set_rate()?
>
> Speaking only for myself, I want a single performance interface that
> encapsulates clocks, regulators, pm microcontrollers, firmware
> interfaces or whatever are required on the SoC to achieve the
> performance level desired.
>
> There will likely be a need for some consumer drivers to mix calls to
> low-level primitives (clks, regulators) with this new pm_runtime
> performance api, but that is the exception to the rule.
>
>>
>> Moreover, how would the driver know which performance state it shall
>> request? Sounds like an OPP translation is needed anyway, so even more
>> things fall into the responsibility of the driver, in case it shall
>> call dev_pm_genpd_set_performance_state(). It doesn't sound right to
>> me.
>
> Translation implies that consumer drivers are making requests in Hertz
> to begin with. I think we should challenge this assumption.  Devicetree
> gives us the tools we need to link SoC-specific performance states to
> Driver-specific use cases and configurations. Clock rates will be the
> right way to model this for many devices, but arbitrary perf indexes
> will also make sense for others. Translation should happen in DT.

Right.

As a matter of fact we already have some DT bindings to translate
clock/voltage to perf states for QCOM. Or maybe those didn't get fully
merged!?
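For what it's worth, the upstream DT mechanism for this translation is the required-opps binding: each entry in a device's OPP table points at an entry (an opp-level) in the power domain's own OPP table. A minimal sketch follows; the node names and level values are made up for illustration, not taken from any shipping DT:

```dts
/* Illustrative only: tie each device frequency to a required
 * performance state of a power domain via required-opps. */
pd_opp_table: opp-table-pd {
	compatible = "operating-points-v2";

	pd_opp_svs: opp-64 {
		opp-level = <64>;
	};
	pd_opp_nom: opp-128 {
		opp-level = <128>;
	};
};

dev_opp_table: opp-table-dev {
	compatible = "operating-points-v2";

	opp-100000000 {
		opp-hz = /bits/ 64 <100000000>;
		required-opps = <&pd_opp_svs>;
	};
	opp-200000000 {
		opp-hz = /bits/ 64 <200000000>;
		required-opps = <&pd_opp_nom>;
	};
};
```

With tables shaped like this, a consumer requesting 200 MHz never has to know about corners; the OPP/genpd machinery can derive the required domain level from the table.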

>
>> Finally, making use of the ->stop|start() callbacks in genpd can be
>> useful for SoC-specific operations that we want to make transparent
>> for generic drivers. However, I wonder if dealing with OPPs from
>> there really makes sense; I need to think more about that.
>
> To start with, I suggest we follow the principle of simplicity and
> separate active and idle power management operations as much as
> possible. We can always make it into a tangled mess later ;-)

If I understand correctly, you are questioning part of the support for
performance states through genpd?

>
> I mentioned several times that I do not want to require OPP to be used
> for controlling performance. I'll try to explain that now:
>
> My main issue with OPP is that it is really *just a library* for dealing
> with voltage/frequency tuples. Over time it has added in some generic
> logic for setting clock rate and regulator voltage, but this is not a
> solution for everyone. It's fine to cover the simple cases, but complex
> SoCs will not find this approach useful.
>
> In fact, having dev_pm_opp_set_rate make a call to
> dev_pm_genpd_set_performance_state is completely backwards. Some genpd
> implementations may make use of the OPP library as a data store for
> supported frequency and voltage tuples, but not all of them will. Having
> the tuple library tell the genpd layer what to do is an upside down
> design.
>
> The changes to add performance awareness to genpd allow us to write
> "performance provider" drivers, which is good. But forcing the consumer
> api to be the OPP library is severely limiting.

The whole point of extending the genpd infrastructure to deal with
performance states was to take advantage of the already modeled
power topology (devices being attached to and grouped into
master/subdomains).

I understand that you more or less seem happy with this approach, but
not the consumer part of it. Okay, fair enough, I somewhat agree as
well.

>
> What is missing is a generic, less opinionated consumer api that gives
> performance provider drivers (aka genpd callbacks) the flexibility to
> support complex hardware and use cases. For example, how does the OPP
> library work for a firmware interface that controls performance? Or a
> message passing mechanism to a power management microcontroller? In
> either case, it might not make sense to model everything in terms of
> hertz or volts.
>
> No doubt genpd, the performance provider, can handle the firmware and pm
> microcontroller cases mentioned above. But the OPP library, the half of
> the interface used by the performance consumer, does not fit well.
>
> Put another way, there exists a set of performance scalable devices.
> Within that set is a subset of devices for which it makes sense to model
> their performance via clock rates. And within _that_ subset there exists
> a subset of devices for whom dev_pm_opp_set_rate will be sufficient for
> performance management. That leaves a lot of devices unaccounted for.

You are right, not all devices can use a clock rate to request a
performance state. On the other hand, we still seem to talk about
OPPs as "performance operating points". Right!?

Then, instead of trying to invent something completely new, why don't
we just consider extending the OPP library with another API? Something
along the lines of dev_pm_opp_set_performance_state().

What would happen behind that API, would then be OPP provider
specific. Some provider may call dev_pm_genpd_set_performance_state(),
but some don't.
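The provider-specific behaviour suggested above amounts to a backend dispatch behind one consumer-facing call. A minimal mock of the idea, where every name (`mock_opp_provider`, `mock_opp_set_performance_state`, the genpd backend) is hypothetical and only the shape of the dispatch is the point:

```c
#include <assert.h>

/* Hypothetical consumer-facing entry point whose backend is chosen by
 * the OPP provider: one provider forwards to genpd, another might talk
 * to firmware. None of these names exist in the kernel as-is. */
struct mock_opp_provider {
	int (*set_perf_state)(int state);
};

static int genpd_state; /* what the genpd-flavoured backend last programmed */

static int genpd_backend(int state)
{
	/* a real provider would call dev_pm_genpd_set_performance_state() */
	genpd_state = state;
	return 0;
}

static int mock_opp_set_performance_state(struct mock_opp_provider *p, int state)
{
	return p->set_perf_state(state);
}
```

The consumer always calls the same function; whether genpd, firmware, or a PM microcontroller ends up doing the work is the provider's business.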

>
> Finally, when it comes time to consider chip-wide current limits,
> adaptive voltage scaling, thermal considerations, body bias, etc, I
> think that the tables provided by the OPP library just won't cut it.
> We'll need SoC vendors to write real performance provider drivers and
> glue all of this together in that backend. The only way to make that
> work is to provide a less opinionated consumer-facing API that does not
> assume that everything fits neatly into a table comprised of frequency &
> voltage pairs.

Sure, this makes sense.

Although, the part that is still a bit fuzzy is how an "OPP
provider" would interact with a "genpd provider". I'm guessing those
sometimes simply need to be managed by the same "driver" (or whatever
piece of code controls this).

Kind regards
Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-31 20:02                 ` grahamr
@ 2018-08-23 13:20                   ` Ulf Hansson
  2018-09-18 23:00                     ` Michael Turquette
  2018-09-18 17:25                   ` Kevin Hilman
  1 sibling, 1 reply; 34+ messages in thread
From: Ulf Hansson @ 2018-08-23 13:20 UTC (permalink / raw)
  To: grahamr
  Cc: Peter De Schrijver, Michael Turquette, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria,
	linux-clk-owner

On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
> I have two significant concerns with using a wrapper framework to support
> DVFS.  The first may be Qualcomm-specific, and more of a practical concern
> rather than a principled one.  We (clock driver owners) get
> voltage/frequency operating points from the clock controller designers in a
> single data-set, it is not information that is provided to or coming from
> the clock consumer hardware or software teams.  Tracking, updating, and
> debugging frequency/voltage mismatches falls completely into the clock team,
> so moving the responsibility of handling the relationship in code to the
> consumer would be a disaster (SDM845 has almost 200 clock domains and over
> 800 clock branches - not all controlled by Linux but a large proportion
> are).  It would mean all clock consumers using any clk_set_rate or
> clk_enable APIs would need to make the appropriate voltage requests as well
> - and even if they could look up the required voltage based on frequency
> from some central location (which I expect would be delivered and maintained
> by the clock driver team) the chance that everyone would do that correctly
> is, frankly, zero.  The types of problems that arise from under-volting a
> clock generator or consumer are very hard to debug (since it can essentially
> result in random behavior).

Okay, I see.

Wouldn't a nice script that does a search/replace work out? :-)
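Graham's "fmax" data above, the per-domain table mapping a maximum frequency to a voltage corner, can be illustrated with a toy lookup kept entirely inside the clock driver, so consumers never see voltages. The table contents and the integer corner encoding here are invented for the sketch:

```c
#include <assert.h>
#include <stddef.h>

/* One row of a hypothetical fmax table: any rate up to fmax_hz is
 * safe at the given corner (an opaque performance level). */
struct fmax_entry {
	unsigned long fmax_hz;
	int corner;
};

/* Made-up table for one clock domain, sorted by ascending fmax */
static const struct fmax_entry fmax[] = {
	{ 100000000, 1 },
	{ 200000000, 2 },
	{ 400000000, 3 },
};

/* Pick the lowest corner whose fmax covers the requested rate,
 * or -1 if no corner supports it. */
static int corner_for_rate(unsigned long rate)
{
	size_t i;

	for (i = 0; i < sizeof(fmax) / sizeof(fmax[0]); i++)
		if (rate <= fmax[i].fmax_hz)
			return fmax[i].corner;
	return -1;
}
```

The point of keeping the lookup here is exactly Graham's: the clock team owns and maintains this data, and no consumer of clk_set_rate() has to know it exists.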

>
> My second concern is that we do have voltage requirements coming from the
> clock controller itself, not related to the clock consumer.  The clock
> hardware components themselves (PLLs, muxes, etc) require voltage levels
> that may differ from that of the final consumer of the output clock.  We
> could hack all these requirements into the frequency/voltage tuple table
> that consumers look up, but at that point we have diverged fairly
> dramatically from Stephen's principle that the clock framework should not
> directly require any voltage - fundamentally they do (at least ours do).

This changes my view, as I didn't know about these kinds of cases.

First, it seems like you need to associate a struct device with the
clock controller, such that it can be attached to its corresponding PM
domain (genpd). Of course, then you also need to deploy runtime PM
support in the clock driver for this clock controller device. Do note
that runtime PM is already supported by the clock core, so that should
be trivial. Why? Because this is needed to properly allow genpd to
aggregate the votes for the PM domain(s), in case there are other
devices in the same PM domain (or if there are dependencies to
subdomains).

Also, if I am not mistaken, the runtime PM enablement is something
that is already being used (or at least tried out) for some Exynos
clock drivers. I recall there were some discussions around locking
issues with the runtime PM support in the clock core. Let me see if I
can search the mail archive to find out if/what went wrong; I will
come back to this.

Anyway, in regards to controlling the performance state for these
clock controller devices, to me it seems like there is no other way
but to explicitly allow clock drivers to call an "OPP API" to request
a performance state. Simply because it's the clock driver that needs
the performance state for its device. Whether the "OPP API" is the
new dev_pm_genpd_set_performance_state() or something not even
invented yet is another question.

My conclusion so far is that we seem to fall back to a potential
locking problem. In regards to that, I am wondering whether it is
actually more of a hypothetical problem than a real problem for your
case.

> Our most recent patch that Taniya posted has gone in the direction similar
> to Tegra - instead of having the framework handle it, we use prepare and
> set_rate hooks to implement voltage (corner) voting in the qcom drivers via
> the new genpd.  This is not fully proven yet but is so far working well and
> will likely be our internal solution going forward if the framework
> consensus is to force consumers to manage their frequency-based voltage
> requirements themselves - I just do not see that as a practical solution for
> a complicated SoC with a large, highly distributed, clock tree.  That being
> said I do see potential future deadlock race conditions between clk and
> genpd which concern me - it remains a downside of that coupling.
>
> Would there be some way to prevent consumers from directly calling
> clk_set_rate or clk_enable and force them to go via another framework for
> these calls?  It would at least prevent people from using the "wrong"
> interface and bypassing voltage requirements.  That of course means having
> to mirror any of the clk APIs that update clock state into genpd/opp, which
> Stephen finds distasteful (and I agree).

I am not sure about this. Sounds like an awful wrapper API.
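The prepare/set_rate-hook approach Graham describes above (the clk driver itself voting for a corner via genpd) boils down to an ordering rule: raise the voltage before raising the frequency, and lower it only afterwards. A toy model of that ordering, with all names and values invented:

```c
#include <assert.h>

static int cur_corner;		/* corner currently voted for */
static unsigned long cur_rate;	/* rate currently programmed */

static void genpd_vote(int corner)
{
	/* would be dev_pm_genpd_set_performance_state() in a real driver */
	cur_corner = corner;
}

/* Made-up fmax rule: rates above 100 MHz need the higher corner */
static int corner_for(unsigned long rate)
{
	return rate > 100000000UL ? 2 : 1;
}

static int mock_clk_set_rate(unsigned long rate)
{
	int new_corner = corner_for(rate);

	if (new_corner > cur_corner)
		genpd_vote(new_corner);	/* going faster: voltage up first */
	cur_rate = rate;		/* then program the divider/PLL */
	if (new_corner < cur_corner)
		genpd_vote(new_corner);	/* going slower: voltage down last */
	return 0;
}
```

At no point is the clock running at a rate its current corner cannot support, which is exactly what the hook ordering buys; the downside, as noted, is the clk/genpd lock coupling it introduces.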

However, what Mike seems to express the need for is a common consumer
OPP API to set a performance state for a device, but also a way to
allow SoC-specific performance state providers to manage the backend
parts of actually changing the performance state.

I am wondering whether this could be a way forward; maybe it's not
exactly what you were looking for, but perhaps it can address your
concerns?

[...]

Kind regards
Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-07-31 20:02                 ` grahamr
  2018-08-23 13:20                   ` Ulf Hansson
@ 2018-09-18 17:25                   ` Kevin Hilman
  1 sibling, 0 replies; 34+ messages in thread
From: Kevin Hilman @ 2018-09-18 17:25 UTC (permalink / raw)
  To: grahamr
  Cc: Ulf Hansson, Peter De Schrijver, Michael Turquette, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM, Doug Anderson, Taniya Das,
	Rajendra Nayak, Amit Nischal, Vincent Guittot, Amit Kucheria,
	linux-clk-owner

[ sorry, a little late to the party ]

[...]

> Would there be some way to prevent consumers from directly calling
> clk_set_rate or clk_enable and force them to go via another framework
> for these calls?  It would at least prevent people from using the
> "wrong" interface and bypassing voltage requirements.  That of course
> means having to mirror any of the clk APIs that update clock state
> into genpd/opp, which Stephen finds distasteful (and I agree).

Chiming in late to this thread with a bit of historical perspective
(and over-simplification): doing exactly this was part of the goal of
genpd in the first place.

Before genpd, it used to be the case that lots of drivers only used
the clk API for idle power management (clock gating): essentially a
bunch of clk_enable/disable calls based on activity. This didn't scale
well for lots of reasons, but a primary one was that the same IP could
be integrated differently on different SoCs, and thus have different
clocks for gating (or no clock gating).

With genpd, we could remove all of the clock-specific stuff from
consumers and replace it with more generic pm_runtime calls, putting
the intelligence, such as SoC integration knowledge and governors
around whether or not to *actually* gate clocks, into the genpd
callbacks/governors, which, being pluggable, could be highly SoC
specific.

IMO, at a high level, we're having exactly the same conversation in
this thread again, with the difference being that we're talking about
active/performance states and not idle states.

While active states are definitely more complex and have more
dependencies, I don't see why the general approach of the solution
should not be the same as it was for idle management.

Basically, I'm agreeing with the general direction this thread seems
to be going: using the new active/performance features of genpd, and
having genpd callbacks/governors that Know All The Things, seems like
the obvious direction to go.

However, the hard part remains, and that is defining the right consumer
layer to give all the hints/requests so that the super-smart genpd
governors can do the right thing.

Kevin


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-08-23 12:13                   ` Ulf Hansson
@ 2018-09-18 22:48                     ` Michael Turquette
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Turquette @ 2018-09-18 22:48 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Peter De Schrijver, Stephen Boyd, Viresh Kumar, grahamr,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria

Quoting Ulf Hansson (2018-08-23 05:13:40)
> On 4 August 2018 at 01:05, Michael Turquette <mturquette@baylibre.com> wr=
ote:
> > Quoting Ulf Hansson (2018-07-31 04:56:46)
> >> On 25 July 2018 at 13:27, Peter De Schrijver <pdeschrijver@nvidia.com>=
 wrote:
> >> > On Tue, Jul 24, 2018 at 10:44:00PM -0700, Michael Turquette wrote:
> >> >> Quoting Stephen Boyd (2018-07-24 16:04:37)
> >> >> > Quoting Peter De Schrijver (2018-07-23 01:26:41)
> >> >> > > On Fri, Jul 20, 2018 at 10:12:29AM -0700, Stephen Boyd wrote:
> >> >> > > >
> >> >> > > > For one thing, a driver should be able to figure out what the
> >> >> > > > performance state requirement is for a particular frequency. =
I'd like to
> >> >> > > > see an API that a driver can pass something like a (device, g=
enpd, clk,
> >> >> > > > frequency) tuple and get back the performance state required =
for that
> >> >> > > > device's clk frequency within that genpd by querying OPP tabl=
es. If we
> >> >> > > > had this API, then SoC vendors could design OPP tables for th=
eir on-SoC
> >> >> > > > devices that describe the set of max frequencies a device can=
 operate at
> >> >> > > > for a specific performance state and driver authors would be =
able to
> >> >> > > > query that information and manually set genpd performance sta=
tes when
> >> >> > > > they change clk frequencies. In Qualcomm designs this would b=
e their
> >> >> > > > "fmax" tables that map a max frequency to a voltage corner. I=
f someone
> >> >> > > > wanted to fine tune that table and make it into a full freque=
ncy plan
> >> >> > > > OPP table for use by devfreq, then they could add more entrie=
s for all
> >> >> > > > the validated frequencies and voltage corners that are accept=
able and
> >> >> > > > tested and this API would still work. We'll need this sort of=
 table
> >> >> > > > regardless because we can't expect devices to search for an e=
xact
> >> >> > > > frequency in an OPP table when they can support hundreds of d=
ifferent
> >> >> > > > frequencies, like in display or audio situations.
> >> >> > > >
> >> >> > >
> >> >> > > Various reasons why I think the driver is not the right place t=
o handle
> >> >> > > the V/f relationship:
> >> >> > >
> >> >> > > 1) The V/f relationship is temperature dependent. So the voltag=
e may have
> >> >> > >    to be adjusted when the temperature changes. I don't think w=
e should
> >> >> > >    make every driver handle this on its own.
> >> >> >
> >> >> > This is AVS? Should be fine to plumb that into some sort of volta=
ge
> >> >> > domain that gets temperature feedback and then adjusts the voltag=
e based
> >> >
> >> > For the core rail, it seems the voltage is indeed just adjusted base=
d on
> >> > temperature. For the GPU rail, we have equations which calculate the=
 required
> >> > voltage as a function of frequency and temperature. In some cases I =
think
> >> > we just cap the frequency if the temperature would be too high to fi=
nd
> >> > a suitable voltage. Fortunately the GPU has its own rail, so it does=
n't
> >> > necessarily need to be handled the same way.
> >> >
> >> >> > on that? This is basically the same as Qualcomm's "voltage corner=
s" by
> >> >> > the way, just that the voltage is adjusted outside of the Linux k=
ernel
> >> >> > by another processor when the temperature changes.
> >> >>
> >> >> Ack to what Stephen said above. Adaptive voltage scaling, corners, =
body
> >> >> bias, SMPS modes/efficiency, etc are all just implementation detail=
s.
> >> >>
> >> >> I don't think anyone is suggesting for drivers to take all of the a=
bove
> >> >> into account when setting voltage. I would imagine either a "nomina=
l"
> >> >> voltage, a voltage "index" or a performance state to be passed from=
 the
> >> >> driver into the genpd layer.
> >> >>
> >> >> Peter would that work for you?
> >> >>
> >> >
> >> > A voltage index should do I think. The reason for a voltage index is=
 that
> >> > we have at least one case, where the voltage depends on the mode of =
the
> >> > device. This is the case for the HDMI/DP output serializers (SORx). =
The
> >> > required voltage doesn't only depend on the pixel rate, but also on =
the
> >> > mode (DP or HDMI). One danger is that we must make sure all
> >> > drivers of devices sharing a rail, use this API to set their voltage
> >> > requirement. If not, weird failures will show up.
> >>
> >> I am trying to understand the proposal, but it seems like I am failing=
. Sorry.
> >>
> >> Genpd already have an API for requesting a performance state for a
> >> device, dev_pm_genpd_set_performance_state() - and genpd aggregates
> >> the votes per PM domain.
> >>
> >> The idea so far, is that most votes should come via the OPP layer,
> >> dev_pm_opp_set_rate(), but for special cases users may decide to call
> >> dev_pm_genpd_set_performance_state() directly.
> >>
> >> So far so good?
> >
> > This almost matches what I had in mind.
> =

> Ok, a step in the right direction. :-)

Wow, I completely missed this response! Sorry for only seeing it just
now.

> =

> >
> >>
> >> Following the conversation, it seems like the suggestion is to *not*
> >> go through the OPP layer, but instead always let drivers call
> >> dev_pm_genpd_set_performance_state(). Why is that a benefit and why is
> >> that preferred?
> >
> > I think that consumer drivers should not directly invoke the genpd apis
> > for performance power management, but instead access them through a
> > pm_runtime interface, just as they do when managing idle power
> > management.
> >
> > This pm_runtime performance interface does not exist yet. There is an
> > argument that the opp library is this interface, but I'll address that
> > at the very end of this message.
> >
> >>
> >> If I understood the main concern raised by Graham/Stephen, was that we
> >> didn't want to make the driver call yet another API to manage OPPs. So
> >> then calling dev_pm_genpd_set_performance_state() is fine, but not
> >> dev_pm_opp_set_rate()?
> >
> > Speaking only for myself, I want a single performance interface that
> > encapsulates clocks, regulators, pm microcontrollers, firmware
> > interfaces or whatever are required on the SoC to achieve the
> > performance level desired.
> >
> > There will likely be a need for some consumer drivers to mix calls to
> > low-level primitives (clks, regulators) with this new pm_runtime
> > performance api, but that is the exception to the rule.
> >
> >>
> >> Moreover, how would the driver know which performance state it shall
> >> request? Sounds like a OPP translation is needed anyways, so even more
> >> things falls into the responsibility of the driver, in case it shall
> >> call dev_pm_genpd_set_performance_state(). It doesn't sound right to
> >> me.
> >
> > Translation implies that consumer drivers are making requests in Hertz
> > to begin with. I think we should challenge this assumption.  Devicetree
> > gives us the tools we need to link SoC-specific performance states to
> > Driver-specific use cases and configurations. Clock rates will be the
> > right way to model this for many devices, but arbitrary perf indexes
> > will also make sense for others. Translation should happen in DT.
> =

> Right.
> =

> As a matter of fact we already have some DT bindings to translate
> clock/voltage to perf states for QCOM. Or maybe those didn't get fully
> merged!?
> =

> >
> >> Finally, trying to make use of the ->stop|start() callbacks in genpd,
> >> can be useful for SoC specific operations that we want to make
> >> transparent for generic drivers. However, I wonder if dealing with
> >> OPPs from there really makes sense, need to think more about that.
> >
> > To start with, I suggest we follow the principle of simplicity and
> > separate active and idle power management operations as much as
> > possible. We can always make it into a tangled mess later ;-)
> =

> If I understand, you are questioning part of the support for
> performance states through genpd?
> =

> >
> > I mentioned several times that I do not want to require OPP to be used
> > for controlling performance. I'll try to explain that now:
> >
> > My main issue with OPP is that it is really *just a library* for dealing
> > with voltage/frequency tuples. Over time it has added in some generic
> > logic for setting clock rate and regulator voltage, but this is not a
> > solution for everyone. It's fine to cover the simple cases, but complex
> > SoCs will not find this approach useful.
> >
> > In fact, having dev_pm_opp_set_rate make a call to
> > dev_pm_genpd_set_performance_state is completely backwards. Some genpd
> > implementations may make use of the OPP library as a data store for
> > supported frequency and voltage tuples, but not all of them will. Having
> > the tuple library tell the genpd layer what to do is an upside down
> > design.
> >
> > The changes to add performance awareness to genpd allows us to write
> > "performance provider" drivers, which is good. But forcing the consumer
> > api to be the OPP library is severely limiting.
> =

> The hole point of why we extended the infrastructure of genpd to deal
> with performance states, was to take advantage of the already modeled
> power topology (devices being attached to and grouped into
> master/subdomains).
> =

> I understand that you more or less seems happy with this approach, but
> not the consumer part of it. Okay, fair enough, I somewhat agree as
> well.

Correct.

> =

> >
> > What is missing is a generic, less opinionated consumer api that gives
> > performance provider drivers (aka genpd callbacks) the flexibility to
> > support complex hardware and use cases. For example, how does the OPP
> > library work for a firmware interface that controls performance? Or a
> > message passing mechanism to a power management microcontroller? In
> > either case, it might not make sense to model everything in terms of
> > hertz or volts.
> >
> > No doubt genpd, the performance provider, can handle the firmware and pm
> > microcontroller cases mentioned above. But the OPP library, the half of
> > the interface used by the performance consumer, does not fit well.
> >
> > Put another way, there exists a set of performance scalable devices.
> > Within that set is a subset of devices for which it makes sense to model
> > their performance via clock rates. And within _that_ subset there exists
> > a subset of devices for whom dev_pm_opp_set_rate will be sufficient for
> > performance management. That leaves a lot of devices unaccounted for.
> 
> You are right, not all devices can use a clock rate to request a
> performance state. On the other hand, we still seem to be talking
> about OPPs as "performance operating points". Right!?
> 
> Then, instead of trying to invent something completely new, why don't
> we just consider extending the OPP library with another API? Something
> along the lines of dev_pm_opp_set_performance_state().

Because it's sort of a layering violation and the consumer api is
incongruent with the existing runtime pm stuff used for idle.

A genpd performance provider might want to use the OPP library as a part
of its own internal state machine (ie setting clocks and regulators).
Isn't it weird to have that be the same layer that consumers call into?

Furthermore, there aren't a lot of upstream examples of
dev_pm_opp_set_rate. I only see it used in drivers/cpufreq/cpufreq-dt.c
as of 4.19-rc4. So there is no inertia preventing us from considering
using the pm runtime interface (which nicely handles the consumer side
of genpd idle operations) to handle the consumer side of the genpd
active/performance operations.

Thus now is the time to decide whether or not to introduce
dev_pm_opp_set_performance_state. There will be no upstream impact on
"existing consumers" to migrate them away from the OPP library since
there are basically no existing users except for cpufreq-dt.

My vote is to not introduce a new api into the OPP library and
instead explore whether performance/active consumer apis could co-exist
with their idle counterparts in the pm runtime layer.

> 
> What would happen behind that API would then be OPP provider
> specific. Some providers may call dev_pm_genpd_set_performance_state(),
> but some won't.
> 
> >
> > Finally, when it comes time to consider chip-wide current limits,
> > adaptive voltage scaling, thermal considerations, body bias, etc, I
> > think that the tables provided by the OPP library just won't cut it.
> > We'll need SoC vendors to write real performance provider drivers and
> > glue all of this together in that backend. The only way to make that
> > work is to provide a less opinionated consumer-facing API that does not
> > assume that everything fits neatly into a table comprised of frequency &
> > voltage pairs.
> 
> Sure, this makes sense.
> 
> Although, the part that is still a bit fuzzy is how an "OPP
> provider" would interact with a "genpd provider". I'm guessing those
> sometimes simply need to be managed by the same "driver" (or whatever
> piece of code controls this).

I think genpd is the same as "performance provider" or "idle provider".
If a genpd's machine-specific backend uses OPP then it can be an OPP
provider too, but if not then no big deal.

My thinking today is that for performance awareness and DVFS in Linux,
I'd like for genpd to be mandatory and OPP to be optional.

Best regards,
Mike

p.s: One final rant here about using the OPP clock/regulator logic in
dev_pm_opp_set_rate as some sort of measuring stick:

Setting CPU performance is just about the simplest use case for any
performance scaling on modern ARM platforms (that do not use SCPI or
whatever). CPUs typically have a dedicated clock subtree with no impact
to other IPs, dedicated power rails, and they are glitchless (meaning
that frequency changes happen without having to gate the logic or stall
accesses or synchronize busses or whatever). We should try to design
something that can scale to the complex use cases where a PLL drives a
ton of peripherals that have conflicting needs, etc. Scaling CPU
frequency is a very low bar to meet.

> 
> Kind regards
> Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-08-23 13:20                   ` Ulf Hansson
@ 2018-09-18 23:00                     ` Michael Turquette
  2018-09-19  7:05                       ` Geert Uytterhoeven
  2018-09-25 21:26                       ` grahamr
  0 siblings, 2 replies; 34+ messages in thread
From: Michael Turquette @ 2018-09-18 23:00 UTC (permalink / raw)
  To: Ulf Hansson, grahamr
  Cc: Peter De Schrijver, Stephen Boyd, Viresh Kumar, linux-clk,
	Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner

Quoting Ulf Hansson (2018-08-23 06:20:11)
> On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
> > I have two significant concerns with using a wrapper framework to support
> > DVFS.  The first may be Qualcomm-specific, and more of a practical concern
> > rather than a principled one.  We (clock driver owners) get
> > voltage/frequency operating points from the clock controller designers in a
> > single data-set, it is not information that is provided to or coming from
> > the clock consumer hardware or software teams.  Tracking, updating, and
> > debugging frequency/voltage mismatches falls completely into the clock team,
> > so moving the responsibility of handling the relationship in code to the
> > consumer would be a disaster (SDM845 has almost 200 clock domains and over
> > 800 clock branches - not all controlled by Linux but a large proportion
> > are).  It would mean all clock consumers using any clk_set_rate or
> > clk_enable APIs would need to make the appropriate voltage requests as well
> > - and even if they could look up the required voltage based on frequency
> > from some central location (which I expect would be delivered and maintained
> > by the clock driver team) the chance that everyone would do that correctly
> > is, frankly, zero.  The types of problems that arise from under-volting a
> > clock generator or consumer are very hard to debug (since it can essentially
> > result in random behavior).
> 
> Okay, I see.
> 
> Wouldn't a nice script that does a search/replace work out? :-)
> 
> >
> > My second concern is that we do have voltage requirements coming from the
> > clock controller itself, not related to the clock consumer.  The clock
> > hardware components themselves (PLLs, muxes, etc) require voltage levels
> > that may differ from that of the final consumer of the output clock.  We
> > could hack all these requirements into the frequency/voltage tuple table
> > that consumers look up, but at that point we have diverged fairly
> > dramatically from Stephen's principle that the clock framework should not
> > directly require any voltage - fundamentally they do (at least ours do).
> 
> This changes my view, as I didn't know about these kinds of cases.
> 
> First, it seems like you need to associate a struct device with the
> clock controller, such that it can be attached to its corresponding PM
> domain (genpd). Of course, then you also need to deploy runtime PM
> support in the clock driver for this clock controller device. Do note
> that runtime PM is already supported by the clock core, so this should
> be trivial. Why? Because this is needed to properly allow genpd to
> aggregate the votes for the PM domain(s), in case there are other
> devices in the same PM domain (or if there are dependencies on
> subdomains).

Your struct device can be considered as Done. Stephen and I have been
forcing clock driver authors to write proper platform drivers for a
while now.

> 
> Also, if I am not mistaken, the runtime PM enablement is something
> that is already being used (or at least tried out) for some Exynos
> clock drivers. I recall there were some discussions about locking
> issues around the runtime PM support in the clock core. Let me search
> the mail archive to find out if/what went wrong, and I will come back
> to this.

This was mostly related to idle power management issues, but yes there
is some basic runtime pm awareness in the clock framework.

> 
> Anyway, in regard to controlling the performance state for these clock
> controller devices, it seems to me like there is no other way but to
> explicitly allow clock drivers to call an "OPP API" to request a
> performance state. Simply because it's the clock driver that needs
> the performance state for its device. Whether the "OPP API" is the
> new dev_pm_genpd_set_performance_state() or something not even
> invented yet is another question.

I completely agree, with the exception that I don't think it will be an
"OPP API" but instead I hope it will be some runtime pm performance api.

> 
> My conclusion so far is that we seem to fall back to a potential
> locking problem. In regard to that, I am wondering whether it is
> actually more of a hypothetical problem than a real problem for your
> case.

For reference, this is why we allow reentrancy into the clock framework.
It is common that consumer A calls clk_set_rate to set clock X to a
rate, but in order for clock X to achieve that rate the clock provider
might need to call clk_set_rate on another clock. We support reentrancy
for this type of case.

The problem described by Graham seems analogous. There are times when a
performance provider itself will need to adjust its own performance (as
consumed by some other parent provider). I'm under the impression that
runtime pm allows reentrancy and genpd allows for nested genpds, so
hopefully this should Just Work.

> 
> > Our most recent patch that Taniya posted has gone in the direction similar
> > to Tegra - instead of having the framework handle it, we use prepare and
> > set_rate hooks to implement voltage (corner) voting in the qcom drivers via
> > the new genpd.  This is not fully proven yet but is so far working well and
> > will likely be our internal solution going forward if the framework
> > consensus is to force consumers to manage their frequency-based voltage
> > requirements themselves - I just do not see that as a practical solution for
> > a complicated SoC with a large, highly distributed, clock tree.  That being
> > said I do see potential future deadlock race conditions between clk and
> > genpd which concern me - it remains a downside of that coupling.
> >
> > Would there be some way to prevent consumers from directly calling
> > clk_set_rate or clk_enable and force them to go via another framework for
> > these calls?  It would at least prevent people from using the "wrong"
> > interface and bypassing voltage requirements.  That of course means having
> > to mirror any of the clk APIs that update clock state into genpd/opp, which
> > Stephen finds distasteful (and I agree).
> 
> I am not sure about this. Sounds like an awful wrapper API.

Yeah, overloading the prepare callbacks is just a symptom of the greater
problem: we don't have a real DVFS api.

Regards,
Mike

> 
> However, what Mike seems to express the need for is a common consumer
> OPP API to set a performance state for a device, but also a way to
> allow SoC-specific performance state providers to manage the backend
> parts of actually changing the performance state.
> 
> I am wondering whether this could be a way forward, maybe not exactly
> what you were looking for, but perhaps it can address your concerns?
> 
> [...]
> 
> Kind regards
> Uffe


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-09-18 23:00                     ` Michael Turquette
@ 2018-09-19  7:05                       ` Geert Uytterhoeven
  2018-09-19 18:07                         ` Michael Turquette
  2018-09-25 21:26                       ` grahamr
  1 sibling, 1 reply; 34+ messages in thread
From: Geert Uytterhoeven @ 2018-09-19  7:05 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Ulf Hansson, grahamr, Peter De Schrijver, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM list, Doug Anderson, tdas,
	Rajendra Nayak, anischal, Vincent Guittot, amit.kucheria,
	linux-clk-owner

On Wed, Sep 19, 2018 at 1:01 AM Michael Turquette
<mturquette@baylibre.com> wrote:
> Quoting Ulf Hansson (2018-08-23 06:20:11)
> > Anyway, in regard to controlling the performance state for these clock
> > controller devices, it seems to me like there is no other way but to
> > explicitly allow clock drivers to call an "OPP API" to request a
> > performance state. Simply because it's the clock driver that needs
> > the performance state for its device. Whether the "OPP API" is the
> > new dev_pm_genpd_set_performance_state() or something not even
> > invented yet is another question.
>
> I completely agree, with the exception that I don't think it will be an
> "OPP API" but instead I hope it will be some runtime pm performance api.
>
> > My conclusion so far is that we seem to fall back to a potential
> > locking problem. In regard to that, I am wondering whether it is
> > actually more of a hypothetical problem than a real problem for your
> > case.
>
> For reference, this is why we allow reentrancy into the clock framework.
> It is common that consumer A calls clk_set_rate to set clock X to a
> rate, but in order for clock X to achieve that rate the clock provider
> might need to call clk_set_rate on another clock. We support reentrancy
> for this type of case.
>
> The problem described by Graham seems analogous. There are times when a
> performance provider itself will need to adjust its own performance (as
> consumed by some other parent provider). I'm under the impression that
> runtime pm allows reentrancy and genpd allows for nested genpds, so
> hopefully this should Just Work.

BTW, from time to time, lockdep spews out warnings about genpd/clock
interaction. I believe they are false positives.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-09-19  7:05                       ` Geert Uytterhoeven
@ 2018-09-19 18:07                         ` Michael Turquette
  2018-09-25 13:11                           ` Geert Uytterhoeven
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-09-19 18:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Ulf Hansson, grahamr, Peter De Schrijver, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM list, Doug Anderson, tdas,
	Rajendra Nayak, anischal, Vincent Guittot, amit.kucheria,
	linux-clk-owner

Quoting Geert Uytterhoeven (2018-09-19 00:05:59)
> On Wed, Sep 19, 2018 at 1:01 AM Michael Turquette
> <mturquette@baylibre.com> wrote:
> > Quoting Ulf Hansson (2018-08-23 06:20:11)
> > > Anyway, in regard to controlling the performance state for these clock
> > > controller devices, it seems to me like there is no other way but to
> > > explicitly allow clock drivers to call an "OPP API" to request a
> > > performance state. Simply because it's the clock driver that needs
> > > the performance state for its device. Whether the "OPP API" is the
> > > new dev_pm_genpd_set_performance_state() or something not even
> > > invented yet is another question.
> >
> > I completely agree, with the exception that I don't think it will be an
> > "OPP API" but instead I hope it will be some runtime pm performance api.
> >
> > > My conclusion so far is that we seem to fall back to a potential
> > > locking problem. In regard to that, I am wondering whether it is
> > > actually more of a hypothetical problem than a real problem for your
> > > case.
> >
> > For reference, this is why we allow reentrancy into the clock framework.
> > It is common that consumer A calls clk_set_rate to set clock X to a
> > rate, but in order for clock X to achieve that rate the clock provider
> > might need to call clk_set_rate on another clock. We support reentrancy
> > for this type of case.
> >
> > The problem described by Graham seems analogous. There are times when a
> > performance provider itself will need to adjust its own performance (as
> > consumed by some other parent provider). I'm under the impression that
> > runtime pm allows reentrancy and genpd allows for nested genpds, so
> > hopefully this should Just Work.
> 
> BTW, from time to time, lockdep spews out warnings about genpd/clock
> interaction. I believe they are false positives.

I'm happy to take a look at a spew if you can capture it and tell me how
to reproduce.

Thanks,
Mike

> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> -- 
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-09-19 18:07                         ` Michael Turquette
@ 2018-09-25 13:11                           ` Geert Uytterhoeven
  0 siblings, 0 replies; 34+ messages in thread
From: Geert Uytterhoeven @ 2018-09-25 13:11 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Ulf Hansson, grahamr, Peter De Schrijver, Stephen Boyd,
	Viresh Kumar, linux-clk, Linux PM list, Doug Anderson, tdas,
	Rajendra Nayak, anischal, Vincent Guittot, amit.kucheria,
	linux-clk-owner

Hi Mike,

On Wed, Sep 19, 2018 at 8:16 PM Michael Turquette
<mturquette@baylibre.com> wrote:
> Quoting Geert Uytterhoeven (2018-09-19 00:05:59)
> > On Wed, Sep 19, 2018 at 1:01 AM Michael Turquette
> > <mturquette@baylibre.com> wrote:
> > > Quoting Ulf Hansson (2018-08-23 06:20:11)
> > > > Anyway, in regard to controlling the performance state for these clock
> > > > controller devices, it seems to me like there is no other way but to
> > > > explicitly allow clock drivers to call an "OPP API" to request a
> > > > performance state. Simply because it's the clock driver that needs
> > > > the performance state for its device. Whether the "OPP API" is the
> > > > new dev_pm_genpd_set_performance_state() or something not even
> > > > invented yet is another question.
> > >
> > > I completely agree, with the exception that I don't think it will be an
> > > "OPP API" but instead I hope it will be some runtime pm performance api.
> > >
> > > > My conclusion so far is that we seem to fall back to a potential
> > > > locking problem. In regard to that, I am wondering whether it is
> > > > actually more of a hypothetical problem than a real problem for your
> > > > case.
> > >
> > > For reference, this is why we allow reentrancy into the clock framework.
> > > It is common that consumer A calls clk_set_rate to set clock X to a
> > > rate, but in order for clock X to achieve that rate the clock provider
> > > might need to call clk_set_rate on another clock. We support reentrancy
> > > for this type of case.
> > >
> > > The problem described by Graham seems analogous. There are times when a
> > > performance provider itself will need to adjust its own performance (as
> > > consumed by some other parent provider). I'm under the impression that
> > > runtime pm allows reentrancy and genpd allows for nested genpds, so
> > > hopefully this should Just Work.
> >
> > BTW, from time to time, lockdep spews out warnings about genpd/clock
> > interaction. I believe they are false positives.
>
> I'm happy to take a look at a spew if you can capture it and tell me how
> to reproduce.

Unfortunately I cannot trigger it at will.
Just saw it again (on Renesas Salvator-XS with R-Car H3 ES2.0), so here's
the spew:

    genpd_add_device: Add e6601000.crypto to always-on

    ======================================================
    WARNING: possible circular locking dependency detected
    4.19.0-rc5-arm64-renesas-03635-gf5e6b692aefccbe2 #55 Not tainted
    ------------------------------------------------------
    swapper/0/1 is trying to acquire lock:
    (____ptrval____) (prepare_lock){+.+.}, at: clk_prepare_lock+0x48/0xa8

    but task is already holding lock:
    (____ptrval____) (of_clk_mutex){+.+.}, at:
__of_clk_get_from_provider.part.23+0x3c/0x120

    which lock already depends on the new lock.


    the existing dependency chain (in reverse order) is:

    -> #5 (of_clk_mutex){+.+.}:
   __mutex_lock+0x70/0x7f8
   mutex_lock_nested+0x1c/0x28
   __of_clk_get_from_provider.part.23+0x3c/0x120
   of_clk_get_from_provider+0x20/0x30
   cpg_mssr_attach_dev+0xb0/0x1c0
   genpd_add_device.constprop.12+0x94/0x270
   __genpd_dev_pm_attach+0xb4/0x220
   genpd_dev_pm_attach+0x64/0x70
   dev_pm_domain_attach+0x1c/0x30
   platform_drv_probe+0x38/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   sh_mobile_i2c_adap_init+0x18/0x20
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    -> #4 (&genpd->mlock){+.+.}:
   __mutex_lock+0x70/0x7f8
   mutex_lock_nested+0x1c/0x28
   genpd_lock_mtx+0x14/0x20
   genpd_runtime_resume+0xbc/0x240
   __rpm_callback+0xdc/0x228
   rpm_callback+0x20/0x80
   rpm_resume+0x4e0/0x7b0
   __pm_runtime_resume+0x50/0x78
   usb_dmac_alloc_chan_resources+0x88/0xa0
   dma_chan_get+0x28/0x80
   find_candidate+0x108/0x1a8
   __dma_request_channel+0x68/0xc8
   usb_dmac_of_xlate+0x60/0x88
   of_dma_request_slave_channel+0x174/0x280
   dma_request_chan+0x38/0x1d8
   usbhsf_dma_init.isra.8+0x64/0x128
   usbhs_fifo_probe+0x70/0x148
   usbhs_probe+0x2e4/0x670
   platform_drv_probe+0x50/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   renesas_usbhs_driver_init+0x18/0x20
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    -> #3 (dma_list_mutex){+.+.}:
   __mutex_lock+0x70/0x7f8
   mutex_lock_nested+0x1c/0x28
   __dma_request_channel+0x38/0xc8
   usb_dmac_of_xlate+0x60/0x88
   of_dma_request_slave_channel+0x174/0x280
   dma_request_chan+0x38/0x1d8
   usbhsf_dma_init.isra.8+0x64/0x128
   usbhs_fifo_probe+0x70/0x148
   usbhs_probe+0x2e4/0x670
   platform_drv_probe+0x50/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   renesas_usbhs_driver_init+0x18/0x20
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    -> #2 (of_dma_lock){+.+.}:
   __mutex_lock+0x70/0x7f8
   mutex_lock_nested+0x1c/0x28
   of_dma_request_slave_channel+0x130/0x280
   dma_request_chan+0x38/0x1d8
   rcar_i2c_master_xfer+0x154/0x480
   __i2c_transfer+0x148/0x8e0
   i2c_smbus_xfer_emulated+0x158/0x5b0
   __i2c_smbus_xfer+0x17c/0x7f8
   i2c_smbus_xfer+0x64/0x98
   i2c_smbus_read_byte_data+0x40/0x70
   cs2000_bset.isra.1+0x2c/0x68
   __cs2000_set_rate.constprop.7+0x80/0x148
   cs2000_probe+0xd4/0x260
   i2c_device_probe+0x290/0x2b0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __device_attach_driver+0x90/0xd0
   bus_for_each_drv+0x64/0xc8
   __device_attach+0xd8/0x130
   device_initial_probe+0x10/0x18
   bus_probe_device+0x98/0xa0
   device_add+0x3f8/0x620
   device_register+0x1c/0x28
   i2c_new_device+0x12c/0x310
   of_i2c_register_device+0x74/0x98
   of_i2c_register_devices+0xa4/0x150
   i2c_register_adapter+0x160/0x418
   __i2c_add_numbered_adapter+0x60/0x98
   i2c_add_adapter+0xa4/0xd0
   i2c_add_numbered_adapter+0x24/0x30
   rcar_i2c_probe+0x33c/0x450
   platform_drv_probe+0x50/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   rcar_i2c_driver_init+0x18/0x20
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    -> #1 (i2c_register_adapter){+.+.}:
   rt_mutex_lock_nested+0x2c/0x48
   i2c_adapter_lock_bus+0x20/0x30
   i2c_smbus_xfer+0x44/0x98
   i2c_smbus_read_byte_data+0x40/0x70
   cs2000_recalc_rate+0x38/0x90
   clk_register+0x3b8/0x6e8
   clk_hw_register+0xc/0x20
   cs2000_probe+0x134/0x260
   i2c_device_probe+0x290/0x2b0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __device_attach_driver+0x90/0xd0
   bus_for_each_drv+0x64/0xc8
   __device_attach+0xd8/0x130
   device_initial_probe+0x10/0x18
   bus_probe_device+0x98/0xa0
   device_add+0x3f8/0x620
   device_register+0x1c/0x28
   i2c_new_device+0x12c/0x310
   of_i2c_register_device+0x74/0x98
   of_i2c_register_devices+0xa4/0x150
   i2c_register_adapter+0x160/0x418
   __i2c_add_numbered_adapter+0x60/0x98
   i2c_add_adapter+0xa4/0xd0
   i2c_add_numbered_adapter+0x24/0x30
   rcar_i2c_probe+0x33c/0x450
   platform_drv_probe+0x50/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   rcar_i2c_driver_init+0x18/0x20
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    -> #0 (prepare_lock){+.+.}:
   lock_acquire+0xc0/0x228
   __mutex_lock+0x70/0x7f8
   mutex_lock_nested+0x1c/0x28
   clk_prepare_lock+0x48/0xa8
   __clk_create_clk.part.21+0x68/0xa0
   __of_clk_get_from_provider.part.23+0xd0/0x120
   __of_clk_get_from_provider+0x10/0x20
   __of_clk_get+0x88/0xa0
   __of_clk_get_by_name+0xa4/0x130
   clk_get+0x30/0x70
   devm_clk_get+0x4c/0xa8
   ccree_probe+0xa4/0x5f8
   platform_drv_probe+0x50/0xa0
   really_probe+0x1c4/0x298
   driver_probe_device+0x58/0x100
   __driver_attach+0xdc/0xe0
   bus_for_each_dev+0x74/0xc8
   driver_attach+0x20/0x28
   bus_add_driver+0x1a4/0x210
   driver_register+0x60/0x110
   __platform_driver_register+0x40/0x48
   ccree_init+0x24/0x2c
   do_one_initcall+0x180/0x36c
   kernel_init_freeable+0x470/0x518
   kernel_init+0x10/0x108
   ret_from_fork+0x10/0x1c

    other info that might help us debug this:

    Chain exists of:
      prepare_lock --> &genpd->mlock --> of_clk_mutex

     Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(of_clk_mutex);
                               lock(&genpd->mlock);
                               lock(of_clk_mutex);
  lock(prepare_lock);

     *** DEADLOCK ***

    2 locks held by swapper/0/1:
     #0: (____ptrval____) (&dev->mutex){....}, at: __driver_attach+0x5c/0xe0
     #1: (____ptrval____) (of_clk_mutex){+.+.}, at:
__of_clk_get_from_provider.part.23+0x3c/0x120

    stack backtrace:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.19.0-rc5-arm64-renesas-03635-gf5e6b692aefccbe2 #55
    Hardware name: Renesas Salvator-X 2nd version board based on
r8a7795 ES2.0+ (DT)
    Call trace:
     dump_backtrace+0x0/0x1c8
     show_stack+0x14/0x20
     dump_stack+0xbc/0xf4
     print_circular_bug.isra.19+0x1d4/0x2e8
     __lock_acquire+0x1804/0x1868
     lock_acquire+0xc0/0x228
     __mutex_lock+0x70/0x7f8
     mutex_lock_nested+0x1c/0x28
     clk_prepare_lock+0x48/0xa8
     __clk_create_clk.part.21+0x68/0xa0
     __of_clk_get_from_provider.part.23+0xd0/0x120
     __of_clk_get_from_provider+0x10/0x20
     __of_clk_get+0x88/0xa0
     __of_clk_get_by_name+0xa4/0x130
     clk_get+0x30/0x70
     devm_clk_get+0x4c/0xa8
     ccree_probe+0xa4/0x5f8
     platform_drv_probe+0x50/0xa0
     really_probe+0x1c4/0x298
     driver_probe_device+0x58/0x100
     __driver_attach+0xdc/0xe0
     bus_for_each_dev+0x74/0xc8
     driver_attach+0x20/0x28
     bus_add_driver+0x1a4/0x210
     driver_register+0x60/0x110
     __platform_driver_register+0x40/0x48
     ccree_init+0x24/0x2c
     do_one_initcall+0x180/0x36c
     kernel_init_freeable+0x470/0x518
     kernel_init+0x10/0x108
     ret_from_fork+0x10/0x1c

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-09-18 23:00                     ` Michael Turquette
  2018-09-19  7:05                       ` Geert Uytterhoeven
@ 2018-09-25 21:26                       ` grahamr
  2018-10-01 19:00                         ` Michael Turquette
  1 sibling, 1 reply; 34+ messages in thread
From: grahamr @ 2018-09-25 21:26 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Ulf Hansson, Peter De Schrijver, Stephen Boyd, Viresh Kumar,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner

On 2018-09-18 16:00, Michael Turquette wrote:
> Quoting Ulf Hansson (2018-08-23 06:20:11)
>> On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
>> > I have two significant concerns with using a wrapper framework to support
>> > DVFS.  The first may be Qualcomm-specific, and more of a practical concern
>> > rather than a principled one.  We (clock driver owners) get
>> > voltage/frequency operating points from the clock controller designers in a
>> > single data-set, it is not information that is provided to or coming from
>> > the clock consumer hardware or software teams.  Tracking, updating, and
>> > debugging frequency/voltage mismatches falls completely into the clock team,
>> > so moving the responsibility of handling the relationship in code to the
>> > consumer would be a disaster (SDM845 has almost 200 clock domains and over
>> > 800 clock branches - not all controlled by Linux but a large proportion
>> > are).  It would mean all clock consumers using any clk_set_rate or
>> > clk_enable APIs would need to make the appropriate voltage requests as well
>> > - and even if they could look up the required voltage based on frequency
>> > from some central location (which I expect would be delivered and maintained
>> > by the clock driver team) the chance that everyone would do that correctly
>> > is, frankly, zero.  The types of problems that arise from under-volting a
>> > clock generator or consumer are very hard to debug (since it can essentially
>> > result in random behavior).
>> 
>> Okay, I see.
>> 
>> Wouldn't a nice script that does a search/replace work out? :-)
>> 
>> >
>> > My second concern is that we do have voltage requirements coming from the
>> > clock controller itself, not related to the clock consumer.  The clock
>> > hardware components themselves (PLLs, muxes, etc) require voltage levels
>> > that may differ from that of the final consumer of the output clock.  We
>> > could hack all these requirements into the frequency/voltage tuple table
>> > that consumers look up, but at that point we have diverged fairly
>> > dramatically from Stephen's principle that the clock framework should not
>> > directly require any voltage - fundamentally they do (at least ours do).
>> 
>> This changes my view, as I didn't know about these kinds of cases.
>> 
>> First, it seems like you need to associate a struct device with the
>> clock controller, such that it can be attached to its corresponding PM
>> domain (genpd). Of course, then you also need to deploy runtime PM
>> support for the clock driver for this clock controller device. Do note
>> that runtime PM is already supported by the clock core, so this should
>> be trivial. Why? Because it is needed to properly allow genpd to
>> aggregate the votes for the PM domain(s), in case there are other
>> devices in the same PM domain (or if there are dependencies on
>> subdomains).
> 
> Your struct device can be considered as Done. Stephen and I have been
> forcing clock driver authors to write proper platform drivers for a
> while now.
> 
>> 
>> Also, if I am not mistaken, the runtime PM enablement is something
>> that is already being used (or at least tried out) for some Exynos
>> clock drivers. I recall there were some discussions around locking
>> issues around the runtime PM support in the clock core. Let me see if
>> I can search the mail archive to find out if/what went wrong; I
>> will come back to this.
> 
> This was mostly related to idle power management issues, but yes there
> is some basic runtime pm awareness in the clock framework.
> 
>> 
>> Anyway, with regard to controlling the performance state of these
>> clock controller devices, it seems to me there is no other way but to
>> explicitly allow clock drivers to call an "OPP API" to request a
>> performance state. Simply because it's the clock driver that needs
>> the performance state for its device. Whether the "OPP API" is the
>> new, dev_pm_genpd_set_performance_state() or something not even
>> invented yet, is another question.
> 
> I completely agree, with the exception that I don't think it will be an
> "OPP API" but instead I hope it will be some runtime pm performance 
> api.

If we allow the clock framework to use runtime PM to request performance 
levels for its own voltage requirements, what is the real difference in 
having it cover all voltage requirements based on the chosen clock 
frequency/state (because on/off affect the voltage requirement as well 
as the rate)?  From an implementation and data-structure point of view 
there is no difference at all - we will need to track a voltage 
requirement per clock operating point for the clock controller's own 
needs.  Including the consumer requirements as well adds nothing, and it 
removes the need for any consumers to change over to runtime PM 
themselves.
I get the principle of having consumers deal with their own specific 
needs, but the consumers in the SoCs I've seen do not know what their 
voltage requirements are - it's data managed by the clock provider.  
Once the door is open to having the clock driver use runtime PM, why not 
allow SoCs with that kind of data-management policy to build the 
consumer requirements in the same way, since it is zero extra work?
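
As an illustration of the bookkeeping described above, the per-clock vote 
could be sketched as follows. This is a logic sketch only, not kernel 
code; the operating-point tables and corner numbers are invented, and a 
real implementation would live in the clock provider and genpd.

```python
def required_corner(opp_table, rate_hz, enabled):
    """Corner a clock needs at a given rate; None when disabled,
    since a disabled clock must release its vote entirely."""
    if not enabled:
        return None
    for max_rate, corner in sorted(opp_table):
        if rate_hz <= max_rate:
            return corner
    raise ValueError("rate above highest operating point")

def aggregate(votes):
    """Provider-level vote: the max of all active corners, mirroring
    how genpd aggregates performance states across devices."""
    active = [v for v in votes if v is not None]
    return max(active) if active else None

# Invented example data: two clocks whose requirements land on one rail.
uart_opp = [(19_200_000, 1), (48_000_000, 2), (96_000_000, 3)]
usb_opp = [(60_000_000, 2), (120_000_000, 4)]

votes = [
    required_corner(uart_opp, 48_000_000, enabled=True),
    required_corner(usb_opp, 120_000_000, enabled=False),
]
print(aggregate(votes))  # 2: only the enabled UART clock votes
```

The table is the same data structure whether it encodes only the 
controller's internal needs or the consumers' requirements as well.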

Graham




>> 
>> My conclusion so far is that we seem to fall back to a potential
>> locking problem. In that regard, I am wondering whether it is
>> actually more of a hypothetical problem than a real one for your
>> case.
> 
> For reference, this is why we allow reentrancy into the clock 
> framework.
> It is common that consumer A calls clk_set_rate to set clock X to a
> rate, but in order for clock X to achieve that rate the clock provider
> might need to call clk_set_rate on another clock. We support reentrancy
> for this type of case.
> 
> The problem described by Graham seems analogous. There are times when a
> performance provider itself will need to adjust its own performance
> (as consumed by some other parent provider). I'm under the impression
> that runtime pm allows reentrancy and genpd allows for nested genpds,
> so hopefully this should Just Work.
> 
>> 
>> > Our most recent patch that Taniya posted has gone in the direction similar
>> > to Tegra - instead of having the framework handle it, we use prepare and
>> > set_rate hooks to implement voltage (corner) voting in the qcom drivers via
>> > the new genpd.  This is not fully proven yet but is so far working well and
>> > will likely be our internal solution going forward if the framework
>> > consensus is to force consumers to manage their frequency-based voltage
>> > requirements themselves - I just do not see that as a practical solution for
>> > a complicated SoC with a large, highly distributed, clock tree.  That being
>> > said I do see potential future deadlock race conditions between clk and
>> > genpd which concern me - it remains a downside of that coupling.
>> >
>> > Would there be some way to prevent consumers from directly calling
>> > clk_set_rate or clk_enable and force them to go via another framework for
>> > these calls?  It would at least prevent people from using the "wrong"
>> > interface and bypassing voltage requirements.  That of course means having
>> > to mirror any of the clk APIs that update clock state into genpd/opp, which
>> > Stephen finds distasteful (and I agree).
>> 
>> I am not sure about this. Sounds like an awful wrapper API.
> 
> Yeah, overloading the prepare callbacks is just a symptom of the
> greater problem: we don't have a real DVFS api.
> 
> Regards,
> Mike
> 
>> 
>> However, what Mike seems to express the need for, is a common consumer
>> OPP API to set a performance state for a device, but also a way to
>> allow SoC specific performance state providers to manage the backend
>> parts of actually changing the performance state.
>> 
>> I am wondering whether this could be a way forward, maybe not exactly
>> what you were looking for, but perhaps it can address your concerns?
>> 
>> [...]
>> 
>> Kind regards
>> Uffe

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-09-25 21:26                       ` grahamr
@ 2018-10-01 19:00                         ` Michael Turquette
  2018-10-04  0:37                           ` Graham Roff
  0 siblings, 1 reply; 34+ messages in thread
From: Michael Turquette @ 2018-10-01 19:00 UTC (permalink / raw)
  To: grahamr
  Cc: Ulf Hansson, Peter De Schrijver, Stephen Boyd, Viresh Kumar,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner

Quoting grahamr@codeaurora.org (2018-09-25 14:26:25)
> On 2018-09-18 16:00, Michael Turquette wrote:
> > Quoting Ulf Hansson (2018-08-23 06:20:11)
> >> On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
> >> > I have two significant concerns with using a wrapper framework to support
> >> > DVFS.  The first may be Qualcomm-specific, and more of a practical concern
> >> > rather than a principled one.  We (clock driver owners) get
> >> > voltage/frequency operating points from the clock controller designers in a
> >> > single data-set, it is not information that is provided to or coming from
> >> > the clock consumer hardware or software teams.  Tracking, updating, and
> >> > debugging frequency/voltage mismatches falls completely into the clock team,
> >> > so moving the responsibility of handling the relationship in code to the
> >> > consumer would be a disaster (SDM845 has almost 200 clock domains and over
> >> > 800 clock branches - not all controlled by Linux but a large proportion
> >> > are).  It would mean all clock consumers using any clk_set_rate or
> >> > clk_enable APIs would need to make the appropriate voltage requests as well
> >> > - and even if they could look up the required voltage based on frequency
> >> > from some central location (which I expect would be delivered and maintained
> >> > by the clock driver team) the chance that everyone would do that correctly
> >> > is, frankly, zero.  The types of problems that arise from under-volting a
> >> > clock generator or consumer are very hard to debug (since it can essentially
> >> > result in random behavior).
> >> 
> >> Okay, I see.
> >> 
> >> Wouldn't a nice script that does a search/replace work out? :-)
> >> 
> >> >
> >> > My second concern is that we do have voltage requirements coming from the
> >> > clock controller itself, not related to the clock consumer.  The clock
> >> > hardware components themselves (PLLs, muxes, etc) require voltage levels
> >> > that may differ from that of the final consumer of the output clock.  We
> >> > could hack all these requirements into the frequency/voltage tuple table
> >> > that consumers look up, but at that point we have diverged fairly
> >> > dramatically from Stephen's principle that the clock framework should not
> >> > directly require any voltage - fundamentally they do (at least ours do).
> >> 
> >> This changes my view, as I didn't know about these kinds of cases.
> >> 
> >> First, it seems like you need to associate a struct device with the
> >> clock controller, such that it can be attached to its corresponding PM
> >> domain (genpd). Of course, then you also need to deploy runtime PM
> >> support for the clock driver for this clock controller device. Do note
> >> that runtime PM is already supported by the clock core, so this should
> >> be trivial. Why? Because it is needed to properly allow genpd to
> >> aggregate the votes for the PM domain(s), in case there are other
> >> devices in the same PM domain (or if there are dependencies on
> >> subdomains).
> > 
> > Your struct device can be considered as Done. Stephen and I have been
> > forcing clock driver authors to write proper platform drivers for a
> > while now.
> > 
> >> 
> >> Also, if I am not mistaken, the runtime PM enablement is something
> >> that is already being used (or at least tried out) for some Exynos
> >> clock drivers. I recall there were some discussions around locking
> >> issues around the runtime PM support in the clock core. Let me see if
> >> I can search the mail archive to find out if/what went wrong; I
> >> will come back to this.
> > 
> > This was mostly related to idle power management issues, but yes there
> > is some basic runtime pm awareness in the clock framework.
> > 
> >> 
> >> Anyway, with regard to controlling the performance state of these
> >> clock controller devices, it seems to me there is no other way but to
> >> explicitly allow clock drivers to call an "OPP API" to request a
> >> performance state. Simply because it's the clock driver that needs
> >> the performance state for its device. Whether the "OPP API" is the
> >> new, dev_pm_genpd_set_performance_state() or something not even
> >> invented yet, is another question.
> > 
> > I completely agree, with the exception that I don't think it will be an
> > "OPP API" but instead I hope it will be some runtime pm performance 
> > api.
> 
> If we allow the clock framework to use runtime pm to request performance 
> levels for its own voltage requirements, what is the real difference in 
> having it cover all voltage requirements based on the chosen clock 
> frequency/state (because on/off affect the voltage requirement as well 
> as the rate)?

I had to re-read this a dozen times. At first I thought you were
challenging the idea that a clock provider should act like a voltage
consumer (in the case where the clock generator has performance
requirements of its own).

But now I think that you're asking: why can't the clock provider driver
know everything about how the soc is integrated and set the voltage
based on the whole clock tree, thus sparing the consumer drivers having
to think about their voltage at all. Do I have that right?

There are a bunch of problems with this. I really do see how it
simplifies your life since you invested a bunch of time and code into an
out-of-tree implementation that does this already, but let's take the
long view on this design topic:

1) how does your approach solve the problem for consumer drivers that
are not integrated within the SoC? What if a consumer driver consumes a
clkout pin derived from your in-SoC clock generator, but is powered by
an external PMIC? Where does this integration data live? Who owns it?

2) how does your approach solve for multiple clock provider drivers that
might need to manage shared voltage rails? How do they coordinate their
needs and requests?

3) clock data is already big. If we start lumping in freq/volt tuples
and use-case configuration data into those drivers, someone up the chain
is going to start complaining eventually. We've seen it before.

> From an implementation and data structure point of view 
> there is no difference at all - we will need to track a voltage 
> requirement per clock operating point for the clock controller needs.  

The code (and data) has to live somewhere, sure.

> Including the consumer requirements as well adds nothing and removes the 
> need for any consumer changes to themselves use runtime pm.

Except for the examples I mentioned above, especially #1 up above.

> I get the principle of having the consumer deal with their own specific 
> needs, but the consumers in the SOCs I've seen do not know what their 
> voltage requirements are - it's data managed by the clock provider.  It 

I guess I wasn't clear on the call at Connect: consumers should never
concern themselves with their voltage requirements. The performance API
will primarily use hertz as the perf value passed in, especially for the
SoCs you care about. Consumer drivers don't need to think about voltage.
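
A minimal sketch of that consumer-facing shape (all names, tables, and 
corner values here are hypothetical, not a real kernel API): the consumer 
passes a frequency, and the provider resolves the voltage corner 
internally.

```python
class PerfProvider:
    """Owns the Hz -> voltage-corner data; consumers never see it."""
    def __init__(self, hz_to_corner):
        self.hz_to_corner = sorted(hz_to_corner)  # [(max_hz, corner)]
        self.current_corner = None

    def set_perf(self, hz):
        """Consumer entry point: takes hertz, never a voltage."""
        for max_hz, corner in self.hz_to_corner:
            if hz <= max_hz:
                self.current_corner = corner  # voltage stays provider-side
                return corner
        raise ValueError("unsupported frequency")

gpu = PerfProvider([(200_000_000, 1), (400_000_000, 3), (710_000_000, 5)])
gpu.set_perf(400_000_000)  # the consumer thinks in Hz only
print(gpu.current_corner)  # 3
```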

And it's only "data managed by the clock provider" because that's what
you've hacked together out-of-tree.

> seems once the door is open to have the clock driver use runtime pm, why 
> not allow SOCs with that kind of data management policy to build in the 
> consumer requirements that way as well since it is zero extra work?

Because it's not zero extra work. None of the clock drivers that use
runtime pm use it for anything like active or performance management.
It's used for idle power management.

Regardless of which design route that we take, we'll have to glue
voltages to clock frequencies and make that code work. There is not a
case where we get something for free.

Best regards,
Mike

> 
> Graham
> 
> 
> 
> 
> >> [...]


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-10-01 19:00                         ` Michael Turquette
@ 2018-10-04  0:37                           ` Graham Roff
  2018-10-04 21:23                             ` Michael Turquette
  0 siblings, 1 reply; 34+ messages in thread
From: Graham Roff @ 2018-10-04  0:37 UTC (permalink / raw)
  To: Michael Turquette
  Cc: Ulf Hansson, Peter De Schrijver, Stephen Boyd, Viresh Kumar,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner

On 2018-10-01 12:00, Michael Turquette wrote:
> Quoting grahamr@codeaurora.org (2018-09-25 14:26:25)
>> On 2018-09-18 16:00, Michael Turquette wrote:
>> > Quoting Ulf Hansson (2018-08-23 06:20:11)
>> >> On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
>> >> > I have two significant concerns with using a wrapper framework to support
>> >> > DVFS.  The first may be Qualcomm-specific, and more of a practical concern
>> >> > rather than a principled one.  We (clock driver owners) get
>> >> > voltage/frequency operating points from the clock controller designers in a
>> >> > single data-set, it is not information that is provided to or coming from
>> >> > the clock consumer hardware or software teams.  Tracking, updating, and
>> >> > debugging frequency/voltage mismatches falls completely into the clock team,
>> >> > so moving the responsibility of handling the relationship in code to the
>> >> > consumer would be a disaster (SDM845 has almost 200 clock domains and over
>> >> > 800 clock branches - not all controlled by Linux but a large proportion
>> >> > are).  It would mean all clock consumers using any clk_set_rate or
>> >> > clk_enable APIs would need to make the appropriate voltage requests as well
>> >> > - and even if they could look up the required voltage based on frequency
>> >> > from some central location (which I expect would be delivered and maintained
>> >> > by the clock driver team) the chance that everyone would do that correctly
>> >> > is, frankly, zero.  The types of problems that arise from under-volting a
>> >> > clock generator or consumer are very hard to debug (since it can essentially
>> >> > result in random behavior).
>> >>
>> >> Okay, I see.
>> >>
>> >> Wouldn't a nice script that does a search/replace work out? :-)
>> >>
>> >> >
>> >> > My second concern is that we do have voltage requirements coming from the
>> >> > clock controller itself, not related to the clock consumer.  The clock
>> >> > hardware components themselves (PLLs, muxes, etc) require voltage levels
>> >> > that may differ from that of the final consumer of the output clock.  We
>> >> > could hack all these requirements into the frequency/voltage tuple table
>> >> > that consumers look up, but at that point we have diverged fairly
>> >> > dramatically from Stephen's principle that the clock framework should not
>> >> > directly require any voltage - fundamentally they do (at least ours do).
>> >>
>> >> This changes my view, as I didn't know about these kinds of cases.
>> >>
>> >> First, it seems like you need to associate a struct device with the
>> >> clock controller, such that it can be attached to its corresponding PM
>> >> domain (genpd). Of course, then you also need to deploy runtime PM
>> >> support for the clock driver for this clock controller device. Do note
>> >> that runtime PM is already supported by the clock core, so this should
>> >> be trivial. Why? Because it is needed to properly allow genpd to
>> >> aggregate the votes for the PM domain(s), in case there are other
>> >> devices in the same PM domain (or if there are dependencies on
>> >> subdomains).
>> >
>> > Your struct device can be considered as Done. Stephen and I have been
>> > forcing clock driver authors to write proper platform drivers for a
>> > while now.
>> >
>> >>
>> >> Also, if I am not mistaken, the runtime PM enablement is something
>> >> that is already being used (or at least tried out) for some Exynos
>> >> clock drivers. I recall there were some discussions around locking
>> >> issues around the runtime PM support in the clock core. Let me see if
>> >> I can search the mail archive to find out if/what went wrong; I
>> >> will come back to this.
>> >
>> > This was mostly related to idle power management issues, but yes there
>> > is some basic runtime pm awareness in the clock framework.
>> >
>> >>
>> >> Anyway, with regard to controlling the performance state of these
>> >> clock controller devices, it seems to me there is no other way but to
>> >> explicitly allow clock drivers to call an "OPP API" to request a
>> >> performance state. Simply because it's the clock driver that needs
>> >> the performance state for its device. Whether the "OPP API" is the
>> >> new, dev_pm_genpd_set_performance_state() or something not even
>> >> invented yet, is another question.
>> >
>> > I completely agree, with the exception that I don't think it will be an
>> > "OPP API" but instead I hope it will be some runtime pm performance
>> > api.
>> 
>> If we allow the clock framework to use runtime pm to request 
>> performance
>> levels for its own voltage requirements, what is the real difference 
>> in
>> having it cover all voltage requirements based on the chosen clock
>> frequency/state (because on/off affect the voltage requirement as well
>> as the rate)?
> 
> I had to re-read this a dozen times. At first I thought you were
> challenging the idea that a clock provider should act like a voltage
> consumer (in the case where the clock generator has performance
> requirements of its own).
> 
> But now I think that you're asking: why can't the clock provider driver
> know everything about how the soc is integrated and set the voltage
> based on the whole clock tree, thus sparing the consumer drivers having
> to think about their voltage at all. Do I have that right?

In a sense that is what I'm asking, but not as a request for any 
additional framework support - just as a description of how a vendor 
could set up their performance states for the clock controller itself.
As we have discussed, the clock controller needs to itself be a consumer 
of performance states from runtime PM (the requirement is certain; the 
use of runtime PM is at least the current proposal for how to support 
it).  This is absolutely the case for PLLs, internal clock generators, 
etc.  The performance state required will vary based on what frequency 
these internal clock components are running at.  So structurally we need 
to track and aggregate a performance state for the clock controller 
based on the frequencies of the various components it manages.  I 
believe that this is understood and accepted?  Once that framework is in 
place, it is completely transparent to the framework design what or 
whose requirements are covered by the performance state associated with 
each frequency.  I can include the consumer's voltage requirement 
without any change apart from updating the already existing data 
structure in the clock controller.
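
That argument can be made concrete with a small sketch (invented 
components, frequencies, and corners; not kernel code): the aggregation 
code is identical whether the per-frequency data encodes only the 
controller's internal needs or the consumer's requirement as well.

```python
def component_corner(table, hz):
    """Corner one internal component (PLL, RCG) needs at a frequency."""
    for max_hz, corner in sorted(table):
        if hz <= max_hz:
            return corner
    raise ValueError("no operating point covers this frequency")

# Controller-internal requirements only:
pll_table = [(600_000_000, 2), (1_200_000_000, 4)]
rcg_table = [(100_000_000, 1), (200_000_000, 3)]

vote = max(component_corner(pll_table, 1_000_000_000),
           component_corner(rcg_table, 150_000_000))
print(vote)  # 4: the PLL dominates

# Folding a consumer requirement in changes only the data, not the
# code: say the consumer needs corner 5 above 150 MHz on the RCG.
rcg_with_consumer = [(100_000_000, 1), (150_000_000, 3), (200_000_000, 5)]
vote = max(component_corner(pll_table, 1_000_000_000),
           component_corner(rcg_with_consumer, 180_000_000))
print(vote)  # 5
```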


> There are a bunch of problems with this. I really do see how it
> simplifies your life since you invested a bunch of time and code into 
> an
> out-of-tree implementation that does this already, but let's take the
> long view on this design topic:
> 
> 1) how does your approach solve the problem for consumer drivers that
> are not integrated within the SoC? What if a consumer driver consumes a
> clkout pin derived from your in-SoC clock generator, but is powered by
> an external PMIC? Where does this integration data live? Who owns it?

I do not think this is related, actually.  A consumer driver with 
independent voltage requirements can manage them via runtime PM 
performance states just as the clock controller does.  genpd will 
aggregate requirements from different consumers perfectly fine.


> 2) how does your approach solve for multiple clock provider drivers 
> that
> might need to manage shared voltage rails? How do they coordinate their
> needs and requests?

Again, genpd aggregates under the hood.  I am confused as to why this 
question is relevant, though: whether or not one of the consumers of a 
voltage rail is a clock controller, aggregation is required between 
multiple users.  If USB and UART both share a rail and request it via 
runtime PM state changes, genpd still has to aggregate.
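
The aggregation referred to here can be sketched in a few lines 
(hypothetical consumer names and state values; genpd performs the 
equivalent in the kernel): each consumer holds one request, and the rail 
runs at the max of all of them.

```python
class Rail:
    """Toy model of genpd-style performance-state aggregation."""
    def __init__(self):
        self.requests = {}  # consumer name -> requested state

    def request(self, consumer, state):
        """Update one consumer's request; None releases it.
        Returns the aggregated state the rail must now satisfy."""
        if state is None:
            self.requests.pop(consumer, None)
        else:
            self.requests[consumer] = state
        return max(self.requests.values(), default=0)

rail = Rail()
print(rail.request("usb", 3))      # 3
print(rail.request("uart", 5))     # 5: uart raises the rail
print(rail.request("uart", None))  # 3: drops back to usb's request
```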



> 3) clock data is already big. If we start lumping in freq/volt tuples
> and use-case configuration data into those drivers, someone up the 
> chain
> is going to start complaining eventually. We've seen it before.

I don't have a choice; our HW has voltage requirements based on PLL and 
RCG (clock generator) frequencies.  The clock controller is a voltage 
consumer, and needs to aggregate its own requirements across each clock 
node.


>> From an implementation and data structure point of view
>> there is no difference at all - we will need to track a voltage
>> requirement per clock operating point for the clock controller needs.
> 
> The code (and data) has to live somewhere, sure.
> 
>> Including the consumer requirements as well adds nothing and removes 
>> the
>> need for any consumer changes to themselves use runtime pm.
> 
> Except for the examples I mentioned above, especially #1 up above.
> 
>> I get the principle of having the consumer deal with their own 
>> specific
>> needs, but the consumers in the SOCs I've seen do not know what their
>> voltage requirements are - it's data managed by the clock provider.  
>> It
> 
> I guess I wasn't clear on the call at Connect: consumers should never
> concern themselves with their voltage requirements. The performance API
> will primarily use hertz as the perf value passed in, especially for 
> the
> SoCs you care about. Consumer drivers don't need to think about 
> voltage.

Now I'm confused.  When you say the consumer drivers don't need to think 
about voltage, do you just mean the software driver code itself?  
Because the runtime PM data and any governors would need to be owned by 
the same developer (they are the ones who understand their performance 
state requirements), so in that sense the consumer very much needs to 
think about the voltage when they put that piece in place.  This is one 
of the key challenges we (at least Qualcomm) face in moving the voltage 
requirement completely outside of the clock controller.


> And it's only "data managed by the clock provider" because that's what
> you've hacked together out-of-tree.
> 
>> seems once the door is open to have the clock driver use runtime pm, 
>> why
>> not allow SOCs with that kind of data management policy to build in 
>> the
>> consumer requirements that way as well since it is zero extra work?
> 
> Because it's not zero extra work. None of the clock drivers that use
> runtime pm use it for anything like active or performance management.
> It's used for idle power management.

Yes, however in order to satisfy the clock controller's own voltage 
requirements, we will need to build in support for active performance 
state management in the clock framework.  Once that is there, I argue it 
is zero extra work to use it for all clock/voltage voting.

Graham



> [...]


* Re: [RFD] Voltage dependencies for clocks (DVFS)
  2018-10-04  0:37                           ` Graham Roff
@ 2018-10-04 21:23                             ` Michael Turquette
  0 siblings, 0 replies; 34+ messages in thread
From: Michael Turquette @ 2018-10-04 21:23 UTC (permalink / raw)
  To: Graham Roff
  Cc: Ulf Hansson, Peter De Schrijver, Stephen Boyd, Viresh Kumar,
	linux-clk, Linux PM, Doug Anderson, Taniya Das, Rajendra Nayak,
	Amit Nischal, Vincent Guittot, Amit Kucheria, linux-clk-owner

Quoting Graham Roff (2018-10-03 17:37:54)
> On 2018-10-01 12:00, Michael Turquette wrote:
> > Quoting grahamr@codeaurora.org (2018-09-25 14:26:25)
> >> On 2018-09-18 16:00, Michael Turquette wrote:
> >> > Quoting Ulf Hansson (2018-08-23 06:20:11)
> >> >> On 31 July 2018 at 22:02,  <grahamr@codeaurora.org> wrote:
> >> >> > I have two significant concerns with using a wrapper framework to support
> >> >> > DVFS.  The first may be Qualcomm-specific, and more of a practical concern
> >> >> > rather than a principled one.  We (clock driver owners) get
> >> >> > voltage/frequency operating points from the clock controller designers in a
> >> >> > single data-set, it is not information that is provided to or coming from
> >> >> > the clock consumer hardware or software teams.  Tracking, updating, and
> >> >> > debugging frequency/voltage mismatches falls completely into the clock team,
> >> >> > so moving the responsibility of handling the relationship in code to the
> >> >> > consumer would be a disaster (SDM845 has almost 200 clock domains and over
> >> >> > 800 clock branches - not all controlled by Linux but a large proportion
> >> >> > are).  It would mean all clock consumers using any clk_set_rate or
> >> >> > clk_enable APIs would need to make the appropriate voltage requests as well
> >> >> > - and even if they could look up the required voltage based on frequency
> >> >> > from some central location (which I expect would be delivered and maintained
> >> >> > by the clock driver team) the chance that everyone would do that correctly
> >> >> > is, frankly, zero.  The types of problems that arise from under-volting a
> >> >> > clock generator or consumer are very hard to debug (since it can essentially
> >> >> > result in random behavior).
> >> >>
> >> >> Okay, I see.
> >> >>
> >> >> Wouldn't a nice script that does a search/replace work out? :-)
> >> >>
> >> >> >
> >> >> > My second concern is that we do have voltage requirements coming from the
> >> >> > clock controller itself, not related to the clock consumer.  The clock
> >> >> > hardware components themselves (PLLs, muxes, etc) require voltage levels
> >> >> > that may differ from that of the final consumer of the output clock.  We
> >> >> > could hack all these requirements into the frequency/voltage tuple table
> >> >> > that consumers look up, but at that point we have diverged fairly
> >> >> > dramatically from Stephen's principle that the clock framework should not
> >> >> > directly require any voltage - fundamentally they do (at least ours do).
> >> >>
> >> >> This changes my view, as I didn't know about these kind of cases.
> >> >>
> >> >> First, it seems like you need to associate a struct device with the
> >> >> clock controller, such that it can be attached to its corresponding PM
> >> >> domain (genpd). Of course, then you also need to deploy runtime PM
> >> >> support for the clock driver for this clock controller device. Do note
> >> >> that runtime PM is already supported by the clock core, so this should
> >> >> be trivial. Why? Because this is needed to properly allow genpd to
> >> >> aggregate the votes for the PM domain(s), in case there are other
> >> >> devices in the same PM domain (or if there are dependencies on
> >> >> subdomains).
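The aggregation behaviour described above - genpd satisfying the highest performance-state vote among the devices in a domain - can be sketched as a small single-file model. The struct and function names here are invented for illustration; this is not the genpd API.

```c
#include <assert.h>

#define MAX_DEVS 8

/* Toy model of a PM domain collecting performance-state votes. */
struct pm_domain {
	int votes[MAX_DEVS];	/* 0 = no requirement (device idle) */
	int ndevs;
};

/* genpd-style aggregation: the domain runs at the highest state any
 * attached device currently asks for. */
static int domain_aggregate(const struct pm_domain *d)
{
	int i, state = 0;

	for (i = 0; i < d->ndevs; i++)
		if (d->votes[i] > state)
			state = d->votes[i];
	return state;
}

/* A device (e.g. the clock controller) updates its vote; voting 0 when
 * it goes idle releases its requirement. */
static void dev_set_performance_state(struct pm_domain *d, int dev, int state)
{
	assert(dev >= 0 && dev < d->ndevs);
	d->votes[dev] = state;
}
```

So if the clock controller votes for corner 2 while another device in the same domain votes for corner 4, the domain stays at 4 until the other device drops its vote, and then relaxes to 2.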
> >> >
> >> > Your struct device can be considered as Done. Stephen and I have been
> >> > forcing clock driver authors to write proper platform drivers for a
> >> > while now.
> >> >
> >> >>
> >> >> Also, if I am not mistaken, the runtime PM enablement is something
> >> >> that is already being used (or at least tried out) for some Exynos
> >> >> clock drivers. I recall there were some discussions around locking
> >> >> issues around the runtime PM support in the clock core. Let me see if
> >> >> I can search the mail archive to find out if/what went wrong, and I
> >> >> will come back to this.
> >> >
> >> > This was mostly related to idle power management issues, but yes there
> >> > is some basic runtime pm awareness in the clock framework.
> >> >
> >> >>
> >> >> Anyway, in regards to control the performance state for these clock
> >> >> controller devices, to me it seems like there are no other way, but
> >> >> explicitly allow clock drivers to call an "OPP API" to request a
> >> >> performance state. Simply, because it's the clock driver that needs
> >> >> the performance state for its device. Whether the "OPP API" is the
> >> >> new, dev_pm_genpd_set_performance_state() or something not even
> >> >> invented yet, is another question.
> >> >
> >> > I completely agree, with the exception that I don't think it will be an
> >> > "OPP API" but instead I hope it will be some runtime pm performance
> >> > api.
> >> 
> >> If we allow the clock framework to use runtime pm to request
> >> performance levels for its own voltage requirements, what is the real
> >> difference in having it cover all voltage requirements based on the
> >> chosen clock frequency/state (because on/off affect the voltage
> >> requirement as well as the rate)?
> > 
> > I had to re-read this a dozen times. At first I thought you were
> > challenging the idea that a clock provider should act like a voltage
> > consumer (in the case where the clock generator has performance
> > requirements of its own).
> > 
> > But now I think that you're asking: why can't the clock provider driver
> > know everything about how the soc is integrated and set the voltage
> > based on the whole clock tree, thus sparing the consumer drivers having
> > to think about their voltage at all. Do I have that right?
> 
> In a sense that is what I'm asking, but not as a request for any 
> additional framework support - just as how a vendor could set up their 
> performance states for the clock controller itself.

I think that this is the crux of the issue. The approach that you are
advocating for is to stuff the DVFS complexity for your platforms into
the clk drivers. Even if your clock drivers use genpd to configure a
voltage rail, your approach still makes the clk framework the DVFS
interface for consumer drivers (io controllers, network drivers,
multimedia, etc).

My approach is that DVFS is a generic concept that everyone has been
using for over a decade. Let's not solve the problem for one platform by
bloating that platform's clock driver. Let's actually Do The Right Thing
upstream by addressing this issue in a scalable way for all platforms
that care about DVFS.

There was a time when every platform defined their own struct clk and
implemented all of their own clock tree logic from scratch (or more
often from copy-paste). Let's not repeat that design anti-pattern for
DVFS.

Stephen and I have been discussing an initial proposal. Eventually this
will make it to the list for comments.

Regards,
Mike

> As we have discussed, the clock controller needs to itself be a consumer 
> of performance states from runtime pm (the requirement is certain; the 
> use of runtime pm is at least the current proposal for how to support 
> it).  This is absolutely the case for PLLs, internal clock generators, 
> etc.  The performance state required will vary based on what frequency 
> these internal clock components are running at.  So structurally we need 
> to track and aggregate a performance state for the clock controller 
> based on the frequencies of various components it manages.  I believe 
> that this is understood and accepted?  Once that framework is in place, 
> then it is completely transparent to the framework design what or whose 
> requirements are covered by the performance state associated with each 
> frequency.  I can include the consumer's voltage requirement without any 
> change apart from updating the already existing data structure in the 
> clock controller.
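The arrangement Graham describes - a frequency/corner table per clock, aggregated into one performance state for the whole controller, with consumer requirements foldable into the same tables - might look roughly like this. These structs and functions are invented for illustration and are not the actual qcom driver code.

```c
#include <assert.h>

/* One row of a clock's characterisation data: running at up to .freq Hz
 * requires at least voltage corner .corner. Rows sorted by frequency. */
struct freq_corner {
	unsigned long freq;
	int corner;
};

struct clk_node {
	const struct freq_corner *tbl;
	int tbl_len;
	unsigned long rate;
	int enabled;
};

/* Corner needed by one clock; a disabled clock releases its vote. */
static int clk_required_corner(const struct clk_node *c)
{
	int i;

	if (!c->enabled)
		return 0;
	for (i = 0; i < c->tbl_len; i++)
		if (c->rate <= c->tbl[i].freq)
			return c->tbl[i].corner;
	return -1;	/* rate above the characterised range */
}

/* The controller's overall vote is the max across all of its clocks.
 * Whether a table row reflects PLL/RCG needs, consumer needs, or both
 * is invisible at this level - which is Graham's point. */
static int controller_corner(const struct clk_node *clks, int n)
{
	int i, corner = 0;

	for (i = 0; i < n; i++) {
		int c = clk_required_corner(&clks[i]);
		if (c > corner)
			corner = c;
	}
	return corner;
}
```

A set_rate or enable/disable on any clock would then be followed by recomputing `controller_corner()` and forwarding the result as the controller's single performance-state vote.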
> 
> 
> > There are a bunch of problems with this. I really do see how it
> > simplifies your life since you invested a bunch of time and code into
> > an out-of-tree implementation that does this already, but let's take
> > the long view on this design topic:
> > 
> > 1) how does your approach solve the problem for consumer drivers that
> > are not integrated within the SoC? What if a consumer driver consumes a
> > clkout pin derived from your in-SoC clock generator, but is powered by
> > an external PMIC? Where does this integration data live? Who owns it?
> 
> I do not think this is related actually.  A consumer driver with 
> independent voltage requirements can manage them via runtime pm 
> performance states just like how the clock controller does.  genpd will 
> aggregate requirements from different consumers perfectly fine.
> 
> 
> > 2) how does your approach solve for multiple clock provider drivers 
> > that
> > might need to manage shared voltage rails? How do they coordinate their
> > needs and requests?
> 
> Again, genpd aggregates under the hood.  I am confused why this question 
> is relevant though: whether or not one of the consumers of a voltage 
> rail is a clock controller, aggregation is required between multiple 
> users.  If USB and UART both share a rail and request it via runtime pm 
> state changes, genpd still has to aggregate.
> 
> 
> 
> > 3) clock data is already big. If we start lumping in freq/volt tuples
> > and use-case configuration data into those drivers, someone up the
> > chain is going to start complaining eventually. We've seen it before.
> 
> I don't have a choice, our HW has voltage requirements based on PLL and 
> RCG (clock generator) frequencies.  The clock controller is a voltage 
> consumer, and needs to aggregate its own requirements across each clock 
> node.
> 
> 
> >> From an implementation and data structure point of view
> >> there is no difference at all - we will need to track a voltage
> >> requirement per clock operating point for the clock controller needs.
> > 
> > The code (and data) has to live somewhere, sure.
> > 
> >> Including the consumer requirements as well adds nothing and removes
> >> the need for consumers to make any changes themselves to use runtime pm.
> > 
> > Except for the examples I mentioned above, especially #1 up above.
> > 
> >> I get the principle of having the consumer deal with their own
> >> specific needs, but the consumers in the SOCs I've seen do not know
> >> what their voltage requirements are - it's data managed by the clock
> >> provider.  It
> > 
> > I guess I wasn't clear on the call at Connect: consumers should never
> > concern themselves with their voltage requirements. The performance API
> > will primarily use hertz as the perf value passed in, especially for
> > the SoCs you care about. Consumer drivers don't need to think about
> > voltage.
> 
> Now I'm confused.  When you say the consumer drivers don't need to think 
> about voltage, do you just mean the software driver code itself?  
> Because the runtime pm data and any governors would need to be owned by 
> the same developer (they are the ones who understand their performance 
> state requirements), so in that sense the consumer very much needs to 
> think about the voltage when they put that piece in place.  This is one 
> of the key challenges we (at least Qualcomm) face in moving the voltage 
> requirement completely outside of the clock controller.
> 
> 
> > And it's only "data managed by the clock provider" because that's what
> > you've hacked together out-of-tree.
> > 
> >> seems once the door is open to have the clock driver use runtime pm,
> >> why not allow SOCs with that kind of data management policy to build
> >> in the consumer requirements that way as well since it is zero extra
> >> work?
> > 
> > Because it's not zero extra work. None of the clock drivers that use
> > runtime pm use it for anything like active or performance management.
> > It's used for idle power management.
> 
> Yes, however in order to satisfy the clock controller's own voltage 
> requirements, we will need to build in support for active performance 
> state management in the clock framework.  Once that is there, I argue it 
> is zero extra work to use it for all clock/voltage voting.
> 
> Graham
> 
> 
> 
> > Regardless of which design route that we take, we'll have to glue
> > voltages to clock frequencies and make that code work. There is not a
> > case where we get something for free.
> > 
> > Best regards,
> > Mike
> > 
> >> 
> >> Graham
> >> 
> >> 
> >> 
> >> 
> >> >>
> >> >> My conclusion so far is, that we seems to fall back to a potential
> >> >> locking problem. In regards to that, I am wondering whether that is
> >> >> actually more of hypothetical problem than a real problem for your
> >> >> case.
> >> >
> >> > For reference, this is why we allow reentrancy into the clock
> >> > framework.
> >> > It is common that consumer A calls clk_set_rate to set clock X to a
> >> > rate, but in order for clock X to achieve that rate the clock provider
> >> > might need to call clk_set_rate on another clock. We support reentrancy
> >> > for this type of case.
> >> >
> >> > The problem described by Graham seems analogous. There are times when a
> >> > performance provider itself will need to adjust its own performance
> >> > (as consumed by some other parent provider). I'm under the impression
> >> > that runtime pm allows reentrancy and genpd allows for nested genpds,
> >> > so hopefully this should Just Work.
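The reentrancy Mike refers to - clk_set_rate on one clock legitimately re-entering the framework to set another clock's rate from the same context - can be modelled with an owner-tracked lock. This is a single-threaded sketch with invented names, not the real clk core locking.

```c
#include <assert.h>

/* Owner-tracked lock, standing in for the clk core's global lock. */
static int lock_owner = -1;	/* -1: unlocked; else the holder's context id */
static int lock_depth;

static int clk_lock(int ctx)
{
	if (lock_owner == ctx) {	/* reentrant acquire, same context */
		lock_depth++;
		return 0;
	}
	if (lock_owner != -1)
		return -1;		/* held elsewhere: would block */
	lock_owner = ctx;
	lock_depth = 1;
	return 0;
}

static void clk_unlock(void)
{
	if (--lock_depth == 0)
		lock_owner = -1;
}

struct clk {
	unsigned long rate;
	struct clk *parent;
};

/* Setting a child's rate may require re-entering set_rate on its parent;
 * the owner check lets the nested call proceed instead of deadlocking. */
static int clk_set_rate(int ctx, struct clk *c, unsigned long rate)
{
	if (clk_lock(ctx))
		return -1;
	if (c->parent && c->parent->rate < rate)
		clk_set_rate(ctx, c->parent, rate);	/* reentrant call */
	c->rate = rate;
	clk_unlock();
	return 0;
}
```

A performance provider adjusting its own state while servicing a request would follow the same pattern: the nested request is recognised as coming from the same context and allowed through, and the lock is fully released only when the outermost call unwinds.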
> >> >
> >> >>
> >> >> > Our most recent patch that Taniya posted has gone in the direction similar
> >> >> > to Tegra - instead of having the framework handle it, we use prepare and
> >> >> > set_rate hooks to implement voltage (corner) voting in the qcom drivers via
> >> >> > the new genpd.  This is not fully proven yet but is so far working well and
> >> >> > will likely be our internal solution going forward if the framework
> >> >> > consensus is to force consumers to manage their frequency-based voltage
> >> >> > requirements themselves - I just do not see that as a practical solution for
> >> >> > a complicated SoC with a large, highly distributed, clock tree.  That being
> >> >> > said I do see potential future deadlock race conditions between clk and
> >> >> > genpd which concern me - it remains a downside of that coupling.
> >> >> >
> >> >> > Would there be some way to prevent consumers from directly calling
> >> >> > clk_set_rate or clk_enable and force them to go via another framework for
> >> >> > these calls?  It would at least prevent people from using the "wrong"
> >> >> > interface and bypassing voltage requirements.  That of course means having
> >> >> > to mirror any of the clk APIs that update clock state into genpd/opp, which
> >> >> > Stephen finds distasteful (and I agree).
> >> >>
> >> >> I am not sure about this. Sounds like an awful wrapper API.
> >> >
> >> > Yeah, overloading the prepare callbacks is just a symptom of the
> >> > greater problem: we don't have a real DVFS api.
> >> >
> >> > Regards,
> >> > Mike
> >> >
> >> >>
> >> >> However, what Mike seems to express the need for, is a common consumer
> >> >> OPP API to set a performance state for a device, but also a way to
> >> >> allow SoC specific performance state providers to manage the backend
> >> >> parts of actually changing the performance state.
> >> >>
> >> >> I am wondering whether this could be a way forward, maybe not exactly
> >> >> what you were looking for, but perhaps it can address your concerns?
> >> >>
> >> >> [...]
> >> >>
> >> >> Kind regards
> >> >> Uffe


end of thread, other threads:[~2018-10-04 21:25 UTC | newest]

Thread overview: 34+ messages
-- links below jump to the message on this page --
2018-06-18 20:44 [RFD] Voltage dependencies for clocks (DVFS) grahamr
2018-07-02  5:13 ` Michael Turquette
2018-07-04  6:55 ` Viresh Kumar
2018-07-04 12:50   ` Ulf Hansson
2018-07-04 12:54     ` Rafael J. Wysocki
2018-07-04 12:58       ` Ulf Hansson
2018-07-20 17:12     ` Stephen Boyd
2018-07-20 17:56       ` Michael Turquette
2018-07-24 23:13         ` Stephen Boyd
2018-07-25  5:51           ` Michael Turquette
2018-07-23  8:26       ` Peter De Schrijver
2018-07-24 23:04         ` Stephen Boyd
2018-07-25  5:44           ` Michael Turquette
2018-07-25 11:27             ` Peter De Schrijver
2018-07-25 18:40               ` Michael Turquette
2018-07-31 11:56               ` Ulf Hansson
2018-07-31 20:02                 ` grahamr
2018-08-23 13:20                   ` Ulf Hansson
2018-09-18 23:00                     ` Michael Turquette
2018-09-19  7:05                       ` Geert Uytterhoeven
2018-09-19 18:07                         ` Michael Turquette
2018-09-25 13:11                           ` Geert Uytterhoeven
2018-09-25 21:26                       ` grahamr
2018-10-01 19:00                         ` Michael Turquette
2018-10-04  0:37                           ` Graham Roff
2018-10-04 21:23                             ` Michael Turquette
2018-09-18 17:25                   ` Kevin Hilman
2018-08-03 23:05                 ` Michael Turquette
2018-08-23 12:13                   ` Ulf Hansson
2018-09-18 22:48                     ` Michael Turquette
2018-07-31 10:35       ` Ulf Hansson
2018-08-03 21:11         ` Michael Turquette
2018-08-23 11:10           ` Ulf Hansson
2018-07-05  8:19 ` Peter De Schrijver
