linux-pm.vger.kernel.org archive mirror
* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
       [not found] <248bb01e-1746-c84c-78c4-3cf7d2541a70@codeaurora.org>
@ 2020-09-15 17:24 ` Matthias Kaehlcke
  2020-09-15 17:50   ` Daniel Lezcano
  2020-09-15 19:53 ` Doug Anderson
  1 sibling, 1 reply; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-15 17:24 UTC (permalink / raw)
  To: Rajendra Nayak, Lukasz Luba
  Cc: Rob Herring, DTML, Doug Anderson, linux-pm, Amit Daniel Kachhap,
	Daniel Lezcano, Viresh Kumar, Javi Merino

+Thermal folks

Hi Rajendra,

On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> Hi Rob,
> 
> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> calculating power values in mW, is there a need to document the property as something that *has* to be
> based on real power measurements?

Relative values may work for scheduling decisions, but not for thermal
management with the power allocator, at least not when CPU cooling devices
are combined with others that specify their power consumption in absolute
values. Such a configuration should be supported IMO.

Thanks

Matthias

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 17:24 ` is 'dynamic-power-coefficient' expected to be based on 'real' power measurements? Matthias Kaehlcke
@ 2020-09-15 17:50   ` Daniel Lezcano
  2020-09-15 17:58     ` Matthias Kaehlcke
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Lezcano @ 2020-09-15 17:50 UTC (permalink / raw)
  To: Matthias Kaehlcke, Rajendra Nayak, Lukasz Luba
  Cc: Rob Herring, DTML, Doug Anderson, linux-pm, Amit Daniel Kachhap,
	Viresh Kumar, Javi Merino

On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> +Thermal folks
> 
> Hi Rajendra,
> 
> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>> Hi Rob,
>>
>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>> calculating power values in mW, is there a need to document the property as something that *has* to be
>> based on real power measurements?
> 
> Relative values may work for scheduling decisions, but not for thermal
> management with the power allocator, at least not when CPU cooling devices
> are combined with others that specify their power consumption in absolute
> values. Such a configuration should be supported IMO.

The energy model is used in the cpufreq cooling device and if the
sustainable power is consistent with the relative values then there is
no reason it shouldn't work.



-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 17:50   ` Daniel Lezcano
@ 2020-09-15 17:58     ` Matthias Kaehlcke
  2020-09-15 20:55       ` Daniel Lezcano
  2020-09-16  9:18       ` Lukasz Luba
  0 siblings, 2 replies; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-15 17:58 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rajendra Nayak, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> > +Thermal folks
> > 
> > Hi Rajendra,
> > 
> > On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> >> Hi Rob,
> >>
> >> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> >> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> >> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> >> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> >> calculating power values in mW, is there a need to document the property as something that *has* to be
> >> based on real power measurements?
> > 
> > Relative values may work for scheduling decisions, but not for thermal
> > management with the power allocator, at least not when CPU cooling devices
> > are combined with others that specify their power consumption in absolute
> > values. Such a configuration should be supported IMO.
> 
> The energy model is used in the cpufreq cooling device and if the
> sustainable power is consistent with the relative values then there is
> no reason it shouldn't work.

Agreed on thermal zones that exclusively use CPUs as cooling devices, but
what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
GPU that specifies its power in mW?


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
       [not found] <248bb01e-1746-c84c-78c4-3cf7d2541a70@codeaurora.org>
  2020-09-15 17:24 ` is 'dynamic-power-coefficient' expected to be based on 'real' power measurements? Matthias Kaehlcke
@ 2020-09-15 19:53 ` Doug Anderson
  1 sibling, 0 replies; 17+ messages in thread
From: Doug Anderson @ 2020-09-15 19:53 UTC (permalink / raw)
  To: Rajendra Nayak
  Cc: Matthias Kaehlcke, Lukasz Luba, Rob Herring, DTML, Linux PM,
	Amit Daniel Kachhap, Daniel Lezcano, Viresh Kumar, Javi Merino

Hi,

On Mon, Sep 14, 2020 at 10:44 PM Rajendra Nayak <rnayak@codeaurora.org> wrote:
>
> Hi Rob,
>
> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> calculating power values in mW, is there a need to document the property as something that *has* to be
> based on real power measurements?
>
> Looking at the bindings,
>
>    dynamic-power-coefficient:
>      $ref: '/schemas/types.yaml#/definitions/uint32'
>      description:
>        A u32 value that represents the running time dynamic
>        power coefficient in units of uW/MHz/V^2. The
>        coefficient can either be calculated from power
>        measurements or derived by analysis.
>
>        The dynamic power consumption of the CPU  is
>        proportional to the square of the Voltage (V) and
>        the clock frequency (f). The coefficient is used to
>        calculate the dynamic power as below -
>
>        Pdyn = dynamic-power-coefficient * V^2 * f
>
>        where voltage is in V, frequency is in MHz.
>
> .. the 'can either be calculated from power measurements or derived by analysis'
> tells me we don't mandate that this be based on real power measurements.
> If we do, then perhaps that needs to be mentioned explicitly?

To me, the phrase "derived by analysis" doesn't mean that the number
is allowed to be in completely made up units.  It means it's still
supposed to be in the same units but it's OK if you didn't integrate a
Coulomb counter into your system.
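
To make the unit handling concrete, here is a minimal sketch (not kernel
code) of the formula from the binding text. The function name and the sample
coefficient, voltage and frequency are made up for illustration; the only
thing taken from the binding is Pdyn = coefficient * V^2 * f with the
coefficient in uW/MHz/V^2.

```python
def dynamic_power_mw(coefficient_uw_per_mhz_v2, voltage_v, freq_mhz):
    """Pdyn = dynamic-power-coefficient * V^2 * f, per the binding text.

    Coefficient in uW/MHz/V^2, voltage in V, frequency in MHz, so the
    product is in uW; it is converted to mW before returning.
    """
    pdyn_uw = coefficient_uw_per_mhz_v2 * voltage_v ** 2 * freq_mhz
    return pdyn_uw / 1000.0


# Hypothetical values, chosen only to show the unit handling:
# 100 uW/MHz/V^2 at 1.0 V and 1000 MHz.
print(dynamic_power_mw(100, 1.0, 1000))
```

If the coefficient is on a made-up scale, the result is on that same made-up
scale, whatever the unit label says.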

It's kinda like saying that the police can give you a speeding ticket
by either measuring your speed with a radar gun or by checking a clock
when your car passed two known places and calculating your speed based
on that.  The radar gun is a direct measurement whereas the other is
derived by analysis.  In both cases you're still talking about a speed
in terms of Miles per Hour (or kilometers per hour in more sane
countries).

-Doug


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 17:58     ` Matthias Kaehlcke
@ 2020-09-15 20:55       ` Daniel Lezcano
  2020-09-15 21:13         ` Matthias Kaehlcke
                           ` (2 more replies)
  2020-09-16  9:18       ` Lukasz Luba
  1 sibling, 3 replies; 17+ messages in thread
From: Daniel Lezcano @ 2020-09-15 20:55 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Rajendra Nayak, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>> +Thermal folks
>>>
>>> Hi Rajendra,
>>>
>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>> Hi Rob,
>>>>
>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>> based on real power measurements?
>>>
>>> Relative values may work for scheduling decisions, but not for thermal
>>> management with the power allocator, at least not when CPU cooling devices
>>> are combined with others that specify their power consumption in absolute
>>> values. Such a configuration should be supported IMO.
>>
>> The energy model is used in the cpufreq cooling device and if the
>> sustainable power is consistent with the relative values then there is
>> no reason it shouldn't work.
> 
> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> GPU that specifies its power in mW?

Well, if a SoC vendor decides to mix the units, then there is nothing we
can do.

When specifying the power numbers available for the SoC, they could be
all scaled against the highest power number.
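
One possible reading of "scaled against the highest power number", sketched
below, is to divide every value by the largest one and multiply by a fixed
scale. This is an interpretation, not something spelled out in this thread;
the actor names, the figures and the scale of 1000 are hypothetical.

```python
def scale_to_highest(powers, scale=1000):
    """Rescale all power numbers so the highest one equals `scale`.

    Ratios between actors are preserved, so every actor ends up on the
    same abstract scale.
    """
    highest = max(powers.values())
    return {name: p * scale / highest for name, p in powers.items()}


# Hypothetical per-actor maximum power numbers, all measured in mW.
powers_mw = {"cpu": 4000, "gpu": 2990, "modem": 800}
scaled = scale_to_highest(powers_mw)
print(scaled)
```

This only works if whoever writes the numbers knows the real values for all
actors in the zone to begin with.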

There are so many factors on the hardware, the firmware, the kernel and
the userspace sides having an impact on the energy efficiency, I don't
understand why SoC vendors are so shy to share the power numbers...




* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 20:55       ` Daniel Lezcano
@ 2020-09-15 21:13         ` Matthias Kaehlcke
  2020-09-15 21:23           ` Daniel Lezcano
  2020-09-15 21:46         ` Doug Anderson
  2020-09-16  9:53         ` Lukasz Luba
  2 siblings, 1 reply; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-15 21:13 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rajendra Nayak, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Tue, Sep 15, 2020 at 10:55:52PM +0200, Daniel Lezcano wrote:
> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> >> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> >>> +Thermal folks
> >>>
> >>> Hi Rajendra,
> >>>
> >>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> >>>> Hi Rob,
> >>>>
> >>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> >>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> >>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> >>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> >>>> calculating power values in mW, is there a need to document the property as something that *has* to be
> >>>> based on real power measurements?
> >>>
> >>> Relative values may work for scheduling decisions, but not for thermal
> >>> management with the power allocator, at least not when CPU cooling devices
> >>> are combined with others that specify their power consumption in absolute
> >>> values. Such a configuration should be supported IMO.
> >>
> >> The energy model is used in the cpufreq cooling device and if the
> >> sustainable power is consistent with the relative values then there is
> >> no reason it shouldn't work.
> > 
> > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > GPU that specifies its power in mW?
> 
> Well, if a SoC vendor decides to mix the units, then there is nothing we
> can do.
> 
> When specifying the power numbers available for the SoC, they could be
> all scaled against the highest power number.

The GPU was just one example, a device could have heat dissipating components
that are not from the SoC vendor (e.g. WiFi, modem, backlight), and depending
on the design it might not make sense to have separate thermal zones.

> There are so many factors on the hardware, the firmware, the kernel and
> the userspace sides having an impact on the energy efficiency, I don't
> understand why SoC vendors are so shy to share the power numbers...

Nor do I. Someone could just perform measurements to determine DPCs
with the proper scale if Qualcomm refuses to provide them ...


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 21:13         ` Matthias Kaehlcke
@ 2020-09-15 21:23           ` Daniel Lezcano
  2020-09-15 21:36             ` Matthias Kaehlcke
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Lezcano @ 2020-09-15 21:23 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: Rajendra Nayak, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On 15/09/2020 23:13, Matthias Kaehlcke wrote:
> On Tue, Sep 15, 2020 at 10:55:52PM +0200, Daniel Lezcano wrote:
>> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
>>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>>>> +Thermal folks
>>>>>
>>>>> Hi Rajendra,
>>>>>
>>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>>>> Hi Rob,
>>>>>>
>>>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>>>> based on real power measurements?
>>>>>
>>>>> Relative values may work for scheduling decisions, but not for thermal
>>>>> management with the power allocator, at least not when CPU cooling devices
>>>>> are combined with others that specify their power consumption in absolute
>>>>> values. Such a configuration should be supported IMO.
>>>>
>>>> The energy model is used in the cpufreq cooling device and if the
>>>> sustainable power is consistent with the relative values then there is
>>>> no reason it shouldn't work.
>>>
>>> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
>>> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
>>> GPU that specifies its power in mW?
>>
>> Well, if a SoC vendor decides to mix the units, then there is nothing we
>> can do.
>>
>> When specifying the power numbers available for the SoC, they could be
>> all scaled against the highest power number.
> 
> The GPU was just one example, a device could have heat dissipating components
> that are not from the SoC vendor (e.g. WiFi, modem, backlight), and depending
> on the design it might not make sense to have separate thermal zones.

Could you elaborate? I'm not sure I get the point.


>> There are so many factors on the hardware, the firmware, the kernel and
>> the userspace sides having an impact on the energy efficiency, I don't
>> understand why SoC vendors are so shy to share the power numbers...
> 
> Nor do I. Someone could just perform measurements to determine DPCs
> with the proper scale if Qualcomm refuses to provide them ...
> 



* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 21:23           ` Daniel Lezcano
@ 2020-09-15 21:36             ` Matthias Kaehlcke
  2020-09-16  4:15               ` Rajendra Nayak
  0 siblings, 1 reply; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-15 21:36 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rajendra Nayak, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Tue, Sep 15, 2020 at 11:23:49PM +0200, Daniel Lezcano wrote:
> On 15/09/2020 23:13, Matthias Kaehlcke wrote:
> > On Tue, Sep 15, 2020 at 10:55:52PM +0200, Daniel Lezcano wrote:
> >> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> >>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> >>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> >>>>> +Thermal folks
> >>>>>
> >>>>> Hi Rajendra,
> >>>>>
> >>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> >>>>>> Hi Rob,
> >>>>>>
> >>>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> >>>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> >>>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> >>>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> >>>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
> >>>>>> based on real power measurements?
> >>>>>
> >>>>> Relative values may work for scheduling decisions, but not for thermal
> >>>>> management with the power allocator, at least not when CPU cooling devices
> >>>>> are combined with others that specify their power consumption in absolute
> >>>>> values. Such a configuration should be supported IMO.
> >>>>
> >>>> The energy model is used in the cpufreq cooling device and if the
> >>>> sustainable power is consistent with the relative values then there is
> >>>> no reason it shouldn't work.
> >>>
> >>> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> >>> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> >>> GPU that specifies its power in mW?
> >>
> >> Well, if a SoC vendor decides to mix the units, then there is nothing we
> >> can do.
> >>
> >> When specifying the power numbers available for the SoC, they could be
> >> all scaled against the highest power number.
> > 
> > The GPU was just one example, a device could have heat dissipating components
> > that are not from the SoC vendor (e.g. WiFi, modem, backlight), and depending
> > on the design it might not make sense to have separate thermal zones.
> 
> Could you elaborate? I'm not sure I get the point.

A device could have a thermal zone with the following cooling
devices:

- CPUs with power consumption specified as pmW (pseudo mW)
- A modem from a third party vendor. The modem can dissipate
  significant heat and allows throttling the bandwidth for
  cooling. The power consumption of the modem is given in
  mW.

These could be crammed together in a small form factor
(e.g. ChromeCast or Chromebit) which makes it difficult to
discern with a sensor what exactly is generating the heat,
which is why you have a single thermal zone.

IPA is used as governor for this zone. It can't make accurate
decisions because one cooling device specifies its power
consumption in pmW and the other in mW.


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 20:55       ` Daniel Lezcano
  2020-09-15 21:13         ` Matthias Kaehlcke
@ 2020-09-15 21:46         ` Doug Anderson
  2020-09-15 21:51           ` Matthias Kaehlcke
  2020-09-16  9:53         ` Lukasz Luba
  2 siblings, 1 reply; 17+ messages in thread
From: Doug Anderson @ 2020-09-15 21:46 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Matthias Kaehlcke, Rajendra Nayak, Lukasz Luba, Rob Herring,
	DTML, Linux PM, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

Hi,

On Tue, Sep 15, 2020 at 1:55 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> >> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> >>> +Thermal folks
> >>>
> >>> Hi Rajendra,
> >>>
> >>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> >>>> Hi Rob,
> >>>>
> >>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> >>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> >>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> >>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> >>>> calculating power values in mW, is there a need to document the property as something that *has* to be
> >>>> based on real power measurements?
> >>>
> >>> Relative values may work for scheduling decisions, but not for thermal
> >>> management with the power allocator, at least not when CPU cooling devices
> >>> are combined with others that specify their power consumption in absolute
> >>> values. Such a configuration should be supported IMO.
> >>
> >> The energy model is used in the cpufreq cooling device and if the
> >> sustainable power is consistent with the relative values then there is
> >> no reason it shouldn't work.
> >
> > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > GPU that specifies its power in mW?
>
> Well, if a SoC vendor decides to mix the units, then there is nothing we
> can do.

I mean, there is something someone could do.  They could buy one of
these devices, measure the power (which wouldn't actually be that hard
to do), then submit a patch to adjust all the numbers.  ;-)

-Doug


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 21:46         ` Doug Anderson
@ 2020-09-15 21:51           ` Matthias Kaehlcke
  0 siblings, 0 replies; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-15 21:51 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Daniel Lezcano, Rajendra Nayak, Lukasz Luba, Rob Herring, DTML,
	Linux PM, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Tue, Sep 15, 2020 at 02:46:16PM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Sep 15, 2020 at 1:55 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> >
> > On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> > >> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> > >>> +Thermal folks
> > >>>
> > >>> Hi Rajendra,
> > >>>
> > >>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> > >>>> Hi Rob,
> > >>>>
> > >>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> > >>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> > >>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> > >>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> > >>>> calculating power values in mW, is there a need to document the property as something that *has* to be
> > >>>> based on real power measurements?
> > >>>
> > >>> Relative values may work for scheduling decisions, but not for thermal
> > >>> management with the power allocator, at least not when CPU cooling devices
> > >>> are combined with others that specify their power consumption in absolute
> > >>> values. Such a configuration should be supported IMO.
> > >>
> > >> The energy model is used in the cpufreq cooling device and if the
> > >> sustainable power is consistent with the relative values then there is
> > >> no reason it shouldn't work.
> > >
> > > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > > GPU that specifies its power in mW?
> >
> > Well, if a SoC vendor decides to mix the units, then there is nothing we
> > can do.
> 
> I mean, there is something someone could do.  They could buy one of
> these devices, measure the power (which wouldn't actually be that hard
> to do), then submit a patch to adjust all the numbers.  ;-)

In case they are looking for a recipe:

commit ac60c5e33df4ec2b69c7e3ebbc0ccf1557e7bd5e
Author: Matthias Kaehlcke <mka@chromium.org>
Date:   Thu Apr 11 17:01:58 2019 -0700

    ARM: dts: rockchip: Add dynamic-power-coefficient for rk3288

    The value was determined with the following method:

    - take CPUs 1-3 offline
    - for each OPP
      - set cpufreq min and max freq to OPP freq
      - start dhrystone benchmark
      - measure CPU power consumption during 10s
      - calculate Cx for OPPx
      - Cx = (Px - P1) / (Vx²fx - V1²f1)          [1]
        using the following units: mW / Ghz / V   [2]
    - C = avg(C2, ..., Cn)

   [1] see commit 4daa001a1773 ("arm64: dts: juno: Add cpu
       dynamic-power-coefficient information")
   [2] https://patchwork.kernel.org/patch/10493615/#22158551
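
The arithmetic in the recipe above can be sketched as follows. This is only
an illustration of the Cx formula and the final average, not the script used
for the rk3288 commit. The function name is made up, and the "measurements"
are synthetic: they are generated from an assumed coefficient of 300 and an
assumed constant 50 mW offset, so that the result can be checked. Power is in
mW, frequency in GHz and voltage in V.

```python
def coefficient_from_measurements(opps):
    """Estimate C from per-OPP measurements.

    opps: list of (power, voltage, freq) tuples, lowest OPP first.
    Computes Cx = (Px - P1) / (Vx^2 * fx - V1^2 * f1) for x = 2..n and
    returns C = avg(C2, ..., Cn). Taking differences against OPP1 drops
    any part of the power that is the same at every OPP (assumption:
    such a constant part exists and really is constant).
    """
    p1, v1, f1 = opps[0]
    base = v1 ** 2 * f1
    cx = [(p - p1) / (v ** 2 * f - base) for p, v, f in opps[1:]]
    return sum(cx) / len(cx)


# Synthetic data, NOT real measurements: P = 50 + 300 * V^2 * f.
operating_points = [(0.9, 0.6), (1.0, 1.0), (1.2, 1.5)]  # (V, GHz)
opps = [(50 + 300 * v ** 2 * f, v, f) for v, f in operating_points]

# Recovers roughly the assumed coefficient of 300, by construction.
print(coefficient_from_measurements(opps))
```

With real measurements the individual Cx values will differ from each other,
which is why the recipe averages them.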


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 21:36             ` Matthias Kaehlcke
@ 2020-09-16  4:15               ` Rajendra Nayak
  2020-09-16 16:40                 ` Matthias Kaehlcke
  0 siblings, 1 reply; 17+ messages in thread
From: Rajendra Nayak @ 2020-09-16  4:15 UTC (permalink / raw)
  To: Matthias Kaehlcke, Daniel Lezcano
  Cc: Lukasz Luba, Rob Herring, DTML, Doug Anderson, linux-pm,
	Amit Daniel Kachhap, Viresh Kumar, Javi Merino


On 9/16/2020 3:06 AM, Matthias Kaehlcke wrote:
> On Tue, Sep 15, 2020 at 11:23:49PM +0200, Daniel Lezcano wrote:
>> On 15/09/2020 23:13, Matthias Kaehlcke wrote:
>>> On Tue, Sep 15, 2020 at 10:55:52PM +0200, Daniel Lezcano wrote:
>>>> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
>>>>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>>>>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>>>>>> +Thermal folks
>>>>>>>
>>>>>>> Hi Rajendra,
>>>>>>>
>>>>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>>>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>>>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>>>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>>>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>>>>>> based on real power measurements?
>>>>>>>
>>>>>>> Relative values may work for scheduling decisions, but not for thermal
>>>>>>> management with the power allocator, at least not when CPU cooling devices
>>>>>>> are combined with others that specify their power consumption in absolute
>>>>>>> values. Such a configuration should be supported IMO.
>>>>>>
>>>>>> The energy model is used in the cpufreq cooling device and if the
>>>>>> sustainable power is consistent with the relative values then there is
>>>>>> no reason it shouldn't work.
>>>>>
>>>>> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
>>>>> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
>>>>> GPU that specifies its power in mW?
>>>>
>>>> Well, if a SoC vendor decides to mix the units, then there is nothing we
>>>> can do.
>>>>
>>>> When specifying the power numbers available for the SoC, they could be
>>>> all scaled against the highest power number.
>>>
>>> The GPU was just one example, a device could have heat dissipating components
>>> that are not from the SoC vendor (e.g. WiFi, modem, backlight), and depending
>>> on the design it might not make sense to have separate thermal zones.
>>
>> Could you elaborate? I'm not sure I get the point.
> 
> A device could have a thermal zone with the following cooling
> devices:
> 
> - CPUs with power consumption specified as pmW (pseudo mW)
> - A modem from a third party vendor. The modem can dissipate
>    significant heat and allows throttling the bandwidth for
>    cooling. The power consumption of the modem is given in
>    mW.
> 
> These could be crammed together in a small form factor
> (e.g. ChromeCast or Chromebit) which makes it difficult to
> discern with a sensor what exactly is generating the heat,
> which is why you have a single thermal zone.
> 
> IPA is used as governor for this zone. It can't make accurate
> decisions because one cooling device specifies its power
> consumption in pmW and the other in mW.

Is there a real example upstream for this, or is it a theoretical
problem (which can exist in the future) we are trying to solve?

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 17:58     ` Matthias Kaehlcke
  2020-09-15 20:55       ` Daniel Lezcano
@ 2020-09-16  9:18       ` Lukasz Luba
  1 sibling, 0 replies; 17+ messages in thread
From: Lukasz Luba @ 2020-09-16  9:18 UTC (permalink / raw)
  To: Matthias Kaehlcke, Daniel Lezcano
  Cc: Rajendra Nayak, Rob Herring, DTML, Doug Anderson, linux-pm,
	Amit Daniel Kachhap, Viresh Kumar, Javi Merino


On 9/15/20 6:58 PM, Matthias Kaehlcke wrote:
> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>> +Thermal folks
>>>
>>> Hi Rajendra,
>>>
>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>> Hi Rob,
>>>>
>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>> based on real power measurements?
>>>
>>> Relative values may work for scheduling decisions, but not for thermal
>>> management with the power allocator, at least not when CPU cooling devices
>>> are combined with others that specify their power consumption in absolute
>>> values. Such a configuration should be supported IMO.
>>
>> The energy model is used in the cpufreq cooling device and if the
>> sustainable power is consistent with the relative values then there is
>> no reason it shouldn't work.
> 
> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> GPU that specifies its power in mW?
> 

These two (pmW and mW) shouldn't be combined in one thermal
zone with IPA as the governor. IPA will try to recover, but it
will be less stable: the OPP capping might jump between lowest
and highest, which would be spotted in testing.

For example, we have CPU with abstract scale where max is 10 and
GPU with real mW max = 2990. They both have the same 'weight' = 1.
Let's say IPA is seeing 3000 as a total budget and splits it:
10/3000 * 3000 = 10
2990/3000 * 3000 = 2990
Which means both actors can run at max speed. Unfortunately, the real
consumption of the CPU could be ~4000mW. The temperature in the next
period will rise above the threshold for which the budget was estimated.
Next time both devices are likely to be capped at minimum freq because
there is no budget and the temperature is too high.

After a while the PID would realize this and try to recover, but this
shouldn't be the main solution.
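
The arithmetic above can be replayed with a small sketch. This is only
an illustration of a proportional split with equal weights, not the
actual power allocator code; the function and variable names are made
up for this example, and the ~4000 mW real CPU draw is the guess from
the paragraph above.

```python
def split_budget(requests, budget):
    """Split a power budget in proportion to each actor's request.

    Simplified model with equal weights; the real governor does more
    (PID control, redistribution of spare power), which is ignored here.
    """
    total = sum(requests.values())
    return {name: req * budget / total for name, req in requests.items()}


# Requested (max) power as reported by each cooling device.
# The CPU uses an abstract scale, the GPU reports real mW.
requests = {"cpu": 10, "gpu": 2990}
budget = 3000

granted = split_budget(requests, budget)

# Both actors get their full request, so neither is throttled.
unthrottled = all(granted[name] >= requests[name] for name in requests)

# Assumed real consumption in mW when both run at max speed.
real_draw_mw = {"cpu": 4000, "gpu": 2990}
overshoot_mw = sum(real_draw_mw.values()) - budget

print(granted, unthrottled, overshoot_mw)
```

With these numbers the zone would dissipate well over twice the
budget while the governor believes it is exactly on target.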

Regards,
Lukasz




* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-15 20:55       ` Daniel Lezcano
  2020-09-15 21:13         ` Matthias Kaehlcke
  2020-09-15 21:46         ` Doug Anderson
@ 2020-09-16  9:53         ` Lukasz Luba
  2020-09-16 16:48           ` Matthias Kaehlcke
  2 siblings, 1 reply; 17+ messages in thread
From: Lukasz Luba @ 2020-09-16  9:53 UTC (permalink / raw)
  To: Daniel Lezcano, Matthias Kaehlcke
  Cc: Rajendra Nayak, Rob Herring, DTML, Doug Anderson, linux-pm,
	Amit Daniel Kachhap, Viresh Kumar, Javi Merino



On 9/15/20 9:55 PM, Daniel Lezcano wrote:
> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>>> +Thermal folks
>>>>
>>>> Hi Rajendra,
>>>>
>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>>> Hi Rob,
>>>>>
>>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>>> based on real power measurements?
>>>>
>>>> Relative values may work for scheduling decisions, but not for thermal
>>>> management with the power allocator, at least not when CPU cooling devices
>>>> are combined with others that specify their power consumption in absolute
>>>> values. Such a configuration should be supported IMO.
>>>
>>> The energy model is used in the cpufreq cooling device and if the
>>> sustainable power is consistent with the relative values then there is
>>> no reason it shouldn't work.
>>
>> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
>> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
>> GPU that specifies its power in mW?
> 
> Well, if a SoC vendor decides to mix the units, then there is nothing we
> can do.
> 
> When specifying the power numbers available for the SoC, they could be
> all scaled against the highest power number.
> 
> There are so many factors on the hardware, the firmware, the kernel and
> the userspace sides having an impact on the energy efficiency, I don't
> understand why SoC vendors are so shy to share the power numbers...
> 

Unfortunately (because it might confuse engineers in some cases like
this one), even the SCMI spec DEN0056B [1] has this statement, which
allows firmware to expose 'abstract scale' values:
'4.5.1 Performance domain management protocol background
...The power can be expressed in mW or in an abstract scale. Vendors
are not obliged to reveal power costs if it is undesirable, but a linear
scale is required.'

This is the source of our Energy Model values when we use SCMI cpufreq
driver [2].

So this might be an issue in the future, when some SoC vendor decides
not to expose the real mW, and a phone OEM then takes the SoC and
tries to add some other cooling device into the thermal zone. That new
device is not part of the SCMI perf domains but is something custom
that reports real mW.

Daniel, do you think it should be documented somewhere in the kernel
thermal documentation that the firmware might silently populate the EM
with an 'abstract scale'? Then special care should be taken when
combining new cooling devices.

Regards,
Lukasz

[1] https://developer.arm.com/documentation/den0056/b/?lang=en
[2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-16  4:15               ` Rajendra Nayak
@ 2020-09-16 16:40                 ` Matthias Kaehlcke
  0 siblings, 0 replies; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-16 16:40 UTC (permalink / raw)
  To: Rajendra Nayak
  Cc: Daniel Lezcano, Lukasz Luba, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Wed, Sep 16, 2020 at 09:45:04AM +0530, Rajendra Nayak wrote:
> 
> On 9/16/2020 3:06 AM, Matthias Kaehlcke wrote:
> > On Tue, Sep 15, 2020 at 11:23:49PM +0200, Daniel Lezcano wrote:
> > > On 15/09/2020 23:13, Matthias Kaehlcke wrote:
> > > > On Tue, Sep 15, 2020 at 10:55:52PM +0200, Daniel Lezcano wrote:
> > > > > On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > > > > > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> > > > > > > On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> > > > > > > > +Thermal folks
> > > > > > > > 
> > > > > > > > Hi Rajendra,
> > > > > > > > 
> > > > > > > > On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> > > > > > > > > Hi Rob,
> > > > > > > > > 
> > > > > > > > > There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> > > > > > > > > for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> > > > > > > > > at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> > > > > > > > > I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> > > > > > > > > calculating power values in mW, is there a need to document the property as something that *has* to be
> > > > > > > > > based on real power measurements?
> > > > > > > > 
> > > > > > > > Relative values may work for scheduling decisions, but not for thermal
> > > > > > > > management with the power allocator, at least not when CPU cooling devices
> > > > > > > > are combined with others that specify their power consumption in absolute
> > > > > > > > values. Such a configuration should be supported IMO.
> > > > > > > 
> > > > > > > The energy model is used in the cpufreq cooling device and if the
> > > > > > > sustainable power is consistent with the relative values then there is
> > > > > > > no reason it shouldn't work.
> > > > > > 
> > > > > > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > > > > > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > > > > > GPU that specifies its power in mW?
> > > > > 
> > > > > Well, if a SoC vendor decides to mix the units, then there is nothing we
> > > > > can do.
> > > > > 
> > > > > When specifying the power numbers available for the SoC, they could be
> > > > > all scaled against the highest power number.
> > > > 
> > > > The GPU was just one example, a device could have heat dissipating components
> > > > that are not from the SoC vendor (e.g. WiFi, modem, backlight), and depending
> > > > on the design it might not make sense to have separate thermal zones.
> > > 
> > > Is it possible to elaborate, I'm not sure to get the point ?
> > 
> > A device could have a thermal zone with the following cooling
> > devices:
> > 
> > - CPUs with power consumption specified as pmW (pseudo mW)
> > - A modem from a third party vendor. The modem can dissipate
> >    significant heat and allows to throttle the bandwidth for
> >    cooling. The power consumption of the modem is given in
> >    mW.
> > 
> > These could be crammed together in a small form factor
> > (e.g. ChromeCast or Chromebit) which makes it difficult to
> > discern with a sensor what exactly is generating the heat,
> > which is why you have a single thermal zone.
> > 
> > IPA is used as governor for this zone, it can't make accurate
> > decisions because one cooling device specifies its power
> > consumption in pmW and the other in mW.
> 
> Is there a real example upstream for this, or is it a theoretical
> problem (which can exist in the future) we are trying to solve?

Sort of, there is the rk3288-based veyron-mickey, which uses CPUs,
the GPU and ddrfreq as cooling devices in the same zone:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19/arch/arm/boot/dts/rk3288-veyron-mickey.dts#42

The device doesn't use IPA though, so mixed up units wouldn't matter in this
case.

From a quick grep in arch/arm(64)/boot/dts/ at least it seems that mixing
cooling devices of different types is not a common case, though it doesn't
necessarily reflect what is done in custom DTs.


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-16  9:53         ` Lukasz Luba
@ 2020-09-16 16:48           ` Matthias Kaehlcke
  2020-09-24  6:09             ` Rajendra Nayak
  0 siblings, 1 reply; 17+ messages in thread
From: Matthias Kaehlcke @ 2020-09-16 16:48 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Daniel Lezcano, Rajendra Nayak, Rob Herring, DTML, Doug Anderson,
	linux-pm, Amit Daniel Kachhap, Viresh Kumar, Javi Merino

On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:
> 
> 
> On 9/15/20 9:55 PM, Daniel Lezcano wrote:
> > On 15/09/2020 19:58, Matthias Kaehlcke wrote:
> > > On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
> > > > On 15/09/2020 19:24, Matthias Kaehlcke wrote:
> > > > > +Thermal folks
> > > > > 
> > > > > Hi Rajendra,
> > > > > 
> > > > > On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
> > > > > > Hi Rob,
> > > > > > 
> > > > > > There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
> > > > > > for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
> > > > > > at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
> > > > > > I believe relative values work perfectly fine for scheduling decisions, but with others using this for
> > > > > > calculating power values in mW, is there a need to document the property as something that *has* to be
> > > > > > based on real power measurements?
> > > > > 
> > > > > Relative values may work for scheduling decisions, but not for thermal
> > > > > management with the power allocator, at least not when CPU cooling devices
> > > > > are combined with others that specify their power consumption in absolute
> > > > > values. Such a configuration should be supported IMO.
> > > > 
> > > > The energy model is used in the cpufreq cooling device and if the
> > > > sustainable power is consistent with the relative values then there is
> > > > no reason it shouldn't work.
> > > 
> > > Agreed on thermal zones that exclusively use CPUs as cooling devices, but
> > > what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
> > > GPU that specifies its power in mW?
> > 
> > Well, if a SoC vendor decides to mix the units, then there is nothing we
> > can do.
> > 
> > When specifying the power numbers available for the SoC, they could be
> > all scaled against the highest power number.
> > 
> > There are so many factors on the hardware, the firmware, the kernel and
> > the userspace sides having an impact on the energy efficiency, I don't
> > understand why SoC vendors are so shy to share the power numbers...
> > 
> 
> Unfortunately (because it might confuse engineers in some cases like
> this one), even in the SCMI spec DEN0056B [1] we have this statement
> which allows to expose an 'abstract scale' values from firmware:
> '4.5.1 Performance domain management protocol background
> ...The power can be expressed in mW or in an abstract scale. Vendors
> are not obliged to reveal power costs if it is undesirable, but a linear
> scale is required.'
> 
> This is the source of our Energy Model values when we use SCMI cpufreq
> driver [2].
> 
> So this might be an issue in the future, when some SoC vendor decides to
> not expose the real mW, but the phone OEM would then take the SoC and
> try to add some other cooling device into the thermal zone. That new
> device is not part of the SCMI perf but some custom and has the real mW.
> 
> Do you think Daniel it should be somewhere documented in the kernel
> thermal that the firmware might silently populate EM with 'abstract
> scale'? Then special care should be taken when combining new
> cooling devices.
> 
> Regards,
> Lukasz
> 
> [1] https://developer.arm.com/documentation/den0056/b/?lang=en
> [2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121

If an 'abstract scale' is explicitly allowed I think it should be documented
to avoid confusion and make engineers aware of the peril of combining cooling
devices of different types in the same thermal zone.


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-16 16:48           ` Matthias Kaehlcke
@ 2020-09-24  6:09             ` Rajendra Nayak
  2020-09-24  8:21               ` Lukasz Luba
  0 siblings, 1 reply; 17+ messages in thread
From: Rajendra Nayak @ 2020-09-24  6:09 UTC (permalink / raw)
  To: Matthias Kaehlcke, Lukasz Luba
  Cc: Daniel Lezcano, Rob Herring, DTML, Doug Anderson, linux-pm,
	Amit Daniel Kachhap, Viresh Kumar, Javi Merino


On 9/16/2020 10:18 PM, Matthias Kaehlcke wrote:
> On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:
>>
>>
>> On 9/15/20 9:55 PM, Daniel Lezcano wrote:
>>> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
>>>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>>>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>>>>> +Thermal folks
>>>>>>
>>>>>> Hi Rajendra,
>>>>>>
>>>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>>>>> Hi Rob,
>>>>>>>
>>>>>>> There has been some discussions on another thread [1] around the DPC (dynamic-power-coefficient) values
>>>>>>> for CPU's being relative vs absolute (based on real power) and should they be used to derive 'real' power
>>>>>>> at various OPPs in order to calculate things like 'sustainable-power' for thermal zones.
>>>>>>> I believe relative values work perfectly fine for scheduling decisions, but with others using this for
>>>>>>> calculating power values in mW, is there a need to document the property as something that *has* to be
>>>>>>> based on real power measurements?
>>>>>>
>>>>>> Relative values may work for scheduling decisions, but not for thermal
>>>>>> management with the power allocator, at least not when CPU cooling devices
>>>>>> are combined with others that specify their power consumption in absolute
>>>>>> values. Such a configuration should be supported IMO.
>>>>>
>>>>> The energy model is used in the cpufreq cooling device and if the
>>>>> sustainable power is consistent with the relative values then there is
>>>>> no reason it shouldn't work.
>>>>
>>>> Agreed on thermal zones that exclusively use CPUs as cooling devices, but
>>>> what when you have mixed zones, with CPUs with their pseudo-unit and e.g. a
>>>> GPU that specifies its power in mW?
>>>
>>> Well, if a SoC vendor decides to mix the units, then there is nothing we
>>> can do.
>>>
>>> When specifying the power numbers available for the SoC, they could be
>>> all scaled against the highest power number.
>>>
>>> There are so many factors on the hardware, the firmware, the kernel and
>>> the userspace sides having an impact on the energy efficiency, I don't
>>> understand why SoC vendors are so shy to share the power numbers...
>>>
>>
>> Unfortunately (because it might confuse engineers in some cases like
>> this one), even in the SCMI spec DEN0056B [1] we have this statement
>> which allows to expose an 'abstract scale' values from firmware:
>> '4.5.1 Performance domain management protocol background
>> ...The power can be expressed in mW or in an abstract scale. Vendors
>> are not obliged to reveal power costs if it is undesirable, but a linear
>> scale is required.'
>>
>> This is the source of our Energy Model values when we use SCMI cpufreq
>> driver [2].
>>
>> So this might be an issue in the future, when some SoC vendor decides to
>> not expose the real mW, but the phone OEM would then take the SoC and
>> try to add some other cooling device into the thermal zone. That new
>> device is not part of the SCMI perf but some custom and has the real mW.
>>
>> Do you think Daniel it should be somewhere documented in the kernel
>> thermal that the firmware might silently populate EM with 'abstract
>> scale'? Then special care should be taken when combining new
>> cooling devices.
>>
>> Regards,
>> Lukasz
>>
>> [1] https://developer.arm.com/documentation/den0056/b/?lang=en
>> [2] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121
> 
> If an 'abstract scale' is explicitly allowed I think it should be documented
> to avoid confusion and make engineers aware of the peril of combining cooling
> devices of different types in the same thermal zone.

Rob, we should perhaps also document this as part of the DT bindings document
to be consistent, that an abstract scale is allowed when specifying the DPC
values in DT.
If you agree, I can spin a quick patch to update the documentation.

thanks,
Rajendra


-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: is 'dynamic-power-coefficient' expected to be based on 'real' power measurements?
  2020-09-24  6:09             ` Rajendra Nayak
@ 2020-09-24  8:21               ` Lukasz Luba
  0 siblings, 0 replies; 17+ messages in thread
From: Lukasz Luba @ 2020-09-24  8:21 UTC (permalink / raw)
  To: Rajendra Nayak, Matthias Kaehlcke
  Cc: Daniel Lezcano, Rob Herring, DTML, Doug Anderson, linux-pm,
	Amit Daniel Kachhap, Viresh Kumar, Javi Merino



On 9/24/20 7:09 AM, Rajendra Nayak wrote:
> 
> On 9/16/2020 10:18 PM, Matthias Kaehlcke wrote:
>> On Wed, Sep 16, 2020 at 10:53:48AM +0100, Lukasz Luba wrote:
>>>
>>>
>>> On 9/15/20 9:55 PM, Daniel Lezcano wrote:
>>>> On 15/09/2020 19:58, Matthias Kaehlcke wrote:
>>>>> On Tue, Sep 15, 2020 at 07:50:10PM +0200, Daniel Lezcano wrote:
>>>>>> On 15/09/2020 19:24, Matthias Kaehlcke wrote:
>>>>>>> +Thermal folks
>>>>>>>
>>>>>>> Hi Rajendra,
>>>>>>>
>>>>>>> On Tue, Sep 15, 2020 at 11:14:00AM +0530, Rajendra Nayak wrote:
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> There has been some discussions on another thread [1] around the 
>>>>>>>> DPC (dynamic-power-coefficient) values
>>>>>>>> for CPU's being relative vs absolute (based on real power) and 
>>>>>>>> should they be used to derive 'real' power
>>>>>>>> at various OPPs in order to calculate things like 
>>>>>>>> 'sustainable-power' for thermal zones.
>>>>>>>> I believe relative values work perfectly fine for scheduling 
>>>>>>>> decisions, but with others using this for
>>>>>>>> calculating power values in mW, is there a need to document the 
>>>>>>>> property as something that *has* to be
>>>>>>>> based on real power measurements?
>>>>>>>
>>>>>>> Relative values may work for scheduling decisions, but not for 
>>>>>>> thermal
>>>>>>> management with the power allocator, at least not when CPU 
>>>>>>> cooling devices
>>>>>>> are combined with others that specify their power consumption in 
>>>>>>> absolute
>>>>>>> values. Such a configuration should be supported IMO.
>>>>>>
>>>>>> The energy model is used in the cpufreq cooling device and if the
>>>>>> sustainable power is consistent with the relative values then 
>>>>>> there is
>>>>>> no reason it shouldn't work.
>>>>>
>>>>> Agreed on thermal zones that exclusively use CPUs as cooling 
>>>>> devices, but
>>>>> what when you have mixed zones, with CPUs with their pseudo-unit 
>>>>> and e.g. a
>>>>> GPU that specifies its power in mW?
>>>>
>>>> Well, if a SoC vendor decides to mix the units, then there is 
>>>> nothing we
>>>> can do.
>>>>
>>>> When specifying the power numbers available for the SoC, they could be
>>>> all scaled against the highest power number.
>>>>
>>>> There are so many factors on the hardware, the firmware, the kernel and
>>>> the userspace sides having an impact on the energy efficiency, I don't
>>>> understand why SoC vendors are so shy to share the power numbers...
>>>>
>>>
>>> Unfortunately (because it might confuse engineers in some cases like
>>> this one), even in the SCMI spec DEN0056B [1] we have this statement
>>> which allows to expose an 'abstract scale' values from firmware:
>>> '4.5.1 Performance domain management protocol background
>>> ...The power can be expressed in mW or in an abstract scale. Vendors
>>> are not obliged to reveal power costs if it is undesirable, but a linear
>>> scale is required.'
>>>
>>> This is the source of our Energy Model values when we use SCMI cpufreq
>>> driver [2].
>>>
>>> So this might be an issue in the future, when some SoC vendor decides to
>>> not expose the real mW, but the phone OEM would then take the SoC and
>>> try to add some other cooling device into the thermal zone. That new
>>> device is not part of the SCMI perf but some custom and has the real mW.
>>>
>>> Do you think Daniel it should be somewhere documented in the kernel
>>> thermal that the firmware might silently populate EM with 'abstract
>>> scale'? Then special care should be taken when combining new
>>> cooling devices.
>>>
>>> Regards,
>>> Lukasz
>>>
>>> [1] https://developer.arm.com/documentation/den0056/b/?lang=en
>>> [2] 
>>> https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L121 
>>>
>>
>> If an 'abstract scale' is explicitly allowed I think it should be 
>> documented
>> to avoid confusion and make engineers aware of the peril of combining 
>> cooling
>> devices of different types in the same thermal zone.
> 
> Rob, we should perhaps also document this as part of the DT bindings 
> document
> to be consistent, that an abstract scale is allowed when specifying the DPC
> values in DT.
> if you agree, I can spin a quick patch to update the documentation.
> 

The 'dynamic-power-coefficient' property, documented in
Documentation/devicetree/bindings/arm/cpus.yaml, does not need any
update because it expects units of 'uW/MHz/V^2' to calculate dynamic
power.
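
To make the unit concrete, here is a minimal sketch assuming the usual
dynamic power model P = C * f * V^2, with C in uW/MHz/V^2, f in MHz and
V in volts. The coefficient and the OPP table are made-up values, not
taken from any real SoC, and the helper name is hypothetical.

```python
def dynamic_power_uw(dpc, freq_mhz, volt_v):
    """Dynamic power in uW for a coefficient given in uW/MHz/V^2."""
    return dpc * freq_mhz * volt_v ** 2


# Hypothetical values, for illustration only.
dpc = 100                  # uW/MHz/V^2
low_opp = (500, 0.5)       # (MHz, V)
high_opp = (1000, 1.0)     # (MHz, V)

low_mw = dynamic_power_uw(dpc, *low_opp) / 1000
high_mw = dynamic_power_uw(dpc, *high_opp) / 1000

# Scaling the coefficient by an arbitrary factor changes the absolute
# numbers but keeps the ratio between OPPs.
scaled_dpc = 4 * dpc
ratio_real = dynamic_power_uw(dpc, *high_opp) / dynamic_power_uw(dpc, *low_opp)
ratio_scaled = (dynamic_power_uw(scaled_dpc, *high_opp)
                / dynamic_power_uw(scaled_dpc, *low_opp))

print(low_mw, high_mw, ratio_real, ratio_scaled)
```

The unchanged ratio is why relative values are enough for scheduling
decisions, while the absolute numbers only mean mW if the coefficient
was derived from real measurements.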

There are two ways to register an Energy Model for a device:
1. em_dev_register_perf_domain(), where you provide a callback function
that can feed in an 'abstract scale' (as scmi-cpufreq.c does).
2. dev_pm_opp_of_register_em(), where the 'dynamic-power-coefficient'
is involved.

If a developer sees that the platform might end up mixing devices with
two different scales in one thermal zone, they should not use the 2nd
registration method but the 1st API, and provide a callback that uses a
scale consistent across all devices. It is also very unlikely that a
device like a GPU or DSP would not be part of the SCMI perf domains
and would not expose a consistent abstract scale.

I have a patch in internal review that updates the EAS, EM and IPA
documentation, so that should be updated soon.

Regards,
Lukasz



end of thread, other threads:[~2020-09-24  8:22 UTC | newest]

Thread overview: 17+ messages
     [not found] <248bb01e-1746-c84c-78c4-3cf7d2541a70@codeaurora.org>
2020-09-15 17:24 ` is 'dynamic-power-coefficient' expected to be based on 'real' power measurements? Matthias Kaehlcke
2020-09-15 17:50   ` Daniel Lezcano
2020-09-15 17:58     ` Matthias Kaehlcke
2020-09-15 20:55       ` Daniel Lezcano
2020-09-15 21:13         ` Matthias Kaehlcke
2020-09-15 21:23           ` Daniel Lezcano
2020-09-15 21:36             ` Matthias Kaehlcke
2020-09-16  4:15               ` Rajendra Nayak
2020-09-16 16:40                 ` Matthias Kaehlcke
2020-09-15 21:46         ` Doug Anderson
2020-09-15 21:51           ` Matthias Kaehlcke
2020-09-16  9:53         ` Lukasz Luba
2020-09-16 16:48           ` Matthias Kaehlcke
2020-09-24  6:09             ` Rajendra Nayak
2020-09-24  8:21               ` Lukasz Luba
2020-09-16  9:18       ` Lukasz Luba
2020-09-15 19:53 ` Doug Anderson
