Peter Zijlstra writes:

> On Mon, Apr 27, 2020 at 08:22:47PM -0700, Francisco Jerez wrote:
>> This addresses the technical concerns people brought up about my
>> previous v2 revision of this series.  Other than a few bug fixes, the
>> only major change relative to v2 is that the controller is now exposed
>> as a new CPUFREQ generic governor as requested by Rafael (named
>> "adaptive" in this RFC, though other naming suggestions are welcome).
>> The main reason for calling this v2.99 rather than v3 is that I haven't
>> yet addressed all the documentation requests from the v2 thread -- I'll
>> spend some time doing that as soon as I have an ACK (ideally from
>> Rafael) that things are moving in the right direction.
>>
>> You can also find this series along with the WIP code for non-HWP
>> platforms in this branch:
>>
>>   https://github.com/curro/linux/tree/intel_pstate-vlp-v2.99
>>
>> Thanks!
>>
>> [PATCHv2.99 01/11] PM: QoS: Add CPU_SCALING_RESPONSE global PM QoS limit.
>> [PATCHv2.99 02/11] drm/i915: Adjust PM QoS scaling response frequency based on GPU load.
>> [PATCHv2.99 03/11] OPTIONAL: drm/i915: Expose PM QoS control parameters via debugfs.
>> [PATCHv2.99 04/11] cpufreq: Define ADAPTIVE frequency governor policy.
>> [PATCHv2.99 05/11] cpufreq: intel_pstate: Reorder intel_pstate_clear_update_util_hook() and intel_pstate_set_update_util_hook().
>> [PATCHv2.99 06/11] cpufreq: intel_pstate: Call intel_pstate_set_update_util_hook() once from the setpolicy hook.
>> [PATCHv2.99 07/11] cpufreq: intel_pstate: Implement VLP controller statistics and target range calculation.
>> [PATCHv2.99 08/11] cpufreq: intel_pstate: Implement VLP controller for HWP parts.
>> [PATCHv2.99 09/11] cpufreq: intel_pstate: Enable VLP controller based on ACPI FADT profile and CPUID.
>> [PATCHv2.99 10/11] OPTIONAL: cpufreq: intel_pstate: Add tracing of VLP controller status.
>> [PATCHv2.99 11/11] OPTIONAL: cpufreq: intel_pstate: Expose VLP controller parameters via debugfs.
>
> What I'm missing is an explanation for why this isn't using the
> infrastructure that was built for these kinds of things.  The thermal
> framework was, AFAIU, supposed to help with these things, and the IPA
> thing in particular is used by ARM to do exactly this GPU/CPU power
> budget thing.
>
> If thermal/IPA is found wanting, why aren't we improving that?

The GPU/CPU power budget "thing" is only a positive side effect of this
series on some TDP-bound systems.  Its ultimate purpose is improving the
energy efficiency of workloads which have a bottleneck on a device other
than the CPU, by giving the bottlenecking device driver some influence
over the response latency of CPUFREQ governors via a PM QoS interface (a
rough sketch of the intended driver usage appears at the end of this
mail).  This seems to be completely outside the scope of the thermal
framework and IPA AFAIU.

> How much of that ADAPTIVE crud is actually intel_pstate specific?  On a
> (really) quick read it appears to me that much of the controller bits
> there can be applied more generically, and thus should not be part of
> any one governor.

The implementation is intel_pstate-specific right now, but the basic
algorithm could in principle be made to work on top of any other
governor, which is why it is exposed as a generic CPUFREQ governor.  I
don't mind taking out the generic CPUFREQ governor changes if you don't
like them and going back to some driver-specific means of turning the
controller on and off (though Rafael might disagree with that).

> Specifically, I want to use sched_util as cpufreq governor and use
> intel_pstate as a passive driver.
Yeah, getting a similar optimization into the schedutil governor has
been on my wish list for a while, but I haven't had the time to get
beyond a handful of hacks on that front.  The intel_pstate changes are
going to be necessary anyway in order to handle HWP systems gracefully,
at least in the near future until schedutil becomes a viable alternative
to intel_pstate in active mode on HWP systems.
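
To make the PM QoS interface mentioned above a little more concrete,
here is a rough sketch of what the intended usage could look like from a
device driver's point of view.  The helper names below
(cpu_scaling_response_qos_*) and the 300 Hz value are made up by analogy
with the existing cpu_latency_qos_* helpers, purely for illustration --
see patch 01/11 for the actual interface.

/*
 * Hypothetical example only: the helper names are assumed by analogy
 * with the cpu_latency_qos_* API; see patch 01/11 for the actual
 * interface.
 */
#include <linux/pm_qos.h>

static struct pm_qos_request scaling_response_req;

static void foo_device_init(void)
{
	/* No constraint initially: leave the governor response unrestricted. */
	cpu_scaling_response_qos_add_request(&scaling_response_req,
					     PM_QOS_DEFAULT_VALUE);
}

static void foo_device_set_bottleneck(bool bottlenecked)
{
	/*
	 * While this device is the bottleneck, cap the CPUFREQ governor
	 * response frequency (300 Hz here, purely illustrative) so the
	 * CPU ramps up less aggressively and saves energy; lift the
	 * constraint as soon as the bottleneck goes away.
	 */
	cpu_scaling_response_qos_update_request(&scaling_response_req,
						bottlenecked ? 300 :
						PM_QOS_DEFAULT_VALUE);
}

static void foo_device_fini(void)
{
	cpu_scaling_response_qos_remove_request(&scaling_response_req);
}

In the i915 case (patch 02/11) the constraint is adjusted based on the
measured GPU load rather than a simple boolean, but the request
lifecycle is the same.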