Peter Zijlstra writes:

> On Mon, Apr 27, 2020 at 08:22:47PM -0700, Francisco Jerez wrote:
>> This addresses the technical concerns people brought up about my
>> previous v2 revision of this series.  Other than a few bug fixes, the
>> only major change relative to v2 is that the controller is now exposed
>> as a new CPUFREQ generic governor as requested by Rafael (named
>> "adaptive" in this RFC, though other naming suggestions are welcome).
>> The main reason for calling this v2.99 rather than v3 is that I haven't
>> yet addressed all the documentation requests from the v2 thread -- I'll
>> spend some time doing that as soon as I have an ACK (ideally from
>> Rafael) that things are moving in the right direction.
>>
>> You can also find this series along with the WIP code for non-HWP
>> platforms in this branch:
>>
>>   https://github.com/curro/linux/tree/intel_pstate-vlp-v2.99
>>
>> Thanks!
>>
>> [PATCHv2.99 01/11] PM: QoS: Add CPU_SCALING_RESPONSE global PM QoS limit.
>> [PATCHv2.99 02/11] drm/i915: Adjust PM QoS scaling response frequency based on GPU load.
>> [PATCHv2.99 03/11] OPTIONAL: drm/i915: Expose PM QoS control parameters via debugfs.
>> [PATCHv2.99 04/11] cpufreq: Define ADAPTIVE frequency governor policy.
>> [PATCHv2.99 05/11] cpufreq: intel_pstate: Reorder intel_pstate_clear_update_util_hook() and intel_pstate_set_update_util_hook().
>> [PATCHv2.99 06/11] cpufreq: intel_pstate: Call intel_pstate_set_update_util_hook() once from the setpolicy hook.
>> [PATCHv2.99 07/11] cpufreq: intel_pstate: Implement VLP controller statistics and target range calculation.
>> [PATCHv2.99 08/11] cpufreq: intel_pstate: Implement VLP controller for HWP parts.
>> [PATCHv2.99 09/11] cpufreq: intel_pstate: Enable VLP controller based on ACPI FADT profile and CPUID.
>> [PATCHv2.99 10/11] OPTIONAL: cpufreq: intel_pstate: Add tracing of VLP controller status.
>> [PATCHv2.99 11/11] OPTIONAL: cpufreq: intel_pstate: Expose VLP controller parameters via debugfs.
>
> What I'm missing is an explanation for why this isn't using the
> infrastructure that was built for these kinds of things.  The thermal
> framework was, AFAIU, supposed to help with these things, and the IPA
> thing in particular is used by ARM to do exactly this GPU/CPU power
> budget thing.
>
> If thermal/IPA is found wanting, why aren't we improving that?

The GPU/CPU power budget "thing" is only a positive side effect of this
series on some TDP-bound systems.  Its ultimate purpose is improving the
energy efficiency of workloads which have a bottleneck on a device other
than the CPU, by giving the bottlenecking device driver some influence
over the response latency of CPUFREQ governors via a PM QoS interface (a
rough sketch of the intended driver usage appears at the end of this
mail).  This seems to be completely outside the scope of the thermal
framework and IPA AFAIU.

> How much of that ADAPTIVE crud is actually intel_pstate specific?  On a
> (really) quick read it appears to me that much of the controller bits
> there can be applied more generically, and thus should not be part of
> any one governor.

The implementation is intel_pstate-specific right now, but the basic
algorithm could in principle be made to work on top of any other
governor, which is why it is exposed as a generic CPUFREQ governor.  I
don't mind taking out the generic CPUFREQ governor changes if you don't
like them and going back to some driver-specific means of turning the
controller on and off (though Rafael might disagree with that).

> Specifically, I want to use sched_util as cpufreq governor and use
> intel_pstate as a passive driver.
Yeah, getting a similar optimization into the schedutil governor has
been on my wish list for a while, but I haven't had the time to get
beyond a handful of hacks on that front.  The intel_pstate changes are
going to be necessary anyway in order to handle HWP systems gracefully,
at least in the near future until schedutil becomes a viable alternative
to intel_pstate in active mode on HWP systems.
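
To make the PM QoS interface mentioned above a little more concrete,
here is a rough sketch of what the intended usage could look like from a
device driver's point of view.  The helper names below
(cpu_scaling_response_qos_*) and the 300 Hz value are made up by analogy
with the existing cpu_latency_qos_* helpers, purely for illustration --
see patch 01/11 for the actual interface.

/*
 * Hypothetical example only: the helper names are assumed by analogy
 * with the cpu_latency_qos_* API; see patch 01/11 for the actual
 * interface.
 */
#include <linux/pm_qos.h>

static struct pm_qos_request scaling_response_req;

static void foo_device_init(void)
{
	/* No constraint initially: leave the governor response unrestricted. */
	cpu_scaling_response_qos_add_request(&scaling_response_req,
					     PM_QOS_DEFAULT_VALUE);
}

static void foo_device_set_bottleneck(bool bottlenecked)
{
	/*
	 * While this device is the bottleneck, cap the CPUFREQ governor
	 * response frequency (300 Hz here, purely illustrative) so the
	 * CPU ramps up less aggressively and saves energy; lift the
	 * constraint as soon as the bottleneck goes away.
	 */
	cpu_scaling_response_qos_update_request(&scaling_response_req,
						bottlenecked ? 300 :
						PM_QOS_DEFAULT_VALUE);
}

static void foo_device_fini(void)
{
	cpu_scaling_response_qos_remove_request(&scaling_response_req);
}

In the i915 case (patch 02/11) the constraint is adjusted based on the
measured GPU load rather than a simple boolean, but the request
lifecycle is the same.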