Juri Lelli writes:

> Hi,
>
> On 11/04/18 09:26, Francisco Jerez wrote:
>> Francisco Jerez writes:
>>
>> > Hi Srinivas,
>> >
>> > Srinivas Pandruvada writes:
>> >
>> >> On Tue, 2018-04-10 at 15:28 -0700, Francisco Jerez wrote:
>> >>> Francisco Jerez writes:
>> >>>
>> >> [...]
>> >>
>> >>
>> >>> For the case anyone is wondering what's going on, Srinivas pointed me
>> >>> at
>> >>> a larger idle power usage increase off-list, ultimately caused by the
>> >>> low-latency heuristic as discussed in the paragraph above.  I have a
>> >>> v2
>> >>> of PATCH 6 that gives the controller a third response curve roughly
>> >>> intermediate between the low-latency and low-power states of this
>> >>> revision, which avoids the energy usage increase while C0 residency
>> >>> is
>> >>> low (e.g. during idle) expected for v1.  The low-latency behavior of
>> >>> this revision is still going to be available based on a heuristic (in
>> >>> particular when a realtime-priority task is scheduled).  We're
>> >>> carrying
>> >>> out some additional testing, I'll post the code here eventually.
>> >>
>> >> Please try sched-util governor also. There is a frequency-invariant
>> >> patch, which I can send you (This eventually will be pushed by Peter).
>> >> We want to avoid complexity to intel-pstate for non HWP power sensitive
>> >> platforms as far as possible.
>> >>
>> >
>> > Unfortunately the schedutil governor (whether frequency invariant or
>> > not) has the exact same energy efficiency issues as the present
>> > intel_pstate non-HWP governor. Its response is severely underdamped
>> > leading to energy-inefficient behavior for any oscillating non-CPU-bound
>> > workload.
>> > To exacerbate that problem the frequency is maxed out on
>> > frequent IO waiting just like the current intel_pstate cpu-load
>>
>> "just like" here is possibly somewhat unfair to the schedutil governor,
>> admittedly its progressive IOWAIT boosting behavior seems somewhat less
>> wasteful than the intel_pstate non-HWP governor's IOWAIT boosting
>> behavior, but it's still largely unhelpful on IO-bound conditions.
>
> Sorry if I jump in out of the blue, but what you are trying to solve
> looks very similar to what IPA [1] is targeting as well. I might be
> wrong (I'll try to spend more time reviewing your set), but my first
> impression is that we should try to solve similar problems with a more
> general approach that could benefit different sys/archs.
>

Thanks, this looks interesting. I've also been taking a look at your
whitepaper and source code.  The problems we've both been trying to solve
are indeed closely related, and there may be an opportunity for sharing
effort both ways.

Correct me if I've misunderstood the details of your power allocation
code, but IPA seems to divide up the available power budget in proportion
to the power requested by the different actors and their configured
weights (up to the point where some actor reaches its maximum power).
From my understanding of the get_requested_power implementations for
cpufreq and devfreq, the requested power attempts to approximate the
current power usage of each device (whether it's estimated from the
current frequency and a capacitance model, from the get_real_power
callback, or by some other mechanism).  That can be far from the optimal
power consumption in cases where the device's governor is programming a
frequency that wildly deviates from the optimal one (as is the case with
the current intel_pstate governor for any IO-bound workload, which
incidentally will suffer the greatest penalty from a suboptimal power
allocation in cases where the IO device is actually an integrated GPU).
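
To make sure we're talking about the same thing, here is a rough sketch
of the proportional split as I understand it.  This is a simplified model
and not the kernel code: the function name, the argument names and the
way excess power is handed back in proportion to remaining headroom are
all my own assumptions for illustration.

```python
def divvy_up_power(requested, max_power, weights, budget):
    """Split `budget` among actors in proportion to weighted requested power.

    Hypothetical, simplified model of a proportional power allocator.
    All arguments are lists with one entry per actor (e.g. CPU, GPU),
    except `budget`, which is the total power available.
    """
    weighted = [r * w for r, w in zip(requested, weights)]
    total = sum(weighted) or 1.0  # avoid dividing by zero if nothing is requested
    granted = [budget * w / total for w in weighted]

    # Clamp each actor to its maximum power and collect the excess.
    extra = sum(max(g - m, 0.0) for g, m in zip(granted, max_power))
    granted = [min(g, m) for g, m in zip(granted, max_power)]

    # Hand the excess to the actors that still have headroom,
    # in proportion to that headroom (assumption, see above).
    headroom = [m - g for g, m in zip(granted, max_power)]
    total_headroom = sum(headroom)
    if extra > 0 and total_headroom > 0:
        share = min(extra, total_headroom)
        granted = [g + share * h / total_headroom
                   for g, h in zip(granted, headroom)]
    return granted


# A CPU currently drawing 12 W and a GPU drawing 6 W under a 15 W budget:
# the 2:1 ratio of the requests carries over to the grants.
print(divvy_up_power([12.0, 6.0], [20.0, 20.0], [1.0, 1.0], 15.0))
# -> [10.0, 5.0]
```

Note that in this model the grants depend only on what each actor is
already drawing, not on how much power each actor would need to do its
work efficiently.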

Is there any mechanism in place to prevent the system from stabilizing at
a power allocation that prevents it from achieving maximum throughput?
E.g. consider a TDP-limited system with two devices consuming a total
power of Pmax = P0(f0) + P1(f1), with f0 much greater than the optimal,
and f1 capped at a frequency lower than the optimal due to TDP or thermal
constraints, and assume that the system is bottlenecking on the second
device.  In such a scenario, wouldn't IPA distribute power in a way that
roughly approximates the pre-existing suboptimal distribution?

If that's the case, I understand that it's the responsibility of the
device's (or CPU's) frequency governor to request a frequency which is
reasonably energy-efficient in the first place for the balancer to
function correctly?  That's precisely the goal of this series.  In
addition, it allows the system to use less power to get the same work
done in cases where the system is not thermally or TDP-limited as a
whole, where the balancing logic wouldn't have any effect at all.

> I'm Cc-ing some Arm folks...
>
> Best,
>
> - Juri
>
> [1] https://developer.arm.com/open-source/intelligent-power-allocation