Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 12, 2018 at 11:34:54AM -0700, Francisco Jerez wrote:
>> The reason for the energy efficiency problem of iowait boosting is
>> precisely the greater oscillation between turbo and idle.  Say that
>> iowait boost increases the frequency by a factor alpha relative to the
>> optimal frequency f0 (in terms of energy efficiency) required to execute
>> some IO-bound workload.  This will cause the CPU to be busy for a
>> fraction of the time it was busy originally, approximately t1 = t0 /
>> alpha, which indeed divides the overall energy usage by a factor alpha,
>> but at the same time multiplies the instantaneous power consumption
>> while busy by a factor potentially much greater than alpha, since the
>> CPU's power curve is largely non-linear, and in fact approximately
>> convex within the frequency range allowed by the policy, so you get an
>> average energy usage possibly much greater than the optimal.
>
> Ah, but we don't (yet) have the (normalized) power curves, so we cannot
> make that call.
>
> Once we have the various energy/OPP numbers required for EAS we can
> compute the optimal. I think such was even mentioned in the thread I
> referred earlier.
>
> Until such time; we boost to max for lack of a better option.

Actually, assuming that a single geometric feature of the power curve is
known, namely that it is convex in the frequency range allowed by the
policy (which is almost always the case, not only for Intel CPUs), the
optimal frequency for an IO-bound workload is fully independent of the
exact power curve: it's just the minimum CPU frequency that's able to
keep the bottlenecking IO device at 100% utilization.  Any frequency
higher than that will lead to strictly lower energy efficiency, whatever
the exact form of the power curve is.
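
To illustrate the convexity argument, here is a minimal sketch rather than
anything taken from a real governor.  It assumes a simple model not spelled
out above: running at f >= f0 keeps the CPU busy for a fraction f0 / f of
the time and idle at power p_idle otherwise, and the power curve stays
strictly convex when extended through the idle point (0, p_idle).  The
function names, frequencies and example curves are all hypothetical.

```python
import math


def average_power(f, f0, power, p_idle):
    """Average power when a workload needing frequency f0 runs at f >= f0.

    Assumed model: the CPU is busy for a fraction f0 / f of the time and
    sits idle at p_idle for the rest.
    """
    if f < f0:
        raise ValueError("f below f0 cannot keep the IO device saturated")
    busy = f0 / f
    return busy * power(f) + (1.0 - busy) * p_idle


def best_frequency(freqs, f0, power, p_idle):
    """Return the allowed frequency >= f0 with the lowest average power."""
    candidates = [f for f in freqs if f >= f0]
    return min(candidates, key=lambda f: average_power(f, f0, power, p_idle))


# Hypothetical numbers: allowed frequencies in GHz, idle power in watts.
FREQS = [0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.2]
F0 = 1.2        # lowest frequency that saturates the IO device (assumed)
P_IDLE = 0.5

# Three different strictly convex curves, each with power(0) == P_IDLE.
CURVES = {
    "cubic": lambda f: P_IDLE + 2.0 * f ** 3,
    "quadratic": lambda f: P_IDLE + f ** 2 + 0.3 * f,
    "exponential": lambda f: P_IDLE + math.exp(f) - 1.0,
}

if __name__ == "__main__":
    for name, curve in CURVES.items():
        print(name, best_frequency(FREQS, F0, curve, P_IDLE))
```

Under these assumptions the average power is p_idle + f0 * (P(f) - p_idle) / f,
i.e. f0 times the slope of the chord from the idle point to (f, P(f)), and
that slope can only grow with f for a convex curve.  That is why the choice
comes out the same for every curve in the sketch.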

I agree though that exact knowledge of the power curve *might* be useful
as a mechanism to estimate the potential costs of exceeding that optimal
frequency (e.g. to offset performance loss heuristically in case the
workload fluctuates, by giving the governor an upward bias with an
approximately known energy cost), but that's not required for the
governor's behavior to be approximately optimal in IO-bound conditions.
Not making further assumptions about the power curve beyond its
convexity makes the algorithm fairly robust against any inaccuracy in
the power curve numbers (which there will always be, since the energy
efficiency of the workload really depends on the behavior of multiple
components of the system interacting with each other), and makes it
easily reusable on platforms where the exact power curves are not known.