Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Apr 11, 2018 at 09:26:11AM -0700, Francisco Jerez wrote:
>> "just like" here is possibly somewhat unfair to the schedutil governor,
>> admittedly its progressive IOWAIT boosting behavior seems somewhat less
>> wasteful than the intel_pstate non-HWP governor's IOWAIT boosting
>> behavior, but it's still largely unhelpful on IO-bound conditions.
>
> So you understand why we need the iowait boosting right?
>

Yeah, sort of.  The latency-minimizing state of this governor provides a
comparable effect, but it's based on a pessimistic estimate of the
frequency required for the workload to achieve maximum throughput,
rather than on a plain or exponential boost up to the max frequency,
which can deviate substantially from that frequency (see the explanation
in PATCH 6 for more details).  It's enabled under conditions that
partially overlap with, but are not identical to, those of iowait
boosting: the optimization is not applied under IO-bound conditions (in
order to avoid hurting energy efficiency for zero or negative payoff).
OTOH the optimization is applied in some cases where the current
governor wouldn't boost, like RT-priority threads (that's the main
difference in the v2 I'm planning to send out next week).

> It is just that when we get back to runnable, we want to process the
> next data packet ASAP. See also here:
>
>   https://lkml.kernel.org/r/20170522082154.f57cqovterd2qajv@hirez.programming.kicks-ass.net
>
> What I don't really understand is why it is costing so much power; after
> all, when we're in iowait the CPU is mostly idle and can power-gate.

The reason for the energy efficiency problem of iowait boosting is
precisely the greater oscillation between turbo and idle.  Say that
iowait boost increases the frequency by a factor alpha relative to the
optimal frequency f0 (in terms of energy efficiency) required to execute
some IO-bound workload.  This will cause the CPU to be busy for a
fraction of the time it was busy originally, approximately t1 = t0 /
alpha, which on its own would divide the overall energy usage by a
factor alpha.  But at the same time it multiplies the instantaneous
power consumption while busy by a factor potentially much greater than
alpha, since the CPU's power curve is largely non-linear, and in fact
approximately convex within the frequency range allowed by the policy,
so you get an average energy usage possibly much greater than the
optimal.
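
To put rough numbers on the argument above, here is a minimal sketch.
It assumes a purely hypothetical power curve P(f) = k * f**exponent
while busy and zero power while idle (i.e. perfect power-gating); the
function names, the cubic default and the constants are illustrative
assumptions, not measurements of any real CPU.

```python
def busy_energy(freq, work=1.0, k=1.0, exponent=3.0):
    """Energy spent executing `work` cycles at frequency `freq`.

    Assumes a hypothetical busy power curve P(f) = k * f**exponent
    and zero power while idle (perfect power-gating).
    """
    busy_time = work / freq            # t = W / f, hence t1 = t0 / alpha
    busy_power = k * freq ** exponent  # convex in f when exponent > 1
    return busy_power * busy_time


def boost_energy_ratio(alpha, f0=1.0, exponent=3.0):
    """Energy used at alpha * f0 relative to the energy used at f0."""
    boosted = busy_energy(alpha * f0, exponent=exponent)
    baseline = busy_energy(f0, exponent=exponent)
    return boosted / baseline


if __name__ == "__main__":
    for alpha in (1.0, 1.5, 2.0):
        print(alpha, boost_energy_ratio(alpha))
```

Under these assumptions the ratio works out to alpha**(exponent - 1),
so the 1/alpha saving in busy time only breaks even when power is
linear in frequency (exponent == 1), and any convex curve makes the
boosted case cost more energy for the same work.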