Peter Zijlstra writes:

> On Wed, Apr 11, 2018 at 09:26:11AM -0700, Francisco Jerez wrote:
>> "just like" here is possibly somewhat unfair to the schedutil governor,
>> admittedly its progressive IOWAIT boosting behavior seems somewhat less
>> wasteful than the intel_pstate non-HWP governor's IOWAIT boosting
>> behavior, but it's still largely unhelpful on IO-bound conditions.
>
> So you understand why we need the iowait boosting right?
>

Yeah, sort of. The latency-minimizing state of this governor provides a
comparable effect, but it's based on a pessimistic estimate of the
frequency required for the workload to achieve maximum throughput,
rather than a plain or exponential boost up to the max frequency, which
can deviate substantially from that frequency (see the explanation in
PATCH 6 for more details). It's enabled under conditions that partially
overlap with, but are not identical to, those of iowait boosting: the
optimization is not applied under IO-bound conditions (in order to avoid
hurting energy efficiency for zero or negative payoff); OTOH, it is
applied in some cases where the current governor wouldn't boost, like
RT-priority threads (that's the main difference in the v2 I'm planning
to send out next week).

> It is just that when we get back to runnable, we want to process the
> next data packet ASAP. See also here:
>
> https://lkml.kernel.org/r/20170522082154.f57cqovterd2qajv@hirez.programming.kicks-ass.net
>
> What I don't really understand is why it is costing so much power; after
> all, when we're in iowait the CPU is mostly idle and can power-gate.

The reason for the energy efficiency problem of iowait boosting is
precisely the greater oscillation between turbo and idle. Say that
iowait boosting increases the frequency by a factor alpha relative to
the optimal frequency f0 (in terms of energy efficiency) required to
execute some IO-bound workload.
This will cause the CPU to be busy for only a fraction of the time it
was busy originally, approximately t1 = t0 / alpha, which on its own
would divide the overall energy usage by a factor alpha. At the same
time, though, it multiplies the instantaneous power consumption while
busy by a factor potentially much greater than alpha, since the CPU's
power curve is largely non-linear, and in fact approximately convex
within the frequency range allowed by the policy. The resulting energy,
E1 = P(alpha * f0) * t0 / alpha, exceeds E0 = P(f0) * t0 whenever
P(alpha * f0) > alpha * P(f0), so you get an overall energy usage
possibly much greater than the optimum.
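A toy calculation can make this argument concrete. The sketch below is illustrative only: the power model P(f) = P_STATIC + K * f^3 while busy, the constants P_STATIC and K, and all function names are hypothetical assumptions, not measurements of any real CPU and not taken from the patch series. Idle time is assumed to be fully power-gated (zero power), in line with the point about iowait quoted above. The static term is what gives the model an energy-optimal frequency f0 at all; busy time then scales as 1/f while busy power grows faster than f.

```python
# Hypothetical power model: an illustrative assumption, not measured data.
# While busy the CPU draws P(f) = P_STATIC + K * f**3 (watts, f in GHz);
# while idle it is assumed to be fully power-gated and draw nothing.
P_STATIC = 1.0  # W, assumed static/leakage power while busy
K = 0.5         # W/GHz^3, assumed dynamic power coefficient


def busy_power(f):
    """Instantaneous power while busy at frequency f (convex in f)."""
    return P_STATIC + K * f**3


def busy_time(cycles, f):
    """Busy time to retire `cycles` (in units of 1e9 cycles) at f GHz."""
    return cycles / f


def energy(cycles, f):
    """Energy for the workload; idle time contributes nothing by assumption."""
    return busy_power(f) * busy_time(cycles, f)


def optimal_freq():
    """Solve dE/df = 0: -P_STATIC/f**2 + 2*K*f = 0, so f0 = cbrt(P_STATIC / (2*K))."""
    return (P_STATIC / (2 * K)) ** (1 / 3)


def boost_ratios(alpha, cycles=1.0):
    """Return (busy time, busy power, energy) ratios of running at alpha*f0 vs f0."""
    f0 = optimal_freq()
    f1 = alpha * f0
    return (
        busy_time(cycles, f1) / busy_time(cycles, f0),
        busy_power(f1) / busy_power(f0),
        energy(cycles, f1) / energy(cycles, f0),
    )


if __name__ == "__main__":
    for alpha in (1.0, 1.5, 2.0, 3.0):
        t, p, e = boost_ratios(alpha)
        print(f"alpha={alpha:.1f}: busy time x{t:.3f}, power x{p:.3f}, energy x{e:.3f}")
```

With these assumed constants f0 comes out at 1 GHz, and a boost of alpha = 2 halves the busy time but raises busy power by a factor of 10/3, so the energy for the same work grows by a factor of 5/3.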