Mika Kuoppala writes:

> Chris Wilson writes:
>
>> A recent trend for cpufreq is to boost the CPU frequencies for
>> iowaiters, in particular to benefit high frequency I/O. We do the same
>> and boost the GPU clocks to try and minimise time spent waiting for
>> the GPU. However, as the igfx and CPU share the same TDP, boosting the
>> CPU frequency will result in the GPU being throttled and its frequency
>> being reduced. Thus declaring iowait negatively impacts on GPU
>> throughput.
>>
>> v2: Both sleeps!
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107410
>> References: 52ccc4314293 ("cpufreq: intel_pstate: HWP boost performance on IO wakeup")
>
> The commit above has its own heuristics on when to actually ramp up,
> inspecting the interval of io waits.
>
> Regardless of that, with a shared TDP, the waiter should not stand in
> the way.

I've been running some tests with this series (and your previous ones).
I still see statistically significant regressions in latency-sensitive
benchmarks with this series applied:

 qgears2/render-backend=XRender Extension/test-mode=Text:
   XXX ±0.26% x12 -> XXX ±0.36% x15  d=-0.97% ±0.32%  p=0.00%
 lightsmark:
   XXX ±0.51% x22 -> XXX ±0.49% x20  d=-1.58% ±0.50%  p=0.00%
 gputest/triangle:
   XXX ±0.67% x10 -> XXX ±1.76% x20  d=-1.73% ±1.47%  p=0.52%
 synmark/OglMultithread:
   XXX ±0.47% x10 -> XXX ±1.06% x20  d=-3.59% ±0.88%  p=0.00%

The numbers above are from a partial benchmark run on BXT J3455 -- I'm
still waiting for the results of a full run.

Worse, in combination with my intel_pstate branch the effect of this
patch is strictly negative: there are no improvements, because the
cpufreq governor is able to figure out by itself that boosting the
frequency of the CPU under GPU-bound conditions cannot possibly help.
(The HWP boost logic could easily be fixed to do the same thing, which
would allow us to obtain the best of both worlds on big core.)

The reason for the regressions is that IOWAIT is a useful signal for the
cpufreq governor to provide reduced latency in applications that are
unable to parallelize enough work between the CPU and the IO device --
the upstream governor is just using it rather ineffectively.

> And that it fixes a regression:

This patch isn't necessary anymore to fix the regression; there is
another change going in that mitigates the problem [1].  Can we please
keep the io_schedule() calls here (and elsewhere in the atomic commit
code)?

[1] https://lkml.org/lkml/2018/7/30/880
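For reference, the only thing the io_schedule variant adds on top of a
plain schedule_timeout() is the iowait bookkeeping that the cpufreq
governors key off.  Roughly, as a simplified sketch of what
kernel/sched/core.c does (not a verbatim copy):

/*
 * Simplified sketch: the task is flagged as in_iowait for the duration
 * of the sleep, which feeds the per-rq iowait accounting and the
 * SCHED_CPUFREQ_IOWAIT hint that intel_pstate/schedutil use to boost
 * the CPU frequency when the waiter is woken up.
 */
long io_schedule_timeout(long timeout)
{
        int token;
        long ret;

        token = io_schedule_prepare();   /* sets current->in_iowait = 1 */
        ret = schedule_timeout(timeout);
        io_schedule_finish(token);       /* restores the old in_iowait */

        return ret;
}

So the patch doesn't change the sleep itself, only whether it gets
reported as iowait to the rest of the kernel.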
> Reviewed-by: Mika Kuoppala
>
> Going the other way around, the atomic commit code that updates planes
> could potentially benefit from changing to io_schedule_timeout (and/or
> adopting C-state limits).
>
> -Mika
>
>> Signed-off-by: Chris Wilson
>> Cc: Tvrtko Ursulin
>> Cc: Joonas Lahtinen
>> Cc: Eero Tamminen
>> Cc: Francisco Jerez
>> ---
>>  drivers/gpu/drm/i915/i915_request.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> index f3ff8dbe363d..3e48ea87b324 100644
>> --- a/drivers/gpu/drm/i915/i915_request.c
>> +++ b/drivers/gpu/drm/i915/i915_request.c
>> @@ -1376,7 +1376,7 @@ long i915_request_wait(struct i915_request *rq,
>>  			goto complete;
>>  		}
>>
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>  	} while (1);
>>
>>  	GEM_BUG_ON(!intel_wait_has_seqno(&wait));
>> @@ -1414,7 +1414,7 @@ long i915_request_wait(struct i915_request *rq,
>>  				       wait.seqno - 1))
>>  		qos = wait_dma_qos_add();
>>
>> -		timeout = io_schedule_timeout(timeout);
>> +		timeout = schedule_timeout(timeout);
>>
>>  		if (intel_wait_complete(&wait) &&
>>  		    intel_wait_check_request(&wait, rq))
>> --
>> 2.18.0
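In case it helps, the conversion suggested for the atomic commit code
would just be the usual open-coded wait pattern with the io-accounted
sleep.  Purely a hypothetical sketch -- commit_wq, commit_done and
wait_for_commit below are made-up names for the example, not actual
i915 identifiers:

#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Hypothetical example: wait for a commit_done condition with the
 * sleep accounted as iowait, so cpufreq may boost the CPU on wakeup.
 * The completing side is expected to set *commit_done and then call
 * wake_up(commit_wq).
 */
static long wait_for_commit(wait_queue_head_t *commit_wq,
                            bool *commit_done, long timeout)
{
        DEFINE_WAIT(wait);

        for (;;) {
                prepare_to_wait(commit_wq, &wait, TASK_UNINTERRUPTIBLE);
                if (READ_ONCE(*commit_done))
                        break;

                timeout = io_schedule_timeout(timeout);
                if (!timeout)
                        break;
        }
        finish_wait(commit_wq, &wait);

        return timeout;
}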