intel-gfx.lists.freedesktop.org archive mirror
From: Francisco Jerez <currojerez@riseup.net>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org, linux-pm@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	"Pandruvada, Srinivas" <srinivas.pandruvada@intel.com>,
	chris.p.wilson@intel.com
Subject: Re: [Intel-gfx] [PATCH 02/10] drm/i915: Adjust PM QoS response frequency based on GPU load.
Date: Mon, 16 Mar 2020 13:54:28 -0700
Message-ID: <875zf481h7.fsf@riseup.net>
In-Reply-To: <87sgic9008.fsf@riseup.net>



Francisco Jerez <currojerez@riseup.net> writes:

> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> writes:
>[...]
>> Some time ago we entertained the idea of GPU "load average", where that 
>> was defined as a count of runnable requests (so batch buffers). How 
>> that, more generic metric, would behave here if used as an input signal 
>> really intrigues me. Sadly I don't have a patch ready to give to you and 
>> ask to please test it.
>>
>> Or maybe the key is count of runnable contexts as opposed to requests, 
>> which would more match the ELSP[1] idea.
>>
>[...]
> This patch takes the rather conservative approach of limiting the
> application of the response frequency PM QoS request to the more
> restrictive set of cases where we are most certain that CPU latency
> shouldn't be an issue, in order to avoid regressions.  But it might be
> that you find the additional energy efficiency benefit from the more
> aggressive approach to be worth the cost to a few execlists submission
> latency-sensitive applications.  I'm trying to get some numbers
> comparing the two approaches now, will post them here once I have
> results so we can make a more informed trade-off.
>

I got some results from the promised comparison between the dual-ELSP
utilization approach used in this series and the more obvious
alternative of keeping track of the time that any request (or context)
is in flight.  As expected, the alternative shows quite a few
performance improvements (numbers relative to the approach in this
series).  However, most of them are in either synthetic benchmarks or
off-screen variants of benchmarks (the corresponding on-screen variant
of each benchmark below doesn't show a significant improvement):

 synmark/OglCSDof:                                                                      XXX ±0.15% x18 ->   XXX ±0.22% x12          d=1.15% ±0.18%       p=0.00%
 synmark/OglDeferred:                                                                   XXX ±0.31% x18 ->   XXX ±0.15% x12          d=1.16% ±0.26%       p=0.00%
 synmark/OglTexFilterAniso:                                                             XXX ±0.18% x18 ->   XXX ±0.21% x12          d=1.25% ±0.19%       p=0.00%
 synmark/OglPSPhong:                                                                    XXX ±0.43% x18 ->   XXX ±0.29% x12          d=1.28% ±0.38%       p=0.00%
 synmark/OglBatch0:                                                                     XXX ±0.40% x18 ->   XXX ±0.53% x12          d=1.29% ±0.46%       p=0.00%
 synmark/OglVSDiffuse8:                                                                 XXX ±0.49% x17 ->   XXX ±0.25% x12          d=1.30% ±0.41%       p=0.00%
 synmark/OglVSTangent:                                                                  XXX ±0.53% x18 ->   XXX ±0.31% x12          d=1.31% ±0.46%       p=0.00%
 synmark/OglGeomPoint:                                                                  XXX ±0.56% x18 ->   XXX ±0.15% x12          d=1.48% ±0.44%       p=0.00%
 gputest/plot3d:                                                                        XXX ±0.16% x18 ->   XXX ±0.11% x12          d=1.50% ±0.14%       p=0.00%
 gputest/tess_x32:                                                                      XXX ±0.15% x18 ->   XXX ±0.06% x12          d=1.59% ±0.13%       p=0.00%
 synmark/OglTexFilterTri:                                                               XXX ±0.15% x18 ->   XXX ±0.19% x12          d=1.62% ±0.17%       p=0.00%
 synmark/OglBatch3:                                                                     XXX ±0.57% x18 ->   XXX ±0.33% x12          d=1.70% ±0.49%       p=0.00%
 synmark/OglBatch1:                                                                     XXX ±0.41% x18 ->   XXX ±0.34% x12          d=1.81% ±0.38%       p=0.00%
 synmark/OglShMapVsm:                                                                   XXX ±0.53% x18 ->   XXX ±0.38% x12          d=1.81% ±0.48%       p=0.00%
 synmark/OglTexMem128:                                                                  XXX ±0.62% x18 ->   XXX ±0.29% x12          d=1.87% ±0.52%       p=0.00%
 phoronix/x11perf/test=Scrolling 500 x 500 px:                                           XXX ±0.35% x6 ->   XXX ±0.56% x12          d=2.23% ±0.52%       p=0.00%
 phoronix/x11perf/test=500px Copy From Window To Window:                                 XXX ±0.00% x3 ->   XXX ±0.74% x12          d=2.41% ±0.70%       p=0.01%
 gfxbench/gl_trex_off:                                                                   XXX ±0.04% x3 ->   XXX ±0.34% x12          d=2.59% ±0.32%       p=0.00%
 synmark/OglBatch2:                                                                     XXX ±0.85% x18 ->   XXX ±0.21% x12          d=2.87% ±0.67%       p=0.00%
 glbenchmark/GLB27_EgyptHD_inherited_C24Z16_FixedTime_Offscreen:                         XXX ±0.35% x3 ->   XXX ±0.84% x12          d=3.03% ±0.81%       p=0.01%
 glbenchmark/GLB27_TRex_C24Z16_Offscreen:                                                XXX ±0.23% x3 ->   XXX ±0.32% x12          d=3.09% ±0.32%       p=0.00%
 synmark/OglCSCloth:                                                                    XXX ±0.60% x18 ->   XXX ±0.29% x12          d=3.76% ±0.50%       p=0.00%
 phoronix/x11perf/test=Copy 500x500 From Pixmap To Pixmap:                               XXX ±0.44% x3 ->   XXX ±0.70% x12          d=4.31% ±0.69%       p=0.00%

There aren't as many regressions (numbers relative to the upstream
linux-next kernel), and they're mostly 2D test-cases.  However, they
are substantially worse in absolute value:

 phoronix/jxrendermark/rendering-test=12pt Text LCD/rendering-size=128x128:              XXX ±0.30% x26 ->  XXX ±5.71% x26        d=-23.15% ±3.11%       p=0.00%
 phoronix/jxrendermark/rendering-test=Linear Gradient Blend/rendering-size=128x128:      XXX ±0.30% x26 ->  XXX ±4.32% x26        d=-21.34% ±2.41%       p=0.00%
 phoronix/x11perf/test=500px Compositing From Pixmap To Window:                         XXX ±15.46% x26 -> XXX ±12.76% x26       d=-19.05% ±13.15%       p=0.00%
 phoronix/jxrendermark/rendering-test=Transformed Blit Bilinear/rendering-size=128x128:  XXX ±0.20% x26 ->  XXX ±3.82% x27         d=-5.07% ±2.57%       p=0.00%
 phoronix/gtkperf/gtk-test=GtkDrawingArea - Pixbufs:                                     XXX ±2.81% x26 ->  XXX ±2.10% x26         d=-3.59% ±2.45%       p=0.00%
 warsow/benchsow:                                                                        XXX ±0.61% x26 ->  XXX ±1.41% x27         d=-2.45% ±1.07%       p=0.00%
 synmark/OglTerrainFlyInst:                                                              XXX ±0.44% x25 ->  XXX ±0.74% x25         d=-1.24% ±0.60%       p=0.00%

There are some things we might be able to do to get some of the
additional improvement we can see above without hurting
latency-sensitive workloads, but that is going to take more effort.
The present approach of using the dual-ELSP utilization seems like a
good compromise to me for starters.
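
To make the difference between the two input signals concrete, here is
a minimal sketch.  It is not i915 code: it assumes the GPU load is
available as a list of (duration, occupied ELSP ports) samples, and it
assumes "dual-ELSP utilization" means the fraction of time during which
both ports are occupied.  The function name, the sample format and the
numbers are all hypothetical.

```python
# Hypothetical illustration, not i915 code: compare two GPU-load signals
# computed from the same timeline of execlists port occupancy.

def utilization(samples, min_busy_ports):
    """Return the fraction of total time during which at least
    `min_busy_ports` ELSP ports were occupied.

    `samples` is a list of (duration_ns, busy_ports) pairs (assumed format).
    """
    total = sum(duration for duration, _ in samples)
    if total == 0:
        return 0.0
    busy = sum(duration for duration, ports in samples
               if ports >= min_busy_ports)
    return busy / total


# Made-up timeline: (duration in ns, number of occupied ELSP ports).
timeline = [(400, 2), (300, 1), (300, 0)]

# Signal assumed for this series: time with both ELSP ports occupied.
dual_elsp = utilization(timeline, 2)

# Alternative signal: time with any request (or context) in flight.
any_in_flight = utilization(timeline, 1)
```

Under these assumptions the dual-ELSP figure can never exceed the
any-in-flight figure, so a threshold on the latter is crossed in a
superset of cases.  That matches the alternative being described as the
more aggressive one.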

>[...]
