* [PATCH 0/8] drm/i915: per context slice/subslice powergating
@ 2018-04-25 11:45 Lionel Landwerlin
From: Lionel Landwerlin @ 2018-04-25 11:45 UTC
  To: intel-gfx

Hi all,

This is an update of a series that was sent out a few months ago. The
end goal here is to optimize some media workloads.

Here is some information provided by Dmitry (cc) on why we want this :

Video decoding/encoding tends to work with macroblocks, dividing up a
frame into smaller elements. Dependencies exist between those
macroblocks, meaning that they cannot be processed in a random order,
and there is also a maximum number of macroblocks that can be processed
at a given time (called the wavefront).
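
To illustrate the wavefront limit, here is a minimal sketch. It assumes
16x16 macroblocks and a dependency pattern (left, top-left, top and
top-right neighbours) that is common in block-based codecs; neither is
stated above, and the function name is made up for the example.

```python
from collections import Counter
from math import ceil


def max_wavefront(width_px, height_px, mb_size=16):
    """Largest number of macroblocks that can be in flight at once.

    Assumption (not from the cover letter): block (x, y) depends on its
    left, top-left, top and top-right neighbours, so the earliest step at
    which it can start is x + 2 * y. Blocks sharing a step are independent.
    """
    cols = ceil(width_px / mb_size)
    rows = ceil(height_px / mb_size)
    blocks_per_step = Counter(
        x + 2 * y for y in range(rows) for x in range(cols)
    )
    return max(blocks_per_step.values())


if __name__ == "__main__":
    for width, height in ((720, 480), (1920, 1080)):
        print(f"{width}x{height}: {max_wavefront(width, height)} blocks")
```

Under these assumptions a small frame exposes only a few dozen
independent macroblocks per step, which is consistent with a low
resolution stream being unable to fill a large GPU.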

As a result, some workloads (below a certain resolution) will not make
use of all the GPU's execution units. On a SKLGT4 (3 slices), for a
transcoding workload at 720x480p, we measured a low number of active
EUs (~3%) with 3 slices enabled. As we reduce the number of slices used
to 1, the percentage of active EUs increases (~9%), as expected when the
same work runs on a third of the EUs. The execution time of the workload
also decreases as we decrease the number of slices used (we measured up
to a ~20% improvement with 1 slice).
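
The ~3% to ~9% change is what a fixed amount of busy hardware predicts.
A rough sketch, assuming 24 EUs per slice (72 EUs in total on a 3 slice
part; this figure is an assumption, not something stated above) and a
constant number of busy EUs:

```python
def active_eu_percent(busy_eus, slices, eus_per_slice=24):
    """Percentage of powered-up EUs that are busy.

    eus_per_slice=24 is an assumption for this example.
    """
    return 100.0 * busy_eus / (slices * eus_per_slice)


# ~3% of 3 slices * 24 EUs, taken from the measurement quoted above.
BUSY_EUS = 0.03 * 3 * 24

if __name__ == "__main__":
    for slices in (3, 2, 1):
        print(f"{slices} slice(s): "
              f"{active_eu_percent(BUSY_EUS, slices):.1f}% active")
```

This only accounts for the utilization figure. It does not explain the
execution time improvement.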

It's not clear what speeds up the workload. We currently think that
the power budget is redistributed to other parts (including the CPU)
and that the GPU thread scheduling is also sped up because it doesn't
involve as many slices. We haven't found a way to verify these
assumptions.

Changing the powergating configuration doesn't come for free though. We
have some numbers from an IGT benchmark on how much delay is added each
time we switch between 2 contexts with different powergating
configurations. Measurements are on the order of ~50us on SKLGT4 (3
slices) and ~40us on KBLGT3 (2 slices).
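
To put those delays in perspective, here is a back-of-the-envelope
sketch of the time lost to reconfiguration at a given switch rate. The
switch rates are hypothetical; only the ~50us and ~40us costs come from
the measurements above.

```python
def switch_overhead_percent(switches_per_second, cost_us):
    """Share of wall-clock time spent reconfiguring powergating."""
    return 100.0 * switches_per_second * cost_us / 1_000_000


if __name__ == "__main__":
    costs_us = {"SKLGT4": 50, "KBLGT3": 40}  # from the text above
    for platform, cost in costs_us.items():
        for rate in (120, 1000, 5000):  # hypothetical switches per second
            pct = switch_overhead_percent(rate, cost)
            print(f"{platform}: {rate} switches/s -> {pct:.2f}% overhead")
```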

Cheers,

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (5):
  drm/i915: expose helper mapping exec flag engine to intel_engine_cs
  drm/i915: don't specify pinned size for wa_bb pin/allocation
  drm/i915: extract per-ctx/indirect bb programming
  drm/i915: pass wa_ctx as argument
  drm/i915: reprogram NOA muxes on context switch when using perf

 drivers/gpu/drm/i915/i915_drv.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 104 +++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |  10 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  18 +-
 drivers/gpu/drm/i915/i915_perf.c           |  92 +++++++-
 drivers/gpu/drm/i915/intel_lrc.c           | 231 ++++++++++++++++-----
 drivers/gpu/drm/i915/intel_lrc.h           |   5 +
 include/uapi/drm/i915_drm.h                |  28 +++
 8 files changed, 419 insertions(+), 74 deletions(-)

--
2.17.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH v10 0/8] Per context dynamic (sub)slice power-gating
@ 2018-08-14 14:40 Tvrtko Ursulin
From: Tvrtko Ursulin @ 2018-08-14 14:40 UTC
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Updated series after continuing Lionel's work.

Userspace for the feature is the media-driver project on GitHub. Please see
https://github.com/intel/media-driver/pull/271/commits.

Headline changes:

 1. No more master allow/disallow sysfs switch. The feature is
    unconditionally enabled for Gen11, and on other platforms it requires
    CAP_SYS_ADMIN.

    *** To be discussed if this is a good idea or not. ***

 2. Two new patches due to a) breaking out the global barrier, and b) fixing
    one GEM_BUG_ON regarding incorrect kernel context classification by
    i915_is_ggtt.
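
The permission rule from item 1 can be written down as a small
predicate. This is only a sketch of the policy as described above, not
the driver code; the function and parameter names are made up for the
example.

```python
def may_set_sseu(gen, has_cap_sys_admin):
    """Whether a client may request a custom (sub)slice configuration.

    Hypothetical helper mirroring item 1: always allowed on Gen11,
    CAP_SYS_ADMIN required on every other platform.
    """
    if gen == 11:
        return True
    return bool(has_cap_sys_admin)


if __name__ == "__main__":
    for gen in (9, 11):
        for privileged in (False, True):
            print(f"gen={gen} privileged={privileged} -> "
                  f"{may_set_sseu(gen, privileged)}")
```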


Otherwise please see the individual patch change logs.

The main topic for the cover letter though is addressing the question of the
performance impact of dynamic slice re-configuration.

To introduce the problem space: changing the (sub)slice configuration has a
cost at context switch time in the order of tens of milliseconds. (It varies
per Gen and with different slice count transitions.)

So the question is whether a malicious unprivileged workload can negatively
impact other clients. To try and answer this question I have extended gem_wsim
and created some test workloads. (Note that my testing was done on a Gen9
system. The overall message could be the same on Gen11 but that needs to be
verified.)

The first test was a simulated video playback client running in parallel with a
simulated game of both medium and high complexity (using around 60% or 90% of
the render engine respectively, and 7% of the blitter engine). I had two
flavours of the playback client, one which runs normally and one which requests
a reduced slice configuration. Both workloads target 60fps.

The heavy game variant is the same test run against the heavier simulated game
workload, the one which uses around 90% of the render engine.

Results are achieved frames per second as observed from the game client:

                     No player  Normal player   SSEU enabled player
        Medium game     59.6        59.6               59.6
         Heavy game     59.7        58.4               58.1

Here we can see that the medium workload was not affected by either the normal
or the SSEU player, while the heavy workload did see a performance hit, both
with the normal video player running in parallel and, slightly larger, when the
player was SSEU enabled.

The second test runs a malicious client (or clients) in parallel with the same
simulated game workloads. These clients try to trigger many context switches by
using multiple contexts with dependencies set up so that request coalescing is
defeated as much as possible.

I tested both with normal and SSEU enabled malicious clients:

                     DoS client   SSEU DoS client
        Medium game     59.5           59.6
         Heavy game     57.8           55.4

Here we can see a similar picture as with the first test. The medium game
client is not affected by either DoS client, while the heavy game client is,
more so with the SSEU enabled attacker.
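
For reference, the relative frame rate loss can be derived from the two
tables above. This sketch assumes the "No player" column is the right
baseline for both tests, which is my reading of the tables and not
something stated explicitly.

```python
BASELINE_FPS = {"medium": 59.6, "heavy": 59.7}  # "No player" column

ACHIEVED_FPS = {
    "normal player": {"medium": 59.6, "heavy": 58.4},
    "SSEU enabled player": {"medium": 59.6, "heavy": 58.1},
    "DoS client": {"medium": 59.5, "heavy": 57.8},
    "SSEU DoS client": {"medium": 59.6, "heavy": 55.4},
}


def fps_loss_percent(baseline, achieved):
    """Frame rate lost relative to the baseline, in percent."""
    return 100.0 * (baseline - achieved) / baseline


if __name__ == "__main__":
    for scenario, results in ACHIEVED_FPS.items():
        for game, fps in results.items():
            loss = fps_loss_percent(BASELINE_FPS[game], fps)
            print(f"{scenario:>20} / {game:<6}: {loss:.1f}% loss")
```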

From both tests I think the conclusion is that dynamic SSEU switching does
increase the magnitude of performance loss, especially with over-subscribed
engines, due to the cost being proportional to context switch frequency.

The likelihood is that it slightly lowers the utilization level at which this
starts to happen, but does not introduce a completely new vector of attack.
That is, where it was possible to DoS a system from an unprivileged client, it
still is. In both cases (SSEU enabled or not), a malicious client has the
option to grind the system to a halt, although it may need fewer submission
threads to do so when it is SSEU enabled.

Chris Wilson (3):
  drm/i915: Program RPCS for Broadwell
  drm/i915: Record the sseu configuration per-context & engine
  drm/i915: Expose RPCS (SSEU) configuration to userspace

Lionel Landwerlin (3):
  drm/i915/perf: simplify configure all context function
  drm/i915/perf: reuse intel_lrc ctx regs macro
  drm/i915/perf: lock powergating configuration to default when active

Tvrtko Ursulin (2):
  drm/i915: Add global barrier support
  drm/i915: Explicitly mark Global GTT address spaces

 drivers/gpu/drm/i915/i915_drv.h         |  56 +++++++
 drivers/gpu/drm/i915/i915_gem.c         |   2 +
 drivers/gpu/drm/i915/i915_gem_context.c | 189 +++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h |   4 +
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   5 +-
 drivers/gpu/drm/i915/i915_perf.c        |  68 +++++----
 drivers/gpu/drm/i915/i915_request.c     |  16 ++
 drivers/gpu/drm/i915/i915_request.h     |  10 ++
 drivers/gpu/drm/i915/intel_lrc.c        |  87 ++++++++---
 drivers/gpu/drm/i915/intel_lrc.h        |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 include/uapi/drm/i915_drm.h             |  43 ++++++
 13 files changed, 439 insertions(+), 50 deletions(-)

-- 
2.17.1


Thread overview: 50+ messages
2018-04-25 11:45 [PATCH 0/8] drm/i915: per context slice/subslice powergating Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 1/8] drm/i915: expose helper mapping exec flag engine to intel_engine_cs Lionel Landwerlin
2018-04-25 11:50   ` Chris Wilson
2018-04-30 14:37     ` Lionel Landwerlin
2018-05-01 11:13       ` Chris Wilson
2018-05-03 17:12       ` Tvrtko Ursulin
2018-05-03 17:31         ` Lionel Landwerlin
2018-05-03 18:00           ` Tvrtko Ursulin
2018-05-03 20:09             ` Lionel Landwerlin
2018-05-03 20:15               ` Chris Wilson
2018-04-25 11:45 ` [PATCH 2/8] drm/i915: Program RPCS for Broadwell Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 3/8] drm/i915: don't specify pinned size for wa_bb pin/allocation Lionel Landwerlin
2018-04-25 11:52   ` Chris Wilson
2018-04-25 11:45 ` [PATCH 4/8] drm/i915: extract per-ctx/indirect bb programming Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 5/8] drm/i915: pass wa_ctx as argument Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 6/8] drm/i915: reprogram NOA muxes on context switch when using perf Lionel Landwerlin
2018-04-25 11:57   ` Chris Wilson
2018-04-25 13:23     ` Chris Wilson
2018-04-25 14:35     ` Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 7/8] drm/i915: Record the sseu configuration per-context & engine Lionel Landwerlin
2018-04-25 11:45 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Lionel Landwerlin
2018-04-26 10:00   ` Joonas Lahtinen
2018-04-26 10:22     ` Lionel Landwerlin
2018-05-03 16:04       ` Joonas Lahtinen
2018-05-03 16:14         ` Chris Wilson
2018-05-03 16:25         ` Lionel Landwerlin
2018-05-03 16:30         ` Tvrtko Ursulin
2018-05-03 17:18   ` Tvrtko Ursulin
2018-05-04 16:25     ` Lionel Landwerlin
2018-05-08  4:04       ` Rogozhkin, Dmitry V
2018-05-08  8:24         ` Tvrtko Ursulin
2018-05-08 16:00           ` Rogozhkin, Dmitry V
2018-05-08 20:56     ` Chris Wilson
2018-05-09 15:35       ` Lionel Landwerlin
2018-04-25 12:34 ` ✗ Fi.CI.CHECKPATCH: warning for drm/i915: per context slice/subslice powergating Patchwork
2018-04-25 12:36 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-04-25 12:49 ` ✓ Fi.CI.BAT: success " Patchwork
2018-04-25 15:39 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-08-14 14:40 [PATCH v10 0/8] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-08-14 14:40 ` [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
2018-08-14 14:59   ` Chris Wilson
2018-08-14 15:11     ` Lionel Landwerlin
2018-08-14 15:18       ` Chris Wilson
2018-08-14 16:05         ` Lionel Landwerlin
2018-08-14 16:09           ` Lionel Landwerlin
2018-08-14 18:44     ` Tvrtko Ursulin
2018-08-14 18:53       ` Chris Wilson
2018-08-15  9:12         ` Tvrtko Ursulin
2018-08-14 15:22   ` Chris Wilson
2018-08-15 11:51     ` Tvrtko Ursulin
2018-08-15 11:56       ` Chris Wilson
