From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
To: Tvrtko Ursulin <tursulin@ursulin.net>, Intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
Date: Thu, 6 Sep 2018 10:57:47 +0100
Message-ID: <9450ba2c-0f62-88d9-de26-a82dc61cf0f8@intel.com>
In-Reply-To: <20180905142222.3251-5-tvrtko.ursulin@linux.intel.com>

On 05/09/2018 15:22, Tvrtko Ursulin wrote:
> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>
> If some of the contexts submitting workloads to the GPU have been
> configured to shut down slices/subslices, we might lose the NOA
> configurations written in the NOA muxes.
>
> One possible solution to this problem is to reprogram the NOA muxes
> when we switch to a new context. We initially tried this in the
> workaround batchbuffer but some concerns were raised about the cost
> of reprogramming at every context switch. This solution is also not
> without consequences from the userspace point of view. Reprogramming
> of the muxes can only happen once the powergating configuration has
> changed (which happens after context switch). This means for a window
> of time during the recording, counters recorded by the OA unit might
> be invalid. This requires userspace dealing with OA reports to discard
> the invalid values.
>
> Minimizing the reprogramming could be implemented by tracking of the
> last programmed configuration somewhere in GGTT and use MI_PREDICATE
> to discard some of the programming commands, but the command streamer
> would still have to parse all the MI_LRI instructions in the
> workaround batchbuffer.
>
> Another solution, which this change implements, is to simply disregard
> the user requested configuration for the period of time when i915/perf
> is active. There is no known issue with this apart from a performance
> penalty for some media workloads that benefit from running on a
> partially powergated GPU. We already prevent RC6 from affecting the
> programming so it doesn't sound completely unreasonable to hold on
> powergating for the same reason.
>
> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>
> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>      More to_intel_context() (Tvrtko)
>      s/dev_priv/i915/ (Tvrtko)
>
> Tvrtko Ursulin:
>
> v4:
>   * Rebase for make_rpcs changes.
>
> v5:
>   * Apply OA restriction from make_rpcs directly.
>
> v6:
>   * Rebase for context image setup changes.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>   drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
>   drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>   3 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index ccb20230df2c..dd65b72bddd4 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1677,6 +1677,11 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
> +
> +	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(dev_priv,
> +			       &to_intel_context(ctx,
> +						 dev_priv->engine[RCS])->sseu));


I think there is one issue I missed on the previous iterations of this 
patch.

This gen8_update_reg_state_unlocked() is called when the GPU is parked 
on the kernel context.

It's supposed to update all contexts, but I think we might not be able 
to update the kernel context image while the GPU is using it.

Context save might happen after we edited the image and that would 
overwrite the values we just put in there.
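
To make the ordering problem concrete, here is a toy model of it. This is 
not i915 code: every name below (toy_context, toy_gpu, toy_edit_image and 
so on) is hypothetical, and the model assumes only that a context save 
writes the hardware's live register value back into the active context's 
image.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these types or functions exist in i915. */
struct toy_context {
	uint32_t image_rpcs;	/* value stored in the in-memory context image */
};

struct toy_gpu {
	uint32_t live_rpcs;		/* value currently held by the hardware */
	struct toy_context *active;	/* context the GPU is running, or NULL */
};

/* CPU-side edit of a context image. */
static void toy_edit_image(struct toy_context *ctx, uint32_t rpcs)
{
	ctx->image_rpcs = rpcs;
}

/* Context restore: the hardware loads its state from the image. */
static void toy_context_restore(struct toy_gpu *gpu, struct toy_context *ctx)
{
	gpu->active = ctx;
	gpu->live_rpcs = ctx->image_rpcs;
}

/* Context save: the hardware writes its live state back to the image. */
static void toy_context_save(struct toy_gpu *gpu)
{
	if (gpu->active)
		gpu->active->image_rpcs = gpu->live_rpcs;
}

/*
 * Returns true if an image edit made while the context is active is still
 * present after the next context save.
 */
static bool toy_edit_survives_save(uint32_t old_rpcs, uint32_t new_rpcs)
{
	struct toy_context kernel_ctx = { .image_rpcs = old_rpcs };
	struct toy_gpu gpu = { 0 };

	toy_context_restore(&gpu, &kernel_ctx);	/* GPU parked on kernel_ctx */
	toy_edit_image(&kernel_ctx, new_rpcs);	/* CPU edits the image */
	toy_context_save(&gpu);			/* save writes old value back */

	return kernel_ctx.image_rpcs == new_rpcs;
}
```

In this model toy_edit_survives_save(1, 2) returns false: the save puts 
the old value back over the edit, which is the case I am worried about 
for the kernel context.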


The OA config is emitted by editing the context image in this function 
but also through the ring buffer in 
gen8_switch_to_updated_kernel_context() for the kernel context.

Since we can't have a context modify its own RPCS value, we'll have to 
resort to yet another context to do that for the kernel context.


I remember having a patch that created yet another kernel context (let's 
call it the RPCS editing context), which is used to reconfigure RPCS for 
every context but itself, and then had the kernel context reconfigure 
this RPCS editing context.

Or alternatively not do anything to it, because it's only going to run 
to edit other contexts at a time when we don't care about power 
configuration stability.
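
A rough sketch of both options, again as a toy model rather than driver 
code. The names (toy_ctx, toy_update_all_but, toy_reconfigure) and the 
update_editor flag are made up for illustration; the real thing would 
emit the writes from a request rather than poke memory directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these types or functions exist in i915. */
struct toy_ctx {
	uint32_t image_rpcs;	/* RPCS value stored in the context image */
};

/*
 * Write rpcs into every context image except the editor's own, since a
 * context cannot modify its own RPCS value. Returns the number of images
 * that were updated.
 */
static size_t toy_update_all_but(struct toy_ctx **ctxs, size_t n,
				 const struct toy_ctx *editor, uint32_t rpcs)
{
	size_t updated = 0;

	for (size_t i = 0; i < n; i++) {
		if (ctxs[i] == editor)
			continue;
		ctxs[i]->image_rpcs = rpcs;
		updated++;
	}

	return updated;
}

/*
 * Step 1: running as edit_ctx, update every other context, including the
 * kernel context.
 * Step 2 (optional): running as the kernel context, update edit_ctx.
 * Passing update_editor == 0 models the alternative of leaving the
 * editing context alone.
 */
static void toy_reconfigure(struct toy_ctx **ctxs, size_t n,
			    struct toy_ctx *edit_ctx, uint32_t rpcs,
			    int update_editor)
{
	toy_update_all_but(ctxs, n, edit_ctx, rpcs);

	if (update_editor)
		edit_ctx->image_rpcs = rpcs;
}
```

With update_editor set to 0 the editing context keeps whatever RPCS value 
it already had, which is the "not do anything to it" variant.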


-

Lionel


>   }
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8a477e43dbca..9709c1fbe836 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1305,9 +1305,6 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>   	return i915_vma_pin(vma, 0, 0, flags);
>   }
>   
> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
> -		     struct intel_sseu *ctx_sseu);
> -
>   static struct intel_context *
>   __execlists_context_pin(struct intel_engine_cs *engine,
>   			struct i915_gem_context *ctx,
> @@ -1350,7 +1347,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   	/* RPCS */
>   	if (engine->class == RENDER_CLASS) {
>   		ce->lrc_reg_state[CTX_R_PWR_CLK_STATE + 1] =
> -					make_rpcs(engine->i915, &ce->sseu);
> +					gen8_make_rpcs(engine->i915, &ce->sseu);
>   	}
>   
>   	ce->state->obj->pin_global++;
> @@ -2494,15 +2491,28 @@ int logical_xcs_ring_init(struct intel_engine_cs *engine)
>   	return logical_ring_init(engine);
>   }
>   
> -static u32 make_rpcs(struct drm_i915_private *dev_priv,
> -		     struct intel_sseu *ctx_sseu)
> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
> +		   struct intel_sseu *req_sseu)
>   {
>   	const struct sseu_dev_info *sseu = &INTEL_INFO(dev_priv)->sseu;
>   	bool subslice_pg = sseu->has_subslice_pg;
> -	u8 slices = hweight8(ctx_sseu->slice_mask);
> -	u8 subslices = hweight8(ctx_sseu->subslice_mask);
> +	struct intel_sseu ctx_sseu;
> +	u8 slices, subslices;
>   	u32 rpcs = 0;
>   
> +	/*
> +	 * If i915/perf is active, we want a stable powergating configuration
> +	 * on the system. The most natural configuration to take in that case
> +	 * is the default (i.e maximum the hardware can do).
> +	 */
> +	if (unlikely(dev_priv->perf.oa.exclusive_stream))
> +		ctx_sseu = intel_device_default_sseu(dev_priv);
> +	else
> +		ctx_sseu = *req_sseu;
> +
> +	slices = hweight8(ctx_sseu.slice_mask);
> +	subslices = hweight8(ctx_sseu.subslice_mask);
> +
>   	/*
>   	 * Since the SScount bitfield in GEN8_R_PWR_CLK_STATE is only three bits
>   	 * wide and Icelake has up to eight subslices, specfial programming is
> @@ -2572,13 +2582,13 @@ static u32 make_rpcs(struct drm_i915_private *dev_priv,
>   	if (sseu->has_eu_pg) {
>   		u32 val;
>   
> -		val = ctx_sseu->min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
> +		val = ctx_sseu.min_eus_per_subslice << GEN8_RPCS_EU_MIN_SHIFT;
>   		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MIN_MASK);
>   		val &= GEN8_RPCS_EU_MIN_MASK;
>   
>   		rpcs |= val;
>   
> -		val = ctx_sseu->max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
> +		val = ctx_sseu.max_eus_per_subslice << GEN8_RPCS_EU_MAX_SHIFT;
>   		GEM_BUG_ON(val & ~GEN8_RPCS_EU_MAX_MASK);
>   		val &= GEN8_RPCS_EU_MAX_MASK;
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f5a5502ecf70..11da6fc0002d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,4 +104,7 @@ void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
>   void intel_execlists_set_default_submission(struct intel_engine_cs *engine);
>   
> +u32 gen8_make_rpcs(struct drm_i915_private *dev_priv,
> +		   struct intel_sseu *ctx_sseu);
> +
>   #endif /* _INTEL_LRC_H_ */

