All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Lionel Landwerlin <lionel.g.landwerlin@intel.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Intel-gfx@lists.freedesktop.org,
	Tvrtko Ursulin <tursulin@ursulin.net>
Subject: Re: [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active
Date: Fri, 7 Sep 2018 10:39:46 +0100	[thread overview]
Message-ID: <6c567d44-6edd-0bc4-ccd7-dc678b6e5006@linux.intel.com> (raw)
In-Reply-To: <c6c75872-a2da-b9f0-b88b-3bb1bc05fb54@intel.com>


On 07/09/2018 10:23, Lionel Landwerlin wrote:
> On 07/09/2018 09:26, Tvrtko Ursulin wrote:
>>
>> On 06/09/2018 11:36, Lionel Landwerlin wrote:
>>> On 06/09/2018 11:22, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2018-09-06 11:18:01)
>>>>> On 06/09/2018 11:10, Chris Wilson wrote:
>>>>>> Quoting Lionel Landwerlin (2018-09-06 10:57:47)
>>>>>>> On 05/09/2018 15:22, Tvrtko Ursulin wrote:
>>>>>>>> From: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>>
>>>>>>>> If some of the contexts submitting workloads to the GPU have been
>>>>>>>> configured to shutdown slices/subslices, we might loose the NOA
>>>>>>>> configurations written in the NOA muxes.
>>>>>>>>
>>>>>>>> One possible solution to this problem is to reprogram the NOA muxes
>>>>>>>> when we switch to a new context. We initially tried this in the
>>>>>>>> workaround batchbuffer but some concerns where raised about the 
>>>>>>>> cost
>>>>>>>> of reprogramming at every context switch. This solution is also not
>>>>>>>> without consequences from the userspace point of view. 
>>>>>>>> Reprogramming
>>>>>>>> of the muxes can only happen once the powergating configuration has
>>>>>>>> changed (which happens after context switch). This means for a 
>>>>>>>> window
>>>>>>>> of time during the recording, counters recorded by the OA unit 
>>>>>>>> might
>>>>>>>> be invalid. This requires userspace dealing with OA reports to 
>>>>>>>> discard
>>>>>>>> the invalid values.
>>>>>>>>
>>>>>>>> Minimizing the reprogramming could be implemented by tracking of 
>>>>>>>> the
>>>>>>>> last programmed configuration somewhere in GGTT and use 
>>>>>>>> MI_PREDICATE
>>>>>>>> to discard some of the programming commands, but the command 
>>>>>>>> streamer
>>>>>>>> would still have to parse all the MI_LRI instructions in the
>>>>>>>> workaround batchbuffer.
>>>>>>>>
>>>>>>>> Another solution, which this change implements, is to simply 
>>>>>>>> disregard
>>>>>>>> the user requested configuration for the period of time when 
>>>>>>>> i915/perf
>>>>>>>> is active. There is no known issue with this apart from a 
>>>>>>>> performance
>>>>>>>> penality for some media workloads that benefit from running on a
>>>>>>>> partially powergated GPU. We already prevent RC6 from affecting the
>>>>>>>> programming so it doesn't sound completely unreasonable to hold on
>>>>>>>> powergating for the same reason.
>>>>>>>>
>>>>>>>> v2: Leave RPCS programming in intel_lrc.c (Lionel)
>>>>>>>>
>>>>>>>> v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel)
>>>>>>>>        More to_intel_context() (Tvrtko)
>>>>>>>>        s/dev_priv/i915/ (Tvrtko)
>>>>>>>>
>>>>>>>> Tvrtko Ursulin:
>>>>>>>>
>>>>>>>> v4:
>>>>>>>>     * Rebase for make_rpcs changes.
>>>>>>>>
>>>>>>>> v5:
>>>>>>>>     * Apply OA restriction from make_rpcs directly.
>>>>>>>>
>>>>>>>> v6:
>>>>>>>>     * Rebase for context image setup changes.
>>>>>>>>
>>>>>>>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
>>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/i915/i915_perf.c |  5 +++++
>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 30 
>>>>>>>> ++++++++++++++++++++----------
>>>>>>>>     drivers/gpu/drm/i915/intel_lrc.h |  3 +++
>>>>>>>>     3 files changed, 28 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_perf.c 
>>>>>>>> b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> index ccb20230df2c..dd65b72bddd4 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_perf.c
>>>>>>>> @@ -1677,6 +1677,11 @@ static void 
>>>>>>>> gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>>>>>>>>                 CTX_REG(reg_state, state_offset, flex_regs[i], 
>>>>>>>> value);
>>>>>>>>         }
>>>>>>>> +
>>>>>>>> +     CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
>>>>>>>> +             gen8_make_rpcs(dev_priv,
>>>>>>>> + &to_intel_context(ctx,
>>>>>>>> + dev_priv->engine[RCS])->sseu));
>>>>>>> I think there is one issue I missed on the previous iterations of 
>>>>>>> this
>>>>>>> patch.
>>>>>>>
>>>>>>> This gen8_update_reg_state_unlocked() is called when the GPU is 
>>>>>>> parked
>>>>>>> on the kernel context.
>>>>>>>
>>>>>>> It's supposed to update all contexts, but I think we might not be 
>>>>>>> able
>>>>>>> to update the kernel context image while the GPU is using it.
>>>>>> The kernel context is only ever taken in extremis (you are either
>>>>>> parking or stalling userspace) so I don't care.
>>>>>
>>>>> The patch exposing the RPCS configuration to userspace will make 
>>>>> use of
>>>>> the kernel context while OA/perf is enabled. Even if it reprograms the
>>>>> locked value that will break the power configuration stability on 
>>>>> Gen11
>>>>> (because the locked configuration will be different from the kernel
>>>>> context configuration).
>>>> Sure, but as you point out that's only on changing configuration.
>>>>
>>>> What's missing in the patch is that we only bail early if the new sseu
>>>> matches the ce->sseu, but that doesn't necessarily match whats in the
>>>> context due to OA. (Or maybe I missed the conversion to rpcs value and
>>>> checking.)
>>>> -Chris
>>>>
>>>
>>> Yep, because the gen8_make_rpcs() post processes the values store at 
>>> the gem context level, we risk rerunning the kernel context to write 
>>> the exiting value.
>>> Sorry this is all so messy :(
>>
>> Lets see if I managed to follow here.
>>
>> The current code indeed bails out at the set ctx param level if the 
>> requested state matches the ce->state. My thinking was that ce->state 
>> is the master state and whatever happens in "post processing" via 
>> gen8_make_rpcs should be hidden from it since the design is that the 
>> i915_perf.c will re-configure all contexts when the OA active status 
>> changes (to either direction).
>>
>> So I don't see a problem in those two interactions.
> 
> 
> Let's say you have contextA with sseu(slice,subslice)=(0x1/0xff) for ICL.
> 
> You then enable OA which locks the configuration at (0x1,0xf).
> 
> The kernel context has retained its (0x1/0xff) configuration.
> 
> 
> And after you change the config of contextA to (0x1,0x7).
> 
> 
> This would lead to the kernel context scheduled with (0x1,0xff) while OA 
> is active.

Okay that's a problem discussed in the paragraph below - that the kernel 
context is not updated at all. But is it a problem for OA? Will it mess 
up some counters even if kernel context isn't executing anything 
interacting with them? Or is it?

> 
>>
>> Apart from one, get_param_sseu will lie a bit - we can discuss about 
>> this one more. At one point I suggested we have two sets of masks in 
>> the uAPI, requested and active in a way. So userspace could query what 
>> it set and what is actually active.
>>
>> Now second issue is if i915_perf.c is able to reprogram the kernel 
>> config.
>>
>> Here its true, it will write to the context image and that will get 
>> overwritten by context save.
>>
>> If that is a problem for OA, I was initially if a throw-away second 
>> "kernel" context could be use to re-program the real one, but perhaps 
>> even simpler - what about a mmio write to program the RPCS while 
>> kernel context is active?
> 
> 
> Documentation says : "This register must not be programmed directly 
> through CPU MMIO cycle."
> 
> 
> Sorry :(

Ugh.. okay, help me understand if kernel context absolutely needs to 
follow the "lock" for OA to work and then we'll see what to do.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2018-09-07  9:40 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-05 14:22 [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 1/7] drm/i915/execlists: Move RPCS setup to context pin Tvrtko Ursulin
2018-09-05 15:14   ` Chris Wilson
2018-09-05 14:22 ` [PATCH 2/7] drm/i915: Program RPCS for Broadwell Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 3/7] drm/i915: Record the sseu configuration per-context & engine Tvrtko Ursulin
2018-09-05 15:18   ` Chris Wilson
2018-09-06  9:36     ` Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 4/7] drm/i915/perf: lock powergating configuration to default when active Tvrtko Ursulin
2018-09-05 15:21   ` Chris Wilson
2018-09-06  9:41     ` Tvrtko Ursulin
2018-09-06  9:57   ` Lionel Landwerlin
2018-09-06 10:10     ` Chris Wilson
2018-09-06 10:18       ` Lionel Landwerlin
2018-09-06 10:22         ` Chris Wilson
2018-09-06 10:36           ` Lionel Landwerlin
2018-09-07  8:26             ` Tvrtko Ursulin
2018-09-07  8:59               ` Chris Wilson
2018-09-07  9:23               ` Lionel Landwerlin
2018-09-07  9:39                 ` Tvrtko Ursulin [this message]
2018-09-07  9:55                   ` Lionel Landwerlin
2018-09-10 13:44                     ` Tvrtko Ursulin
2018-09-11 20:11                       ` Lionel Landwerlin
2018-09-12  8:03                         ` Tvrtko Ursulin
2018-09-05 14:22 ` [PATCH 5/7] drm/i915: Add timeline barrier support Tvrtko Ursulin
2018-09-05 15:23   ` Chris Wilson
2018-09-05 14:22 ` [PATCH 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace Tvrtko Ursulin
2018-09-05 15:29   ` Chris Wilson
2018-09-06  9:50     ` Tvrtko Ursulin
2018-09-06  9:54       ` Chris Wilson
2018-09-06  9:58       ` Lionel Landwerlin
2018-09-05 14:22 ` [PATCH 7/7] drm/i915/icl: Support co-existance between per-context SSEU and OA Tvrtko Ursulin
2018-09-05 14:46 ` ✗ Fi.CI.CHECKPATCH: warning for Per context dynamic (sub)slice power-gating (rev2) Patchwork
2018-09-05 14:49 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-09-05 15:05 ` ✓ Fi.CI.BAT: success " Patchwork
2018-09-05 19:55 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-09-06 19:33 ` [PATCH v11 0/7] Per context dynamic (sub)slice power-gating Chris Wilson
2018-09-06 19:52   ` Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6c567d44-6edd-0bc4-ccd7-dc678b6e5006@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=Intel-gfx@lists.freedesktop.org \
    --cc=chris@chris-wilson.co.uk \
    --cc=lionel.g.landwerlin@intel.com \
    --cc=tursulin@ursulin.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.