From: Andrzej Hajda <andrzej.hajda@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	"Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>,
	Andi Shyti <andi.shyti@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org,
	Matthew Auld <matthew.auld@intel.com>,
	chris@chris-wilson.co.uk
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: do not capture error state on exiting context
Date: Tue, 27 Sep 2022 10:16:42 +0200
Message-ID: <a2a65cef-f8b4-2d1a-0044-9a64492f0420@intel.com>
In-Reply-To: <c3af2831-d06b-5818-baf2-e88b4d1f6694@linux.intel.com>



On 27.09.2022 09:45, Tvrtko Ursulin wrote:
>
> On 27/09/2022 07:49, Andrzej Hajda wrote:
>>
>>
>> On 27.09.2022 01:34, Ceraolo Spurio, Daniele wrote:
>>>
>>>
>>> On 9/26/2022 3:44 PM, Andi Shyti wrote:
>>>> Hi Andrzej,
>>>>
>>>> On Mon, Sep 26, 2022 at 11:54:09PM +0200, Andrzej Hajda wrote:
>>>>> Capturing error state is time consuming (up to 350ms on DG2), so it
>>>>> should be avoided if possible. Context reset triggered by context
>>>>> removal is a good example.
>>>>> With this patch multiple igt tests will not time out and should run
>>>>> faster.
>>>>>
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1551
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3952
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5891
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6268
>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6281
>>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>>> fine for me:
>>>>
>>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>>
>>>> Just to be on the safe side, can we also have the ack from any of
>>>> the GuC folks? Daniele, John?
>>>>
>>>> Andi
>>>>
>>>>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++-
>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> index 22ba66e48a9b01..cb58029208afe1 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>>> @@ -4425,7 +4425,8 @@ static void guc_handle_context_reset(struct 
>>>>> intel_guc *guc,
>>>>>       trace_intel_context_reset(ce);
>>>>>         if (likely(!intel_context_is_banned(ce))) {
>>>>> -        capture_error_state(guc, ce);
>>>>> +        if (!intel_context_is_exiting(ce))
>>>>> +            capture_error_state(guc, ce);
>
> I am not sure here - if we have a persistent context which caused a 
> GPU hang I'd expect we'd still want error capture.
>
> What causes the reset in the affected IGTs? Always preemption timeout?

The affected tests always perform a context destroy with a bb having
IGT_SPIN_NO_PREEMPTION, and with "preempt_timeout_ms" set to 50.
So I guess yes.

Regards
Andrzej


>
>>>>>           guc_context_replay(ce);
>>>
>>> You definitely don't want to replay requests of a context that is 
>>> going away.
>>
>> My intention was to just avoid error capture, but that's even better, 
>> only condition change:
>> -        if (likely(!intel_context_is_banned(ce))) {
>> +       if (likely(intel_context_is_schedulable(ce)))  {
>
> Yes that helper was intended to be used for contexts which should not 
> be scheduled post exit or ban.
>
> Daniele - you say there are some misses in the GuC backend. Should 
> most, or even all in intel_guc_submission.c be converted to use 
> intel_context_is_schedulable? My idea indeed was that "ban" should be 
> a level up from the backends. Backend should only distinguish between 
> "should I run this or not", and not the reason.
>
> Regards,
>
> Tvrtko
>
>>
>>>
>>> This seems at least in part due to 
>>> https://patchwork.freedesktop.org/patch/487531/, where we replaced 
>>> the "context_ban" with "context_exiting". There are several places 
>>> where we skipped operations if the context was banned (here 
>>> included) which are now not covered anymore for exiting contexts. 
>>> Maybe we need a new checker function to check both flags in places 
>>> where we don't care why the context is being removed (ban vs 
>>> exiting), just that it is?
>>>
>>> Daniele
>>>
>>>>>       } else {
>>>>>           drm_info(&guc_to_gt(guc)->i915->drm,
>>
>> And maybe degrade above to drm_dbg, to avoid spamming dmesg?
>>
>> Regards
>> Andrzej
>>
>>
>>>>> -- 
>>>>> 2.34.1
>>>
>>
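
To make the condition change discussed above concrete, here is a minimal
userspace sketch of a single "should I run this or not" check that covers
both the banned and the exiting case. It is only a model under stated
assumptions: struct mock_context, the MOCK_CTX_* flags and both functions
are hypothetical stand-ins, not the i915 definitions, and the counters
merely stand in for capture_error_state() and guc_context_replay().

```c
#include <stdbool.h>

/* Hypothetical flag bits modelling the two "context is going away" states. */
enum mock_ctx_flag {
	MOCK_CTX_BANNED  = 1u << 0,
	MOCK_CTX_EXITING = 1u << 1,
};

/* Hypothetical stand-in for a context; counters replace the real side effects. */
struct mock_context {
	unsigned int flags;
	int captures;	/* simulated error-state captures */
	int replays;	/* simulated request replays */
};

/* One check for both reasons: the caller does not care why, only whether. */
static bool mock_context_is_schedulable(const struct mock_context *ce)
{
	return !(ce->flags & (MOCK_CTX_BANNED | MOCK_CTX_EXITING));
}

/* Model of the reset handler: capture and replay only for schedulable contexts. */
static void mock_handle_context_reset(struct mock_context *ce)
{
	if (mock_context_is_schedulable(ce)) {
		ce->captures++;
		ce->replays++;
	}
	/* else: the driver would only log that the context was reset */
}
```

With this shape, an exiting context skips both the capture and the replay,
which is the behaviour the proposed condition change aims for. Whether a
persistent context that really hung the GPU should still get an error
capture is a separate question that this sketch does not settle.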

