From: "Ceraolo Spurio, Daniele" <daniele.ceraolospurio@intel.com>
To: Andrzej Hajda <andrzej.hajda@intel.com>,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org,
	Matthew Auld <matthew.auld@intel.com>,
	chris@chris-wilson.co.uk
Subject: Re: [Intel-gfx] [PATCH] drm/i915/guc: do not capture error state on exiting context
Date: Tue, 27 Sep 2022 14:33:25 -0700
Message-ID: <e650c472-8d9b-27b4-c4b6-5759297176c8@intel.com>
In-Reply-To: <98c9c819-5606-5d93-8008-bcf0acba1898@intel.com>



On 9/27/2022 3:14 AM, Andrzej Hajda wrote:
> On 27.09.2022 01:34, Ceraolo Spurio, Daniele wrote:
>>
>>
>> On 9/26/2022 3:44 PM, Andi Shyti wrote:
>>> Hi Andrzej,
>>>
>>> On Mon, Sep 26, 2022 at 11:54:09PM +0200, Andrzej Hajda wrote:
>>>> Capturing error state is time consuming (up to 350ms on DG2), so it should
>>>> be avoided if possible. Context reset triggered by context removal is a
>>>> good example.
>>>> With this patch multiple igt tests will not timeout and should run faster.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1551
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3952
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5891
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6268
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6281
>>>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>>> fine for me:
>>>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>>
>>> Just to be on the safe side, can we also have the ack from any of
>>> the GuC folks? Daniele, John?
>>>
>>> Andi
>>>
>>>
>>>> ---
>>>>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index 22ba66e48a9b01..cb58029208afe1 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -4425,7 +4425,8 @@ static void guc_handle_context_reset(struct intel_guc *guc,
>>>>       trace_intel_context_reset(ce);
>>>>       if (likely(!intel_context_is_banned(ce))) {
>>>> -        capture_error_state(guc, ce);
>>>> +        if (!intel_context_is_exiting(ce))
>>>> +            capture_error_state(guc, ce);
>>>>           guc_context_replay(ce);
>>
>> You definitely don't want to replay requests of a context that is 
>> going away.
>
> Without guc_context_replay I see timeouts. Probably because 
> guc_context_replay calls __guc_reset_context. I am not sure if there 
> is need to dig deeper, stay with my initial proposition, or sth like:
>
>     if (likely(!intel_context_is_banned(ce))) {
>         if (!intel_context_is_exiting(ce)) {
>             capture_error_state(guc, ce);
>             guc_context_replay(ce);
>         } else {
>             __guc_reset_context(ce, ce->engine->mask);
>         }
>     } else {
>
> The latter is also working.

This seems to be an issue with the context close path when hangcheck is 
disabled. In that case we don't call the revoke() helper, so we don't 
clear the context state in the GuC backend, and we therefore need 
__guc_reset_context() in the reset handler to do it. I'd argue that the 
proper solution is to ban the context on close in the hangcheck-disabled 
scenario instead of relying only on the pulse. I'm also not sure the 
pulse works with GuC submission and a preemptible context, because the 
GuC will just schedule the context back in unless we send an H2G to 
explicitly disable it. I don't know why we're not banning right now, 
though, so I'd prefer that someone knowledgeable chime in, in case there 
is a good reason for it.
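
To make the three cases explicit, here is a stand-alone model of the 
decision in your second variant quoted above. It is only a sketch and not 
i915 code: struct mock_context, enum reset_action and pick_reset_action() 
are hypothetical names made up for illustration, and the model only 
captures which path the reset handler would take for each flag 
combination.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for struct intel_context. It only carries the
 * two flags discussed in this thread; the real structure differs.
 */
struct mock_context {
	bool banned;
	bool exiting;
};

/* What the reset handler would do for a given context (illustrative). */
enum reset_action {
	RESET_ACTION_LOG_ONLY,           /* banned: only log, no capture or replay */
	RESET_ACTION_RESET_ONLY,         /* exiting: reset backend state, skip capture and replay */
	RESET_ACTION_CAPTURE_AND_REPLAY  /* live context: capture error state, then replay */
};

/*
 * Mirrors the nesting of the quoted variant: the banned check comes
 * first, then the exiting check decides between a plain context reset
 * and the full capture + replay path.
 */
static enum reset_action pick_reset_action(const struct mock_context *ce)
{
	if (ce->banned)
		return RESET_ACTION_LOG_ONLY;

	if (ce->exiting)
		return RESET_ACTION_RESET_ONLY;

	return RESET_ACTION_CAPTURE_AND_REPLAY;
}
```

If we banned on close in the hangcheck-disabled case, the middle branch 
should not be needed for this scenario, which is why I'd rather 
understand the lack of banning first.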

Daniele

>
> Regards
> Andrzej
>
>
>>
>> This seems at least in part due to 
>> https://patchwork.freedesktop.org/patch/487531/, where we replaced 
>> the "context_ban" with "context_exiting". There are several places 
>> where we skipped operations if the context was banned (here included) 
>> which are now not covered anymore for exiting contexts. Maybe we need 
>> a new checker function to check both flags in places where we don't 
>> care why the context is being removed (ban vs exiting), just that it is?
>>
>> Daniele
>>
>>>>       } else {
>>>>           drm_info(&guc_to_gt(guc)->i915->drm,
>>>> -- 
>>>> 2.34.1
>>
>

