intel-gfx.lists.freedesktop.org archive mirror
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: maarten.lankhorst@linux.intel.com, matthew.auld@intel.com,
	Matthew Brost <matthew.brost@intel.com>,
	John Harrison <John.C.Harrison@Intel.com>
Subject: Re: [Intel-gfx] [PATCH v6 3/9] drm/i915/gt: Increase suspend timeout
Date: Thu, 23 Sep 2021 15:19:37 +0200
Message-ID: <199e2c25-8133-360e-4b85-18485522c2be@linux.intel.com>
In-Reply-To: <061617be-9bf4-7853-a34d-7501f6b3179f@linux.intel.com>


On 9/23/21 2:59 PM, Tvrtko Ursulin wrote:
>
> On 23/09/2021 12:47, Thomas Hellström wrote:
>> Hi, Tvrtko,
>>
>> On 9/23/21 12:13 PM, Tvrtko Ursulin wrote:
>>>
>>> On 22/09/2021 07:25, Thomas Hellström wrote:
>>>> With GuC submission on DG1, the execution of the requests times out
>>>> for the gem_exec_suspend igt test case after executing around 800-900
>>>> of 1000 submitted requests.
>>>>
>>>> Given the time we allow elsewhere for fences to signal (in the order of
>>>> seconds), increase the timeout before we mark the gt wedged and proceed.
>>>
>>> I suspect it is not about requests not retiring in time but about
>>> the intel_guc_wait_for_idle part of intel_gt_wait_for_idle. Although
>>> I don't know which G2H message the code is waiting for at suspend
>>> time, so perhaps this is something to run past the GuC experts.
>>
>> So what's happening here is that the test submits 1000 requests,
>> each writing a value to an object, and then that object's content is
>> checked after resume. With GuC it turns out that only 800-900 or so
>> values are actually written before we time out, and the test
>> (basic-S3) fails, though not on every run.
>
> Yes, and that did not make sense to me. It is even a single context,
> so I could not come up with an explanation for why GuC would be slower.
>
> Unless it somehow manages to not even update the ring tail in time and 
> requests are still only stuck in the software queue? Perhaps you can 
> see that from the context tail and head when it happens.
>
>> This is a bit interesting in itself, because I never saw the hang-S3 
>> test fail, which from what I can tell basically is an identical test 
>> but with a spinner submitted after the 1000th request. Could be that 
>> the suspend backup code ends up waiting for something before we end 
>> up in intel_gt_wait_for_idle, giving more requests time to execute.
>
> No idea, I don't know the suspend paths that well. For instance before 
> looking at the code I thought we would preempt what's executing and 
> not wait for everything that has been submitted to finish. :)
>
>>> Anyway, if that turns out to be correct then perhaps it would be
>>> better to split the two timeouts (if the required GuC timeout is
>>> fundamentally independent), so it's clear who needs how much
>>> time. Adding Matt and John to comment.
>>
>> You mean we have separate timeouts depending on whether we're using 
>> GuC or execlists submission?
>
> No, I don't know yet. First I think we need to figure out what exactly 
> is happening.

Well then, TBH, I will need to file a separate Jira about that. There
might be various things going on here, like switching between the migrate
context used for eviction of unrelated LMEM buffers and the context used by
gem_exec_suspend. The gem_exec_suspend failures are blocking DG1 BAT, so
it's pretty urgent to get this series merged. If you insist I can leave
this patch out for now, but I'd rather commit it as is and file a Jira
instead.
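For reference, a minimal user-space sketch (not i915 code) of the bounded wait discussed above: requests on a single context retire back to back until a time budget runs out, at which point the model GT is marked wedged and the caller proceeds. The names sim_gt and sim_wait_for_idle, the fixed per-request cost and the timeouts are all hypothetical. They only illustrate why a longer budget lets more of the 1000 submitted requests retire; they are not measurements from DG1 and not the actual intel_gt_wait_for_idle implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, NOT i915 code: requests on a single
 * context retire back to back, each taking a fixed (simulated) time.
 */
struct sim_gt {
	unsigned int completed; /* requests retired before the wait ended */
	bool wedged;            /* set if the wait ran out of time */
};

/*
 * Wait (in simulated time) for num_requests to retire, giving up after
 * timeout_ms. On timeout, mark the model GT wedged and return false so
 * the caller can proceed anyway. Returns true if everything retired in
 * time. The per-request cost is an assumed constant.
 */
static bool sim_wait_for_idle(struct sim_gt *gt, unsigned int num_requests,
			      unsigned int us_per_request,
			      unsigned int timeout_ms)
{
	const unsigned long long budget_us = (unsigned long long)timeout_ms * 1000;
	unsigned long long elapsed_us = 0;

	assert(us_per_request > 0);

	gt->completed = 0;
	while (gt->completed < num_requests) {
		/* Stop if the next request would not finish within the budget. */
		if (elapsed_us + us_per_request > budget_us)
			break;
		elapsed_us += us_per_request;
		gt->completed++;
	}

	if (gt->completed < num_requests) {
		gt->wedged = true; /* give up: wedge and carry on suspending */
		return false;
	}
	return true;
}
```

For example, with an assumed 400 us per request, sim_wait_for_idle(&gt, 1000, 400, 300) retires 750 requests and sets gt.wedged, while a 1000 ms budget retires all 1000. The model deliberately says nothing about where the time actually goes (for instance a GuC G2H wait), which is the open question in this thread.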

/Thomas



Thread overview: 26+ messages
2021-09-22  6:25 [Intel-gfx] [PATCH v6 0/9] drm/i915: Suspend / resume backup- and restore of LMEM Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 1/9] drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 2/9] drm/i915/gem: Implement a function to process all gem objects of a region Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 3/9] drm/i915/gt: Increase suspend timeout Thomas Hellström
2021-09-23  9:18   ` Matthew Auld
2021-09-23 10:13   ` Tvrtko Ursulin
2021-09-23 11:47     ` Thomas Hellström
2021-09-23 12:59       ` Tvrtko Ursulin
2021-09-23 13:19         ` Thomas Hellström [this message]
2021-09-23 14:33           ` Tvrtko Ursulin
2021-09-23 15:43             ` Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 4/9] drm/i915 Implement LMEM backup and restore for suspend / resume Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 5/9] drm/i915/gt: Register the migrate contexts with their engines Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 6/9] drm/i915: Don't back up pinned LMEM context images and rings during suspend Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 7/9] drm/i915: Reduce the number of objects subject to memcpy recover Thomas Hellström
2021-09-23  9:44   ` Matthew Auld
2021-09-23  9:58     ` Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 8/9] HAX: component: do not leave master devres group open after bind Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 9/9] HAX: drm/i915/gem: Fix the __i915_gem_is_lmem() function Thomas Hellström
2021-09-22  7:23 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Suspend / resume backup- and restore of LMEM. (rev9) Patchwork
2021-09-22  7:25 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-09-22  7:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-09-22  9:05 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-09-22 18:06   ` Thomas Hellström
2021-09-23  2:11     ` Vudum, Lakshminarayana
2021-09-23  0:27 ` [Intel-gfx] ✓ Fi.CI.IGT: success " Patchwork
