All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: maarten.lankhorst@linux.intel.com, matthew.auld@intel.com,
	Matthew Brost <matthew.brost@intel.com>,
	John Harrison <John.C.Harrison@Intel.com>
Subject: Re: [Intel-gfx] [PATCH v6 3/9] drm/i915/gt: Increase suspend timeout
Date: Thu, 23 Sep 2021 13:47:46 +0200	[thread overview]
Message-ID: <0f1050c9-b9fe-b587-2aac-cceae4032638@linux.intel.com> (raw)
In-Reply-To: <f276fe3d-5ed8-7ac9-440d-3703f6f0e5e5@linux.intel.com>

Hi, Tvrtko,

On 9/23/21 12:13 PM, Tvrtko Ursulin wrote:
>
> On 22/09/2021 07:25, Thomas Hellström wrote:
>> With GuC submission on DG1, the execution of the requests times out
>> for the gem_exec_suspend igt test case after executing around 800-900
>> of 1000 submitted requests.
>>
>> Given the time we allow elsewhere for fences to signal (in the order of
>> seconds), increase the timeout before we mark the gt wedged and proceed.
>
> I suspect it is not about requests not retiring in time but about the 
> intel_guc_wait_for_idle part of intel_gt_wait_for_idle. Although I 
> don't know which G2H message is the code waiting for at suspend time 
> so perhaps something to run past the GuC experts.

So what's happening here is that the tests submits 1000 requests, each 
writing a value to an object, and then that object content is checked 
after resume. With GuC it turns out that only 800-900 or so values are 
actually written before we time out, and the test (basic-S3) fails, but 
not on every run.

This is a bit interesting in itself, because I never saw the hang-S3 
test fail, which from what I can tell basically is an identical test but 
with a spinner submitted after the 1000th request. Could be that the 
suspend backup code ends up waiting for something before we end up in 
intel_gt_wait_for_idle, giving more requests time to execute.

>
> Anyway, if that turns out to be correct then perhaps it would be 
> better to split the two timeouts (like if required GuC timeout is 
> perhaps fundamentally independent) so it's clear who needs how much 
> time. Adding Matt and John to comment.

You mean we have separate timeouts depending on whether we're using GuC 
or execlists submission?

>
> To be clear, as timeout is AFAIK an arbitrary value, I don't have 
> fundamental objections here. Just think it would be good to have 
> accurate story in the commit message.

Ok. yes. I wonder whether we actually should increase this timeout even 
more since now the watchdog times out after 10+ seconds? I guess those 
long-running requests could be executing also at suspend time.

/Thomas





>
> Regards,
>
> Tvrtko
>
>>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gt_pm.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
>> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> index dea8e2479897..f84f2bfe2de0 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
>> @@ -19,6 +19,8 @@
>>   #include "intel_rps.h"
>>   #include "intel_wakeref.h"
>>   +#define I915_GT_SUSPEND_IDLE_TIMEOUT (HZ / 2)
>> +
>>   static void user_forcewake(struct intel_gt *gt, bool suspend)
>>   {
>>       int count = atomic_read(&gt->user_wakeref);
>> @@ -279,7 +281,7 @@ static void wait_for_suspend(struct intel_gt *gt)
>>       if (!intel_gt_pm_is_awake(gt))
>>           return;
>>   -    if (intel_gt_wait_for_idle(gt, I915_GEM_IDLE_TIMEOUT) == 
>> -ETIME) {
>> +    if (intel_gt_wait_for_idle(gt, I915_GT_SUSPEND_IDLE_TIMEOUT) == 
>> -ETIME) {
>>           /*
>>            * Forcibly cancel outstanding work and leave
>>            * the gpu quiet.
>>

  reply	other threads:[~2021-09-23 11:47 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22  6:25 [PATCH v6 0/9] drm/i915: Suspend / resume backup- and restore of LMEM Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 1/9] drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 2/9] drm/i915/gem: Implement a function to process all gem objects of a region Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [Intel-gfx] [PATCH v6 3/9] drm/i915/gt: Increase suspend timeout Thomas Hellström
2021-09-22  6:25   ` Thomas Hellström
2021-09-23  9:18   ` Matthew Auld
2021-09-23  9:18     ` [Intel-gfx] " Matthew Auld
2021-09-23 10:13   ` Tvrtko Ursulin
2021-09-23 11:47     ` Thomas Hellström [this message]
2021-09-23 12:59       ` Tvrtko Ursulin
2021-09-23 13:19         ` Thomas Hellström
2021-09-23 14:33           ` Tvrtko Ursulin
2021-09-23 15:43             ` Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 4/9] drm/i915 Implement LMEM backup and restore for suspend / resume Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 5/9] drm/i915/gt: Register the migrate contexts with their engines Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 6/9] drm/i915: Don't back up pinned LMEM context images and rings during suspend Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 7/9] drm/i915: Reduce the number of objects subject to memcpy recover Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-23  9:44   ` Matthew Auld
2021-09-23  9:44     ` [Intel-gfx] " Matthew Auld
2021-09-23  9:58     ` Thomas Hellström
2021-09-23  9:58       ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 8/9] HAX: component: do not leave master devres group open after bind Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  6:25 ` [PATCH v6 9/9] HAX: drm/i915/gem: Fix the __i915_gem_is_lmem() function Thomas Hellström
2021-09-22  6:25   ` [Intel-gfx] " Thomas Hellström
2021-09-22  7:23 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Suspend / resume backup- and restore of LMEM. (rev9) Patchwork
2021-09-22  7:25 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-09-22  7:52 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-09-22  9:05 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2021-09-22 18:06   ` Thomas Hellström
2021-09-23  2:11     ` Vudum, Lakshminarayana
2021-09-23  0:27 ` [Intel-gfx] ✓ Fi.CI.IGT: success " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f1050c9-b9fe-b587-2aac-cceae4032638@linux.intel.com \
    --to=thomas.hellstrom@linux.intel.com \
    --cc=John.C.Harrison@Intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.auld@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=tvrtko.ursulin@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.