All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Brost <matthew.brost@intel.com>
To: John Harrison <john.c.harrison@intel.com>
Cc: thomas.hellstrom@linux.intel.com,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL
Date: Wed, 19 Jan 2022 12:47:22 -0800	[thread overview]
Message-ID: <20220119204722.GA32405@jons-linux-dev-box> (raw)
In-Reply-To: <22b0f8c6-fea1-f03c-d91f-253de481287f@intel.com>

On Tue, Jan 18, 2022 at 05:29:54PM -0800, John Harrison wrote:
> On 1/18/2022 13:43, Matthew Brost wrote:
> > Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
> > GFP_KERNEL do fully decouple the error capture from fence signalling.
> s/do/to/
> 

Yep.

> > 
> > Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")
> Does this really count as a bug fix over that patch? Isn't it more of a
> changing in policy now that we do DRM fence signalling and that other
> changes related to error capture behaviour have been implemented.
>

That patch was supposed to allow signalling annotations to be added,
without this change I think these annotations would be broken. So I
think the Fixes is correct. 
 
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 67f3515f07e7a..aee42eae4729f 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1516,7 +1516,7 @@ capture_engine(struct intel_engine_cs *engine,
> >   	struct i915_request *rq = NULL;
> >   	unsigned long flags;
> > -	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
> > +	ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL);
> This still makes me nervous that we will fail to allocate engine captures in
> stress test scenarios, which are exactly the kind of situations where we
> need valid error captures.
> 

Me too, but this whole file has been changed to the ALLOW_FAIL. Thomas
and Daniel seem to think this is correct. For what it's worth this
allocation is less than a page, so it should be pretty safe to do with
ALLOW_FAIL.

> There is also still a GFP_KERNEL in __i915_error_grow(). Doesn't that need
> updating as well?
>

Probably just should be deleted. If look it tries with ALLOW_FAIL first,
then falls back to GFP_KERNEL. I didn't want to make that update in this
series yet but that is something to keep an eye on.

Matt
 
> John.
> 
> >   	if (!ee)
> >   		return NULL;
> 

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Brost <matthew.brost@intel.com>
To: John Harrison <john.c.harrison@intel.com>
Cc: thomas.hellstrom@linux.intel.com,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL
Date: Wed, 19 Jan 2022 12:47:22 -0800	[thread overview]
Message-ID: <20220119204722.GA32405@jons-linux-dev-box> (raw)
In-Reply-To: <22b0f8c6-fea1-f03c-d91f-253de481287f@intel.com>

On Tue, Jan 18, 2022 at 05:29:54PM -0800, John Harrison wrote:
> On 1/18/2022 13:43, Matthew Brost wrote:
> > Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
> > GFP_KERNEL do fully decouple the error capture from fence signalling.
> s/do/to/
> 

Yep.

> > 
> > Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")
> Does this really count as a bug fix over that patch? Isn't it more of a
> changing in policy now that we do DRM fence signalling and that other
> changes related to error capture behaviour have been implemented.
>

That patch was supposed to allow signalling annotations to be added,
without this change I think these annotations would be broken. So I
think the Fixes is correct. 
 
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 67f3515f07e7a..aee42eae4729f 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1516,7 +1516,7 @@ capture_engine(struct intel_engine_cs *engine,
> >   	struct i915_request *rq = NULL;
> >   	unsigned long flags;
> > -	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
> > +	ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL);
> This still makes me nervous that we will fail to allocate engine captures in
> stress test scenarios, which are exactly the kind of situations where we
> need valid error captures.
> 

Me too, but this whole file has been changed to the ALLOW_FAIL. Thomas
and Daniel seem to think this is correct. For what it's worth this
allocation is less than a page, so it should be pretty safe to do with
ALLOW_FAIL.

> There is also still a GFP_KERNEL in __i915_error_grow(). Doesn't that need
> updating as well?
>

Probably just should be deleted. If look it tries with ALLOW_FAIL first,
then falls back to GFP_KERNEL. I didn't want to make that update in this
series yet but that is something to keep an eye on.

Matt
 
> John.
> 
> >   	if (!ee)
> >   		return NULL;
> 

  reply	other threads:[~2022-01-19 20:53 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-18 21:43 [PATCH 0/3] Flush G2H handler during a GT reset Matthew Brost
2022-01-18 21:43 ` [Intel-gfx] " Matthew Brost
2022-01-18 21:43 ` [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL Matthew Brost
2022-01-18 21:43   ` [Intel-gfx] " Matthew Brost
2022-01-19  1:29   ` John Harrison
2022-01-19  1:29     ` [Intel-gfx] " John Harrison
2022-01-19 20:47     ` Matthew Brost [this message]
2022-01-19 20:47       ` Matthew Brost
2022-01-19 20:56       ` John Harrison
2022-01-19 20:56         ` [Intel-gfx] " John Harrison
2022-01-18 21:43 ` [PATCH 2/3] drm/i915/guc: Add work queue to trigger a GT reset Matthew Brost
2022-01-18 21:43   ` [Intel-gfx] " Matthew Brost
2022-01-19  1:37   ` John Harrison
2022-01-19  1:37     ` [Intel-gfx] " John Harrison
2022-01-19 20:54     ` Matthew Brost
2022-01-19 20:54       ` [Intel-gfx] " Matthew Brost
2022-01-19 21:07       ` John Harrison
2022-01-19 21:07         ` [Intel-gfx] " John Harrison
2022-01-19 21:05         ` Matthew Brost
2022-01-19 21:05           ` [Intel-gfx] " Matthew Brost
2022-01-18 21:43 ` [PATCH 3/3] drm/i915/guc: Flush G2H handler during " Matthew Brost
2022-01-18 21:43   ` [Intel-gfx] " Matthew Brost
2022-01-19  1:38   ` John Harrison
2022-01-19  1:38     ` [Intel-gfx] " John Harrison
2022-01-18 22:01 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Flush G2H handler during a GT reset (rev2) Patchwork
2022-01-18 22:02 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-18 22:32 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-01-19  1:02 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-01-19 21:24 [PATCH 0/3] Flush G2H handler during a GT reset Matthew Brost
2022-01-19 21:24 ` [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL Matthew Brost
2022-01-21  4:31 [PATCH 0/3] Flush G2H handler during a GT reset Matthew Brost
2022-01-21  4:31 ` [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL Matthew Brost

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220119204722.GA32405@jons-linux-dev-box \
    --to=matthew.brost@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=john.c.harrison@intel.com \
    --cc=thomas.hellstrom@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.