* [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: jeff.mcgee @ 2017-08-28 19:25 UTC
  To: intel-gfx

From: Jeff McGee <jeff.mcgee@intel.com>

If someone else is resetting the engine we should clear our own bit as
part of skipping that engine. Otherwise we will later believe that it
has not been reset successfully and then trigger full gpu reset. If the
other guy's reset actually fails, he will trigger the full gpu reset.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/i915_irq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 5d391e689070..575d618ccdbf 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2711,8 +2711,10 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-					     &dev_priv->gpu_error.flags))
+					     &dev_priv->gpu_error.flags)) {
+				engine_mask &= ~intel_engine_flag(engine);
 				continue;
+			}
 
 			if (i915_reset_engine(engine, 0) == 0)
 				engine_mask &= ~intel_engine_flag(engine);
-- 
2.11.0
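
For context, the check immediately after this loop in i915_handle_error() is
what escalates to a full device reset: any engine still flagged in engine_mask
falls through to it (the surrounding code is quoted in full later in the
thread):

	if (!engine_mask)
		goto out;

	/* Full reset needs the mutex, stop any other user trying to do so. */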


* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Michel Thierry @ 2017-08-28 19:41 UTC
  To: jeff.mcgee, intel-gfx

On 28/08/17 12:25, jeff.mcgee@intel.com wrote:
> From: Jeff McGee <jeff.mcgee@intel.com>
> 
> If someone else is resetting the engine we should clear our own bit as
> part of skipping that engine. Otherwise we will later believe that it
> has not been reset successfully and then trigger full gpu reset. If the
> other guy's reset actually fails, he will trigger the full gpu reset.
> 

Did you hit this by manually setting wedged to ring 'x' repeatedly?

> [snip]

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Chris Wilson @ 2017-08-28 19:44 UTC
  To: jeff.mcgee, intel-gfx

Quoting jeff.mcgee@intel.com (2017-08-28 20:25:30)
> From: Jeff McGee <jeff.mcgee@intel.com>
> 
> If someone else is resetting the engine we should clear our own bit as
> part of skipping that engine. Otherwise we will later believe that it
> has not been reset successfully and then trigger full gpu reset. If the
> other guy's reset actually fails, he will trigger the full gpu reset.

The reason we did continue on to the global reset was to serialise
i915_handle_error() with the other thread. Not a huge issue, but a
reasonable property to keep -- and we definitely want to explain why
only one reset at a time is important.

bool intel_engine_lock_reset(struct intel_engine_cs *engine) {
	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
			      &engine->i915->gpu_error.flags))
		return true;

	intel_engine_wait_for_reset(engine);
	return false; /* somebody else beat us to the reset */
}

void intel_engine_wait_for_reset(struct intel_engine_cs *engine) {
	while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
		                &engine->i915->gpu_error.flags))
		wait_on_bit(&engine->i915->gpu_error.flags, I915_RESET_ENGINE + engine->id,
		            TASK_UNINTERRUPTIBLE);
}

It can also be used by selftests/intel_hangcheck.c, so let's refactor
before we have 3 copies.
-Chris
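
To illustrate the refactor Chris suggests, the per-engine loop in
i915_handle_error() might then read roughly as follows (an editorial sketch
built from the helpers above and the loop quoted later in the thread, not
code from any patch):

	if (intel_has_reset_engine(dev_priv)) {
		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
			if (!intel_engine_lock_reset(engine)) {
				/* Someone else reset this engine for us;
				 * don't escalate to a full GPU reset. */
				engine_mask &= ~intel_engine_flag(engine);
				continue;
			}

			if (i915_reset_engine(engine, 0) == 0)
				engine_mask &= ~intel_engine_flag(engine);

			clear_bit(I915_RESET_ENGINE + engine->id,
				  &dev_priv->gpu_error.flags);
			wake_up_bit(&dev_priv->gpu_error.flags,
				    I915_RESET_ENGINE + engine->id);
		}
	}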

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Jeff McGee @ 2017-08-28 19:46 UTC
  To: Michel Thierry; +Cc: intel-gfx

On Mon, Aug 28, 2017 at 12:41:58PM -0700, Michel Thierry wrote:
> On 28/08/17 12:25, jeff.mcgee@intel.com wrote:
> >From: Jeff McGee <jeff.mcgee@intel.com>
> >
> >If someone else is resetting the engine we should clear our own bit as
> >part of skipping that engine. Otherwise we will later believe that it
> >has not been reset successfully and then trigger full gpu reset. If the
> >other guy's reset actually fails, he will trigger the full gpu reset.
> >
> 
> Did you hit this by manually setting wedged to ring 'x' repeatedly?
> 
I haven't actually reproduced it. I've just been looking at the code a
lot while trying to develop reset for preemption enforcement. That
implementation will call i915_handle_error from another work item that
can run concurrently with hangcheck.
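
A minimal sketch of the kind of concurrent caller described here (the work
item, its field name, and the error message are hypothetical, not from the
thread):

/* Hypothetical preemption-enforcement work item: runs concurrently
 * with hangcheck and requests a reset of just this engine. */
static void preempt_timeout_work(struct work_struct *work)
{
	struct intel_engine_cs *engine =
		container_of(work, struct intel_engine_cs, preempt_work);

	i915_handle_error(engine->i915, intel_engine_flag(engine),
			  "preemption timed out on %s", engine->name);
}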

> >[snip]
> 
> Reviewed-by: Michel Thierry <michel.thierry@intel.com>

* ✓ Fi.CI.BAT: success for drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Patchwork @ 2017-08-28 19:48 UTC
  To: jeff.mcgee; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
URL   : https://patchwork.freedesktop.org/series/29437/
State : success

== Summary ==

Series 29437v1 drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
https://patchwork.freedesktop.org/api/1.0/series/29437/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                fail       -> PASS       (fi-snb-2600) fdo#100215
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                skip       -> PASS       (fi-skl-x1585l) fdo#101781

fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:453s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:436s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:359s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:552s
fi-bwr-2160      total:279  pass:184  dwarn:0   dfail:0   fail:0   skip:95  time:253s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:524s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:520s
fi-byt-n2820     total:279  pass:250  dwarn:1   dfail:0   fail:0   skip:28  time:511s
fi-elk-e7500     total:279  pass:230  dwarn:0   dfail:0   fail:0   skip:49  time:435s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:615s
fi-hsw-4770      total:279  pass:261  dwarn:0   dfail:0   fail:2   skip:16  time:443s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:426s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:420s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:509s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:473s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:477s
fi-kbl-7560u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:598s
fi-kbl-r         total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:596s
fi-pnv-d510      total:279  pass:223  dwarn:1   dfail:0   fail:0   skip:55  time:520s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:464s
fi-skl-6700k     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:476s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:486s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:441s
fi-skl-x1585l    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:509s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:542s
fi-snb-2600      total:279  pass:249  dwarn:0   dfail:0   fail:1   skip:29  time:406s

ee53909d971df42daac0b870cf7c091f45f1f6b9 drm-tip: 2017y-08m-28d-15h-03m-59s UTC integration manifest
9ccbefee6663 drm/i915: Clear local engine-needs-reset bit if in progress elsewhere

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5513/

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Jeff McGee @ 2017-08-28 20:18 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Aug 28, 2017 at 08:44:48PM +0100, Chris Wilson wrote:
> [snip]
> The reason we did continue on to the global reset was to serialise
> i915_handle_error() with the other thread. Not a huge issue, but a
> > reasonable property to keep -- and we definitely want to explain why
> only one reset at a time is important.
> 
> bool intel_engine_lock_reset(struct intel_engine_cs *engine) {
> 	if (!test_and_set_bit(I915_RESET_ENGINE + engine->id,
> 			      &engine->i915->gpu_error.flags))
> 		return true;
> 
> 	intel_engine_wait_for_reset(engine);
The current code doesn't wait for the other thread to finish the reset, but
this would add that wait. Did you intend that as an additional change to
the current code? I don't think it is necessary. Each thread wants to
reset some subset of engines, so it seems the thread can safely exit as soon
as it knows each of those engines has been reset or is being reset as part
of another thread that got the lock first. If any of the threads fail to
reset an engine they "own", then full gpu reset is assured.
-Jeff

> 	return false; /* somebody else beat us to the reset */
> }
> 
> void intel_engine_wait_for_reset(struct intel_engine_cs *engine) {
> 	while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> 		                &engine->i915->gpu_error.flags))
> 		wait_on_bit(&engine->i915->gpu_error.flags, I915_RESET_ENGINE + engine->id,
> 		            TASK_UNINTERRUPTIBLE);
> }
> 
> It can also be used by selftests/intel_hangcheck.c, so let's refactor
> before we have 3 copies.
> -Chris

* ✗ Fi.CI.IGT: warning for drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Patchwork @ 2017-08-28 20:59 UTC
  To: Jeff McGee; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
URL   : https://patchwork.freedesktop.org/series/29437/
State : warning

== Summary ==

Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                pass       -> DMESG-WARN (shard-hsw)
        Subgroup plain-flip-ts-check:
                fail       -> PASS       (shard-hsw)

shard-hsw        total:2230 pass:1230 dwarn:1   dfail:0   fail:17  skip:982 time:9656s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5513/shards.html

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Chris Wilson @ 2017-08-29  9:07 UTC
  To: Jeff McGee; +Cc: intel-gfx

Quoting Jeff McGee (2017-08-28 21:18:44)
> > [snip]
> >       intel_engine_wait_for_reset(engine);
> The current code doesn't wait for the other thread to finish the reset, but
> this would add that wait. 

Pardon? If we can't reset the engine, we go to the full reset which is
serialised, both with individual engine resets and other globals.

> Did you intend that as an additional change to
> the current code? I don't think it is necessary. Each thread wants to
> reset some subset of engines, so it seems the thread can safely exit as soon
> as it knows each of those engines has been reset or is being reset as part
> of another thread that got the lock first. If any of the threads fail to
> reset an engine they "own", then full gpu reset is assured.

It's unexpected for this function to return before the reset.
-Chris

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Jeff McGee @ 2017-08-29 15:04 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Aug 29, 2017 at 10:07:18AM +0100, Chris Wilson wrote:
> Quoting Jeff McGee (2017-08-28 21:18:44)
> > > [snip]
> > >       intel_engine_wait_for_reset(engine);
> > The current code doesn't wait for the other thread to finish the reset, but
> > this would add that wait. 
> 
> Pardon? If we can't reset the engine, we go to the full reset which is
> serialised, both with individual engine resets and other globals.
> 
> > Did you intend that as an additional change to
> > the current code? I don't think it is necessary. Each thread wants to
> > reset some subset of engines, so it seems the thread can safely exit as soon
> > as it knows each of those engines has been reset or is being reset as part
> > of another thread that got the lock first. If any of the threads fail to
> > reset an engine they "own", then full gpu reset is assured.
> 
> It's unexpected for this function to return before the reset.
> -Chris

I'm a bit confused, so let's go back to the original code that I was trying
to fix:


	/*
	 * Try engine reset when available. We fall back to full reset if
	 * single reset fails.
	 */
	if (intel_has_reset_engine(dev_priv)) {
		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
					     &dev_priv->gpu_error.flags))
				continue;

			if (i915_reset_engine(engine, 0) == 0)
				engine_mask &= ~intel_engine_flag(engine);

			clear_bit(I915_RESET_ENGINE + engine->id,
				  &dev_priv->gpu_error.flags);
			wake_up_bit(&dev_priv->gpu_error.flags,
				    I915_RESET_ENGINE + engine->id);
		}
	}

	if (!engine_mask)
		goto out;

	/* Full reset needs the mutex, stop any other user trying to do so. */

Let's say that 2 threads are here intending to reset render. #1 gets the lock
and starts the render engine-only reset. #2 fails to get the lock which implies
that someone else is in the process of resetting the render engine (with single
engine reset or full gpu reset). #2 continues on without waiting but doesn't
clear the render bit in engine_mask. So #2 will proceed to initiate a full
gpu reset when it may not be necessary. That's the problem I was trying
to address with my initial patch. Do you agree that #2 must clear this bit
to avoid always triggering full gpu reset? If the engine-only reset done by
#1 fails, #1 will do the fallback to full gpu reset, so there is no risk that
we would miss the full gpu reset if it is really needed.

Then there is the question of whether #2 should wait around for the
render engine reset by #1 to complete. It doesn't in current code and I don't
see why it needs to. But that can be a separate discussion from the above.
-Jeff
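
To make the race concrete, here is the sequence as an editorial timeline
(RCS denotes the render engine's I915_RESET_ENGINE bit):

/*
 *   thread #1                            thread #2
 *   ---------                            ---------
 *   test_and_set_bit(RCS) -> was clear   test_and_set_bit(RCS) -> was set
 *   i915_reset_engine(render) succeeds   continue; render engine is still
 *   clear_bit(RCS), wake_up_bit()         set in engine_mask
 *   engine_mask == 0 -> goto out         engine_mask != 0 -> falls through
 *                                         to the full GPU reset
 */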

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Chris Wilson @ 2017-08-29 15:17 UTC
  To: Jeff McGee; +Cc: intel-gfx

Quoting Jeff McGee (2017-08-29 16:04:17)
> [snip]
> 
> Let's say that 2 threads are here intending to reset render. #1 gets the lock
> and starts the render engine-only reset. #2 fails to get the lock which implies
> that someone else is in the process of resetting the render engine (with single
> engine reset or full gpu reset). #2 continues on without waiting but doesn't
> clear the render bit in engine_mask. So #2 will proceed to initiate a full
> gpu reset when it may not be necessary. That's the problem I was trying
> to address with my initial patch. Do you agree that #2 must clear this bit
> to avoid always triggering full gpu reset? If the engine-only reset done by
> #1 fails, #1 will do the fallback to full gpu reset, so there is no risk that
> we would miss the full gpu reset if it is really needed.
> 
> Then there is the question of whether #2 should wait around for the
> render engine reset by #1 to complete. It doesn't in current code and I don't
> see why it needs to. But that can be a separate discussion from the above.

It very much does in the current code. If we can not do the per-engine
reset, it falls back to the global reset. The global reset is serialised
with itself and the per-engine resets. Ergo it waits, and that was
intentional.
-Chris
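
For reference, the serialisation Chris describes is the code that follows the
per-engine loop in i915_handle_error(); roughly, in the kernel version under
discussion (abridged, details may differ):

	/* Full reset needs the mutex, stop any other user trying to do so. */
	if (test_and_set_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags)) {
		wait_event(dev_priv->gpu_error.reset_queue,
			   !test_bit(I915_RESET_BACKOFF,
				     &dev_priv->gpu_error.flags));
		goto out;
	}

	/* Prevent any other reset-engine attempt. */
	for_each_engine(engine, dev_priv, tmp) {
		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
					&dev_priv->gpu_error.flags))
			wait_on_bit(&dev_priv->gpu_error.flags,
				    I915_RESET_ENGINE + engine->id,
				    TASK_UNINTERRUPTIBLE);
	}

	i915_reset_device(dev_priv);
	/* ... then clear every I915_RESET_ENGINE bit, clear
	 * I915_RESET_BACKOFF and wake the reset_queue waiters. */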

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Chris Wilson @ 2017-08-29 15:22 UTC
  To: Jeff McGee, Michel Thierry; +Cc: intel-gfx

Quoting Jeff McGee (2017-08-28 20:46:00)
> On Mon, Aug 28, 2017 at 12:41:58PM -0700, Michel Thierry wrote:
> > [snip]
> > 
> > Did you hit this by manually setting wedged to ring 'x' repeatedly?
> > 
> I haven't actually reproduced it. I've just been looking at the code a
> lot while trying to develop reset for preemption enforcement. That
> implementation will call i915_handle_error from another work item that
> can run concurrently with hangcheck.

Note that hitting it in practice is a nasty bug. The assumption is that
between a pair of resets there was sufficient time for the engine to
recover, and so if we reset too quickly we conclude that the
reset/recovery mechanism is broken.

And if you do start playing with fast resets, you very quickly find that
kthread_park is a livelock waiting to happen.
-Chris

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Jeff McGee @ 2017-08-29 17:01 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Aug 29, 2017 at 04:17:46PM +0100, Chris Wilson wrote:
> [snip]
> > Then there is the question of whether #2 should wait around for the
> > render engine reset by #1 to complete. It doesn't in current code and I don't
> > see why it needs to. But that can be a separate discussion from the above.
> 
> It very much does in the current code. If we can not do the per-engine
> reset, it falls back to the global reset.

So are you saying that it is by design in this scenario that #2 will resort
to full gpu reset just because it wasn't the thread that actually performed
the engine reset, even though it can clearly infer based on the engine lock
being held that #1 is performing that reset for him?

> The global reset is serialised
> with itself and the per-engine resets. Ergo it waits, and that was
> intentional.
> -Chris

Yes, the wait happens because #2 goes on to start full gpu reset which
requires all engine bits to be grabbed. My contention is that it should not
start full gpu reset.
-Jeff

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Chris Wilson @ 2017-09-06 21:57 UTC
  To: Jeff McGee; +Cc: intel-gfx

Quoting Jeff McGee (2017-08-29 18:01:47)
> [snip]
> So are you saying that it is by design in this scenario that #2 will resort
> to full gpu reset just because it wasn't the thread that actually performed
> the engine reset, even though it can clearly infer based on the engine lock
> being held that #1 is performing that reset for him?

Yes, that wait was intentional.
 
> > The global reset is serialised
> > with itself and the per-engine resets. Ergo it waits, and that was
> > intentional.
> > -Chris
> 
> Yes, the wait happens because #2 goes on to start full gpu reset which
> requires all engine bits to be grabbed. My contention is that it should not
> start full gpu reset.

And that I am not disputing. Just that returning before the reset is
complete changes the current contract.
-Chris

* Re: [PATCH] drm/i915: Clear local engine-needs-reset bit if in progress elsewhere
From: Jeff McGee @ 2017-09-13 14:23 UTC
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Sep 06, 2017 at 10:57:20PM +0100, Chris Wilson wrote:
> [snip]
> > So are you saying that it is by design in this scenario that #2 will resort
> > to full gpu reset just because it wasn't the thread that actually performed
> > the engine reset, even though it can clearly infer based on the engine lock
> > being held that #1 is performing that reset for him?
> 
> Yes, that wait was intentional.
>  

So couldn't we preserve the wait without resorting to full GPU reset to do it?
It is the unnecessary triggering of full GPU reset that concerns me the most,
not that thread #2 has to wait.

I admit it's not a major concern at the moment because the code runs
concurrently only if debugfs wedged is invoked along with hangcheck (if I read
it correctly). But I've added another delayed work item that can run this code
to do "forced" preemption to clear the way for high-priority requests, and an
unnecessary fallback to the slow full GPU reset is a bad thing there.
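
A minimal sketch of what Jeff is asking for here, skipping the engine but
still waiting, without escalating (an editorial sketch combining his mask fix
with the wait Chris wants to preserve, not a patch from the thread):

		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
					     &dev_priv->gpu_error.flags)) {
				/* Another thread owns this engine's reset:
				 * wait for it to finish, then trust its
				 * fallback path instead of escalating. */
				wait_on_bit(&dev_priv->gpu_error.flags,
					    I915_RESET_ENGINE + engine->id,
					    TASK_UNINTERRUPTIBLE);
				engine_mask &= ~intel_engine_flag(engine);
				continue;
			}

			if (i915_reset_engine(engine, 0) == 0)
				engine_mask &= ~intel_engine_flag(engine);

			clear_bit(I915_RESET_ENGINE + engine->id,
				  &dev_priv->gpu_error.flags);
			wake_up_bit(&dev_priv->gpu_error.flags,
				    I915_RESET_ENGINE + engine->id);
		}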

