All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock
@ 2017-07-20 20:23 Daniel Vetter
  2017-07-20 20:47 ` ✗ Fi.CI.BAT: failure for " Patchwork
  2017-07-27 12:13 ` [PATCH] " Tvrtko Ursulin
  0 siblings, 2 replies; 4+ messages in thread
From: Daniel Vetter @ 2017-07-20 20:23 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Mika Kuoppala, Daniel Vetter

... using the biggest hammer we have. This is essentially a weaponized
version of the timeout-based wedging Chris added in

commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 22 11:56:25 2017 +0100

    drm/i915: Break modeset deadlocks on reset

Because defense-in-depth is good it's good to still have both. Also
note that with the locking change we can now restrict this a lot (old
gpus and special testing only), so this doesn't kill the TDR benefits
on at least anything remotely modern.

And futuremore with a few tricks it should be possible to make a much
more educated guess about whether an atomic commit is stuck waiting on
the gpu (atomic_t counting the pending i915_sw_fence used by the
atomic modeset code should do it), so we can improve this.

But for now just start with something that is guaranteed to recover
faster, for much better CI througput.

This defacto reverts TDR on these platforms, but there's not really a
single commit to specify as the sole offender.

v2: Add a debug message to explain what's going on. We can't DRM_ERROR
because that spams CI. And the timeout based fallback still prints a
DRM_ERROR, in case something goes wrong.

Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 02b1f4966049..995522e40ec1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	    !gpu_reset_clobbers_display(dev_priv))
 		return;
 
+	/* We have a modeset vs reset deadlock, defensively unbreak it.
+	 *
+	 * FIXME: We can do a _lot_ better, this is just a first iteration.*/
+	i915_gem_set_wedged(dev_priv);
+	DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
+
 	/*
 	 * Need mode_config.mutex so that we don't
 	 * trample ongoing ->detect() and whatnot.
-- 
2.13.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Avoid the gpu reset vs. modeset deadlock
  2017-07-20 20:23 [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
@ 2017-07-20 20:47 ` Patchwork
  2017-07-27 12:13 ` [PATCH] " Tvrtko Ursulin
  1 sibling, 0 replies; 4+ messages in thread
From: Patchwork @ 2017-07-20 20:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Avoid the gpu reset vs. modeset deadlock
URL   : https://patchwork.freedesktop.org/series/27670/
State : failure

== Summary ==

Series 27670v1 drm/i915: Avoid the gpu reset vs. modeset deadlock
https://patchwork.freedesktop.org/api/1.0/series/27670/revisions/1/mbox/

Test gem_exec_fence:
        Subgroup await-hang-default:
                pass       -> FAIL       (fi-blb-e6850)
                pass       -> FAIL       (fi-pnv-d510)
Test gem_ringfill:
        Subgroup basic-default-hang:
                dmesg-warn -> PASS       (fi-blb-e6850) fdo#101600 +1
Test kms_cursor_legacy:
        Subgroup basic-busy-flip-before-cursor-atomic:
                fail       -> PASS       (fi-snb-2600) fdo#100215
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-pnv-d510) fdo#101597
        Subgroup suspend-read-crc-pipe-b:
                pass       -> DMESG-WARN (fi-byt-j1900) fdo#101705 +1

fdo#101600 https://bugs.freedesktop.org/show_bug.cgi?id=101600
fdo#100215 https://bugs.freedesktop.org/show_bug.cgi?id=100215
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:448s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:431s
fi-blb-e6850     total:279  pass:224  dwarn:0   dfail:0   fail:1   skip:54  time:352s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:532s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:511s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:497s
fi-byt-n2820     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:485s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:605s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:438s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:413s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:421s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:503s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:480s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:467s
fi-kbl-7560u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:579s
fi-kbl-r         total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:584s
fi-pnv-d510      total:279  pass:223  dwarn:0   dfail:0   fail:1   skip:55  time:562s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:459s
fi-skl-6700hq    total:279  pass:262  dwarn:0   dfail:0   fail:0   skip:17  time:592s
fi-skl-6700k     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:471s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:477s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:437s
fi-skl-x1585l    total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:471s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:542s
fi-snb-2600      total:279  pass:250  dwarn:0   dfail:0   fail:0   skip:29  time:403s

07fe547642181ced02ac75bbf748cbb7ddd367b9 drm-tip: 2017y-07m-20d-18h-40m-36s UTC integration manifest
0b8dc77 drm/i915: Avoid the gpu reset vs. modeset deadlock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5257/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock
  2017-07-20 20:23 [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
  2017-07-20 20:47 ` ✗ Fi.CI.BAT: failure for " Patchwork
@ 2017-07-27 12:13 ` Tvrtko Ursulin
  2017-08-03 19:31   ` Michel Thierry
  1 sibling, 1 reply; 4+ messages in thread
From: Tvrtko Ursulin @ 2017-07-27 12:13 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development; +Cc: Daniel Vetter, Mika Kuoppala


On 20/07/2017 21:23, Daniel Vetter wrote:
> ... using the biggest hammer we have. This is essentially a weaponized
> version of the timeout-based wedging Chris added in
> 
> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Jun 22 11:56:25 2017 +0100
> 
>      drm/i915: Break modeset deadlocks on reset
> 
> Because defense-in-depth is good it's good to still have both. Also
> note that with the locking change we can now restrict this a lot (old
> gpus and special testing only), so this doesn't kill the TDR benefits
> on at least anything remotely modern.
> 
> And futuremore with a few tricks it should be possible to make a much
> more educated guess about whether an atomic commit is stuck waiting on
> the gpu (atomic_t counting the pending i915_sw_fence used by the
> atomic modeset code should do it), so we can improve this.
> 
> But for now just start with something that is guaranteed to recover
> faster, for much better CI througput.
> 
> This defacto reverts TDR on these platforms, but there's not really a
> single commit to specify as the sole offender.
> 
> v2: Add a debug message to explain what's going on. We can't DRM_ERROR
> because that spams CI. And the timeout based fallback still prints a
> DRM_ERROR, in case something goes wrong.
> 
> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_display.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 02b1f4966049..995522e40ec1 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>   	    !gpu_reset_clobbers_display(dev_priv))
>   		return;
>   
> +	/* We have a modeset vs reset deadlock, defensively unbreak it.
> +	 *
> +	 * FIXME: We can do a _lot_ better, this is just a first iteration.*/
> +	i915_gem_set_wedged(dev_priv);
> +	DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
> +
>   	/*
>   	 * Need mode_config.mutex so that we don't
>   	 * trample ongoing ->detect() and whatnot.
> 

Looks fine to me since it only affects very old platforms, and is a 
temporary measure.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock
  2017-07-27 12:13 ` [PATCH] " Tvrtko Ursulin
@ 2017-08-03 19:31   ` Michel Thierry
  0 siblings, 0 replies; 4+ messages in thread
From: Michel Thierry @ 2017-08-03 19:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter, Intel Graphics Development
  Cc: Vetter, Daniel, Kuoppala, Mika

On 7/27/2017 5:13 AM, Tvrtko Ursulin wrote:
> 
> On 20/07/2017 21:23, Daniel Vetter wrote:
>> ... using the biggest hammer we have. This is essentially a weaponized
>> version of the timeout-based wedging Chris added in
>>
>> commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date:   Thu Jun 22 11:56:25 2017 +0100
>>
>>       drm/i915: Break modeset deadlocks on reset
>>
>> Because defense-in-depth is good it's good to still have both. Also
>> note that with the locking change we can now restrict this a lot (old
>> gpus and special testing only), so this doesn't kill the TDR benefits
>> on at least anything remotely modern.
>>
>> And futuremore with a few tricks it should be possible to make a much
>> more educated guess about whether an atomic commit is stuck waiting on
>> the gpu (atomic_t counting the pending i915_sw_fence used by the
>> atomic modeset code should do it), so we can improve this.
>>
>> But for now just start with something that is guaranteed to recover
>> faster, for much better CI througput.
>>
>> This defacto reverts TDR on these platforms, but there's not really a
>> single commit to specify as the sole offender.
>>
>> v2: Add a debug message to explain what's going on. We can't DRM_ERROR
>> because that spams CI. And the timeout based fallback still prints a
>> DRM_ERROR, in case something goes wrong.
>>
>> Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
>> Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>> ---
>>    drivers/gpu/drm/i915/intel_display.c | 6 ++++++
>>    1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> index 02b1f4966049..995522e40ec1 100644
>> --- a/drivers/gpu/drm/i915/intel_display.c
>> +++ b/drivers/gpu/drm/i915/intel_display.c
>> @@ -3471,6 +3471,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>>            !gpu_reset_clobbers_display(dev_priv))
>>                return;
>>
>> +     /* We have a modeset vs reset deadlock, defensively unbreak it.
>> +      *
>> +      * FIXME: We can do a _lot_ better, this is just a first iteration.*/

I think the standard is to move '*/' to a new line.

>> +     i915_gem_set_wedged(dev_priv);
>> +     DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
>> +
>>        /*
>>         * Need mode_config.mutex so that we don't
>>         * trample ongoing ->detect() and whatnot.
>>
> 
> Looks fine to me since it only affects very old platforms, and is a
> temporary measure.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 

Also,

Reviewed-by: Michel Thierry <michel.thierry@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-08-03 19:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-20 20:23 [PATCH] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
2017-07-20 20:47 ` ✗ Fi.CI.BAT: failure for " Patchwork
2017-07-27 12:13 ` [PATCH] " Tvrtko Ursulin
2017-08-03 19:31   ` Michel Thierry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.