[PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock
@ 2017-08-08  8:08 Daniel Vetter
  2017-08-08  8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Daniel Vetter @ 2017-08-08  8:08 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

... using the biggest hammer we have. This is essentially a weaponized
version of the timeout-based wedging Chris added in

commit 36703e79a982c8ce5a8e43833291f2719e92d0d1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 22 11:56:25 2017 +0100

    drm/i915: Break modeset deadlocks on reset

Because defense-in-depth is good it's good to still have both. Also
note that with the locking change we can now restrict this a lot (old
gpus and special testing only), so this doesn't kill the TDR benefits
on at least anything remotely modern.

And futuremore with a few tricks it should be possible to make a much
more educated guess about whether an atomic commit is stuck waiting on
the gpu (atomic_t counting the pending i915_sw_fence used by the
atomic modeset code should do it), so we can improve this.

But for now just start with something that is guaranteed to recover
faster, for much better CI througput.

This defacto reverts TDR on these platforms, but there's not really a
single commit to specify as the sole offender.

v2: Add a debug message to explain what's going on. We can't DRM_ERROR
because that spams CI. And the timeout based fallback still prints a
DRM_ERROR, in case something goes wrong.

v3: Fix comment layout (Michel)

Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion")
Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e4af56b5ff27..b2c220ba2575 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3458,6 +3458,13 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	    !gpu_reset_clobbers_display(dev_priv))
 		return;

+	/* We have a modeset vs reset deadlock, defensively unbreak it.
+	 *
+	 * FIXME: We can do a _lot_ better, this is just a first iteration.
+	 */
+	i915_gem_set_wedged(dev_priv);
+	DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
+
 	/*
 	 * Need mode_config.mutex so that we don't
 	 * trample ongoing ->detect() and whatnot.
-- 
2.13.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit
  2017-08-08  8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
@ 2017-08-08  8:08 ` Daniel Vetter
  2017-08-08  8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter
  2017-08-08  8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2017-08-08  8:08 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

Blocking in a worker is ok, that's what the unbound_wq is for. And it
unifies the paths between the blocking and nonblocking commit, giving
me just one path where I have to implement the deadlock avoidance
trickery in the next patch.

I first tried to implement the following patch without this rework, but
force-completing i915_sw_fence creates some serious challenges around
properly cleaning things up. So wasn't a feasible short-term approach.
Another approach would be to simple keep track of all pending atomic
commit work items and manually queue them from the reset code. With the
caveat that double-queue in case we race with the i915_sw_fence must be
avoided. Given all that, taking the cost of a double schedule in atomic
for the short-term fix is the best approach, but can be changed in the
future of course.

v2: Amend commit message (Chris).

v3: Add comment explaining why we do nothing in the sw_fence complete
callback (Michel).

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b2c220ba2575..da8d0d3b2bc2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12082,6 +12082,8 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 	unsigned crtc_vblank_mask = 0;
 	int i;
 
+	i915_sw_fence_wait(&intel_state->commit_ready);
+
 	drm_atomic_helper_wait_for_dependencies(state);
 
 	if (intel_state->modeset)
@@ -12247,10 +12249,8 @@ intel_atomic_commit_ready(struct i915_sw_fence *fence,
 
 	switch (notify) {
 	case FENCE_COMPLETE:
-		if (state->base.commit_work.func)
-			queue_work(system_unbound_wq, &state->base.commit_work);
+		/* we do blocking waits in the worker, nothing to do here */
 		break;
-
 	case FENCE_FREE:
 		{
 			struct intel_atomic_helper *helper =
@@ -12352,14 +12352,14 @@ static int intel_atomic_commit(struct drm_device *dev,
 	}
 
 	drm_atomic_state_get(state);
-	INIT_WORK(&state->commit_work,
-		  nonblock ? intel_atomic_commit_work : NULL);
+	INIT_WORK(&state->commit_work, intel_atomic_commit_work);
 
 	i915_sw_fence_commit(&intel_state->commit_ready);
-	if (!nonblock) {
-		i915_sw_fence_wait(&intel_state->commit_ready);
+	if (nonblock)
+		queue_work(system_unbound_wq, &state->commit_work);
+	else
 		intel_atomic_commit_tail(state);
-	}
+
 
 	return 0;
 }
-- 
2.13.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock
  2017-08-08  8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
  2017-08-08  8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter
@ 2017-08-08  8:08 ` Daniel Vetter
  2017-08-14 14:13   ` Michel Thierry
  2017-08-08  8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork
  2 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2017-08-08  8:08 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

There's no reason to entirely wedge the gpu, for the minimal deadlock
bugfix we only need to unbreak/decouple the atomic commit from the gpu
reset. The simplest way to fix that is by replacing the
unconditional fence wait a the top of commit_tail by a wait which
completes either when the fences are done (normal case, or when a
reset doesn't need to touch the display state). Or when the gpu reset
needs to force-unblock all pending modeset states.

The lesser source of deadlocks is when we try to pin a new framebuffer
and run into a stall. There's a bunch of places this can happen, like
eviction, changing the caching mode, acquiring a fence on older
platforms. And we can't just break the depency loop and keep going,
the only way would be to break out and restart. But the problem with
that approach is that we must stall for the reset to complete before
we grab any locks, and with the atomic infrastructure that's a bit
tricky. The only place is the ioctl code, and we don't want to insert
code into e.g. the BUSY ioctl. Hence for that problem just create a
critical section, and if any code is in there, wedge the GPU. For the
steady-state this should never be a problem.

Note that in both cases TDR itself keeps working, so from a userspace
pov this trickery isn't observable. Users themselvs might spot a short
glitch while the rendering is catching up again, but that's still
better than pre-TDR where we've thrown away all the rendering,
including innocent batches. Also, this fixes the regression TDR
introduced of making gpu resets deadlock-prone when we do need to
touch the display.

One thing I noticed is that gpu_error.flags seems to use both our own
wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
facilities. Not entirely sure why this inconsistency exists, I just
picked one style.

A possible future avenue could be to insert the gpu reset in-between
ongoing modeset changes, which would avoid the momentary glitch. But
that's a lot more work to implement in the atomic commit machinery,
and given that we only need this for pre-g4x hw, of questionable
utility just for the sake of polishing gpu reset even more on those
old boxes. It might be useful for other features though.

v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.

v3: Really emabarrassing fixup, I checked the wrong bit and broke the
unbreak/wakeup logic.

v4: Also handle deadlocks in pin_to_display.

v5: Review from Michel:
- Fixup the BUILD_BUG_ON
- Don't forget about the overlay

Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      |  3 +++
 drivers/gpu/drm/i915/i915_irq.c      |  2 +-
 drivers/gpu/drm/i915/intel_display.c | 46 ++++++++++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_overlay.c | 11 +++++++--
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 907603cba447..4e669b7738d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1528,6 +1528,8 @@ struct i915_gpu_error {
 	/* Protected by the above dev->gpu_error.lock. */
 	struct i915_gpu_state *first_error;
 
+	atomic_t pending_fb_pin;
+
 	unsigned long missed_irq_rings;
 
 	/**
@@ -1587,6 +1589,7 @@ struct i915_gpu_error {
 	unsigned long flags;
 #define I915_RESET_BACKOFF	0
 #define I915_RESET_HANDOFF	1
+#define I915_RESET_MODESET	2
 #define I915_WEDGED		(BITS_PER_LONG - 1)
 #define I915_RESET_ENGINE	(I915_WEDGED - I915_NUM_ENGINES)
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index cb9f98555c71..f181f19e8436 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2731,7 +2731,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
 	 */
 	if (intel_has_reset_engine(dev_priv)) {
 		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
-			BUILD_BUG_ON(I915_RESET_HANDOFF >= I915_RESET_ENGINE);
+			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
 					     &dev_priv->gpu_error.flags))
 				continue;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index da8d0d3b2bc2..b5631d97a317 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2157,6 +2157,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
 	 */
 	intel_runtime_pm_get(dev_priv);
 
+	atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
+
 	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
 	if (IS_ERR(vma))
 		goto err;
@@ -2184,6 +2186,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
 
 	i915_vma_get(vma);
 err:
+	atomic_dec(&dev_priv->gpu_error.pending_fb_pin);
+
 	intel_runtime_pm_put(dev_priv);
 	return vma;
 }
@@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 	    !gpu_reset_clobbers_display(dev_priv))
 		return;
 
-	/* We have a modeset vs reset deadlock, defensively unbreak it.
-	 *
-	 * FIXME: We can do a _lot_ better, this is just a first iteration.
-	 */
-	i915_gem_set_wedged(dev_priv);
-	DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
+	/* We have a modeset vs reset deadlock, defensively unbreak it. */
+	set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
+	wake_up_all(&dev_priv->gpu_error.wait_queue);
+
+	if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
+		DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
+		i915_gem_set_wedged(dev_priv);
+	}
 
 	/*
 	 * Need mode_config.mutex so that we don't
@@ -3551,6 +3557,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 	drm_modeset_drop_locks(ctx);
 	drm_modeset_acquire_fini(ctx);
 	mutex_unlock(&dev->mode_config.mutex);
+
+	clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
 }
 
 static void intel_update_pipe_config(struct intel_crtc *crtc,
@@ -12069,6 +12077,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work)
 	intel_atomic_helper_free_state(dev_priv);
 }
 
+static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state)
+{
+	struct wait_queue_entry wait_fence, wait_reset;
+	struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev);
+
+	init_wait_entry(&wait_fence, 0);
+	init_wait_entry(&wait_reset, 0);
+	for (;;) {
+		prepare_to_wait(&intel_state->commit_ready.wait,
+				&wait_fence, TASK_UNINTERRUPTIBLE);
+		prepare_to_wait(&dev_priv->gpu_error.wait_queue,
+				&wait_reset, TASK_UNINTERRUPTIBLE);
+
+
+		if (i915_sw_fence_done(&intel_state->commit_ready)
+		    || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
+			break;
+
+		schedule();
+	}
+	finish_wait(&intel_state->commit_ready.wait, &wait_fence);
+	finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset);
+}
+
 static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 {
 	struct drm_device *dev = state->dev;
@@ -12082,7 +12114,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
 	unsigned crtc_vblank_mask = 0;
 	int i;
 
-	i915_sw_fence_wait(&intel_state->commit_ready);
+	intel_atomic_commit_fence_wait(intel_state);
 
 	drm_atomic_helper_wait_for_dependencies(state);
 
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index b96aed941b97..aace22e7ccac 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -799,9 +799,13 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	if (ret != 0)
 		return ret;
 
+	atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
+
 	vma = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+	if (IS_ERR(vma)) {
+		ret = PTR_ERR(vma);
+		goto out_pin_section;
+	}
 
 	ret = i915_vma_put_fence(vma);
 	if (ret)
@@ -886,6 +890,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 
 out_unpin:
 	i915_gem_object_unpin_from_display_plane(vma);
+out_pin_section:
+	atomic_dec(&dev_priv->gpu_error.pending_fb_pin);
+
 	return ret;
 }
 
-- 
2.13.3

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock
  2017-08-08  8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
  2017-08-08  8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter
  2017-08-08  8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter
@ 2017-08-08  8:31 ` Patchwork
  2 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2017-08-08  8:31 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock
URL   : https://patchwork.freedesktop.org/series/28485/
State : success

== Summary ==

Series 28485v1 Series without cover letter
https://patchwork.freedesktop.org/api/1.0/series/28485/revisions/1/mbox/

Test gem_ringfill:
        Subgroup basic-default:
                skip       -> PASS       (fi-bsw-n3050) fdo#101915
Test kms_flip:
        Subgroup basic-flip-vs-modeset:
                skip       -> PASS       (fi-skl-x1585l) fdo#101781
Test kms_pipe_crc_basic:
        Subgroup hang-read-crc-pipe-a:
                pass       -> DMESG-WARN (fi-pnv-d510) fdo#101597
        Subgroup suspend-read-crc-pipe-b:
                dmesg-warn -> PASS       (fi-byt-n2820) fdo#101705

fdo#101915 https://bugs.freedesktop.org/show_bug.cgi?id=101915
fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781
fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597
fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705

fi-bdw-5557u     total:279  pass:268  dwarn:0   dfail:0   fail:0   skip:11  time:443s
fi-bdw-gvtdvm    total:279  pass:265  dwarn:0   dfail:0   fail:0   skip:14  time:419s
fi-blb-e6850     total:279  pass:224  dwarn:1   dfail:0   fail:0   skip:54  time:358s
fi-bsw-n3050     total:279  pass:243  dwarn:0   dfail:0   fail:0   skip:36  time:493s
fi-bxt-j4205     total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:493s
fi-byt-j1900     total:279  pass:254  dwarn:1   dfail:0   fail:0   skip:24  time:523s
fi-byt-n2820     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:514s
fi-glk-2a        total:279  pass:260  dwarn:0   dfail:0   fail:0   skip:19  time:582s
fi-hsw-4770      total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:430s
fi-hsw-4770r     total:279  pass:263  dwarn:0   dfail:0   fail:0   skip:16  time:404s
fi-ilk-650       total:279  pass:229  dwarn:0   dfail:0   fail:0   skip:50  time:420s
fi-ivb-3520m     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:510s
fi-ivb-3770      total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:474s
fi-kbl-7500u     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:456s
fi-kbl-7560u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:567s
fi-kbl-r         total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:573s
fi-pnv-d510      total:279  pass:222  dwarn:2   dfail:0   fail:0   skip:55  time:574s
fi-skl-6260u     total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:448s
fi-skl-6700k     total:279  pass:261  dwarn:0   dfail:0   fail:0   skip:18  time:643s
fi-skl-6770hq    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:460s
fi-skl-gvtdvm    total:279  pass:266  dwarn:0   dfail:0   fail:0   skip:13  time:429s
fi-skl-x1585l    total:279  pass:269  dwarn:0   dfail:0   fail:0   skip:10  time:494s
fi-snb-2520m     total:279  pass:251  dwarn:0   dfail:0   fail:0   skip:28  time:552s
fi-snb-2600      total:279  pass:250  dwarn:0   dfail:0   fail:0   skip:29  time:405s

e5a4f43e82fc892a23620bd1c998a2b308b59577 drm-tip: 2017y-08m-07d-21h-11m-22s UTC integration manifest
c1aed93f0b00 drm/i915: More surgically unbreak the modeset vs reset deadlock
6944f025d0e6 drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit
c764ffcb0986 drm/i915: Avoid the gpu reset vs. modeset deadlock

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5337/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock
  2017-08-08  8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter
@ 2017-08-14 14:13   ` Michel Thierry
  2017-08-15 15:32     ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Michel Thierry @ 2017-08-14 14:13 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development; +Cc: Daniel Vetter, Mika Kuoppala

On 8/8/2017 1:08 AM, Daniel Vetter wrote:
> There's no reason to entirely wedge the gpu, for the minimal deadlock
> bugfix we only need to unbreak/decouple the atomic commit from the gpu
> reset. The simplest way to fix that is by replacing the
> unconditional fence wait a the top of commit_tail by a wait which
> completes either when the fences are done (normal case, or when a
> reset doesn't need to touch the display state). Or when the gpu reset
> needs to force-unblock all pending modeset states.
> 
> The lesser source of deadlocks is when we try to pin a new framebuffer
> and run into a stall. There's a bunch of places this can happen, like
> eviction, changing the caching mode, acquiring a fence on older
> platforms. And we can't just break the depency loop and keep going,
> the only way would be to break out and restart. But the problem with
> that approach is that we must stall for the reset to complete before
> we grab any locks, and with the atomic infrastructure that's a bit
> tricky. The only place is the ioctl code, and we don't want to insert
> code into e.g. the BUSY ioctl. Hence for that problem just create a
> critical section, and if any code is in there, wedge the GPU. For the
> steady-state this should never be a problem.
> 
> Note that in both cases TDR itself keeps working, so from a userspace
> pov this trickery isn't observable. Users themselvs might spot a short
> glitch while the rendering is catching up again, but that's still
> better than pre-TDR where we've thrown away all the rendering,
> including innocent batches. Also, this fixes the regression TDR
> introduced of making gpu resets deadlock-prone when we do need to
> touch the display.
> 
> One thing I noticed is that gpu_error.flags seems to use both our own
> wait-queue in gpu_error.wait_queue, and the generic wait_on_bit
> facilities. Not entirely sure why this inconsistency exists, I just
> picked one style.
> 
> A possible future avenue could be to insert the gpu reset in-between
> ongoing modeset changes, which would avoid the momentary glitch. But
> that's a lot more work to implement in the atomic commit machinery,
> and given that we only need this for pre-g4x hw, of questionable
> utility just for the sake of polishing gpu reset even more on those
> old boxes. It might be useful for other features though.
> 
> v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/.
> 
> v3: Really emabarrassing fixup, I checked the wrong bit and broke the
> unbreak/wakeup logic.
> 
> v4: Also handle deadlocks in pin_to_display.
> 
> v5: Review from Michel:
> - Fixup the BUILD_BUG_ON
> - Don't forget about the overlay
> 

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2)
> Cc: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h      |  3 +++
>   drivers/gpu/drm/i915/i915_irq.c      |  2 +-
>   drivers/gpu/drm/i915/intel_display.c | 46 ++++++++++++++++++++++++++++++------
>   drivers/gpu/drm/i915/intel_overlay.c | 11 +++++++--
>   4 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 907603cba447..4e669b7738d9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1528,6 +1528,8 @@ struct i915_gpu_error {
>   	/* Protected by the above dev->gpu_error.lock. */
>   	struct i915_gpu_state *first_error;
>   
> +	atomic_t pending_fb_pin;
> +
>   	unsigned long missed_irq_rings;
>   
>   	/**
> @@ -1587,6 +1589,7 @@ struct i915_gpu_error {
>   	unsigned long flags;
>   #define I915_RESET_BACKOFF	0
>   #define I915_RESET_HANDOFF	1
> +#define I915_RESET_MODESET	2
>   #define I915_WEDGED		(BITS_PER_LONG - 1)
>   #define I915_RESET_ENGINE	(I915_WEDGED - I915_NUM_ENGINES)
>   
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index cb9f98555c71..f181f19e8436 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2731,7 +2731,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
>   	 */
>   	if (intel_has_reset_engine(dev_priv)) {
>   		for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
> -			BUILD_BUG_ON(I915_RESET_HANDOFF >= I915_RESET_ENGINE);
> +			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
>   			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
>   					     &dev_priv->gpu_error.flags))
>   				continue;
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index da8d0d3b2bc2..b5631d97a317 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2157,6 +2157,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
>   	 */
>   	intel_runtime_pm_get(dev_priv);
>   
> +	atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
> +
>   	vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view);
>   	if (IS_ERR(vma))
>   		goto err;
> @@ -2184,6 +2186,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation)
>   
>   	i915_vma_get(vma);
>   err:
> +	atomic_dec(&dev_priv->gpu_error.pending_fb_pin);
> +
>   	intel_runtime_pm_put(dev_priv);
>   	return vma;
>   }
> @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>   	    !gpu_reset_clobbers_display(dev_priv))
>   		return;
>   
> -	/* We have a modeset vs reset deadlock, defensively unbreak it.
> -	 *
> -	 * FIXME: We can do a _lot_ better, this is just a first iteration.
> -	 */
> -	i915_gem_set_wedged(dev_priv);
> -	DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
> +	/* We have a modeset vs reset deadlock, defensively unbreak it. */
> +	set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
> +	wake_up_all(&dev_priv->gpu_error.wait_queue);
> +
> +	if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
> +		DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
> +		i915_gem_set_wedged(dev_priv);
> +	}
>   
>   	/*
>   	 * Need mode_config.mutex so that we don't
> @@ -3551,6 +3557,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
>   	drm_modeset_drop_locks(ctx);
>   	drm_modeset_acquire_fini(ctx);
>   	mutex_unlock(&dev->mode_config.mutex);
> +
> +	clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
>   }
>   
>   static void intel_update_pipe_config(struct intel_crtc *crtc,
> @@ -12069,6 +12077,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work)
>   	intel_atomic_helper_free_state(dev_priv);
>   }
>   
> +static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state)
> +{
> +	struct wait_queue_entry wait_fence, wait_reset;
> +	struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev);
> +
> +	init_wait_entry(&wait_fence, 0);
> +	init_wait_entry(&wait_reset, 0);
> +	for (;;) {
> +		prepare_to_wait(&intel_state->commit_ready.wait,
> +				&wait_fence, TASK_UNINTERRUPTIBLE);
> +		prepare_to_wait(&dev_priv->gpu_error.wait_queue,
> +				&wait_reset, TASK_UNINTERRUPTIBLE);
> +
> +
> +		if (i915_sw_fence_done(&intel_state->commit_ready)
> +		    || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
> +			break;
> +
> +		schedule();
> +	}
> +	finish_wait(&intel_state->commit_ready.wait, &wait_fence);
> +	finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset);
> +}
> +
>   static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>   {
>   	struct drm_device *dev = state->dev;
> @@ -12082,7 +12114,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state)
>   	unsigned crtc_vblank_mask = 0;
>   	int i;
>   
> -	i915_sw_fence_wait(&intel_state->commit_ready);
> +	intel_atomic_commit_fence_wait(intel_state);
>   
>   	drm_atomic_helper_wait_for_dependencies(state);
>   
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index b96aed941b97..aace22e7ccac 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -799,9 +799,13 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
>   	if (ret != 0)
>   		return ret;
>   
> +	atomic_inc(&dev_priv->gpu_error.pending_fb_pin);
> +
>   	vma = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL);
> -	if (IS_ERR(vma))
> -		return PTR_ERR(vma);
> +	if (IS_ERR(vma)) {
> +		ret = PTR_ERR(vma);
> +		goto out_pin_section;
> +	}
>   
>   	ret = i915_vma_put_fence(vma);
>   	if (ret)
> @@ -886,6 +890,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
>   
>   out_unpin:
>   	i915_gem_object_unpin_from_display_plane(vma);
> +out_pin_section:
> +	atomic_dec(&dev_priv->gpu_error.pending_fb_pin);
> +
>   	return ret;
>   }
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock
  2017-08-14 14:13   ` Michel Thierry
@ 2017-08-15 15:32     ` Chris Wilson
  2017-08-16 12:43       ` Daniel Vetter
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2017-08-15 15:32 UTC (permalink / raw)
  To: Michel Thierry, Daniel Vetter, Intel Graphics Development
  Cc: Daniel Vetter, Mika Kuoppala

Quoting Michel Thierry (2017-08-14 15:13:59)
> On 8/8/2017 1:08 AM, Daniel Vetter wrote:
> > @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
> >           !gpu_reset_clobbers_display(dev_priv))
> >               return;
> >   
> > -     /* We have a modeset vs reset deadlock, defensively unbreak it.
> > -      *
> > -      * FIXME: We can do a _lot_ better, this is just a first iteration.
> > -      */
> > -     i915_gem_set_wedged(dev_priv);
> > -     DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
> > +     /* We have a modeset vs reset deadlock, defensively unbreak it. */
> > +     set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
> > +     wake_up_all(&dev_priv->gpu_error.wait_queue);
> > +
> > +     if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
> > +             DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");

I still maintain that all discarding of user data should be at DRM_ERROR level.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock
  2017-08-15 15:32     ` Chris Wilson
@ 2017-08-16 12:43       ` Daniel Vetter
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2017-08-16 12:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development, Daniel Vetter, Mika Kuoppala

On Tue, Aug 15, 2017 at 5:32 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Michel Thierry (2017-08-14 15:13:59)
>> On 8/8/2017 1:08 AM, Daniel Vetter wrote:
>> > @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>> >           !gpu_reset_clobbers_display(dev_priv))
>> >               return;
>> >
>> > -     /* We have a modeset vs reset deadlock, defensively unbreak it.
>> > -      *
>> > -      * FIXME: We can do a _lot_ better, this is just a first iteration.
>> > -      */
>> > -     i915_gem_set_wedged(dev_priv);
>> > -     DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n");
>> > +     /* We have a modeset vs reset deadlock, defensively unbreak it. */
>> > +     set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
>> > +     wake_up_all(&dev_priv->gpu_error.wait_queue);
>> > +
>> > +     if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
>> > +             DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
>
> I still maintain that all discarding of user data should be at DRM_ERROR level.

Fully agreed, except we just argued yesterday and you had the firm
stance that it's not ok. So that idea died with the "don't report this
hang" igt infrastructure. Yes we still shout "reset timed out" into
dmesg for expected hangs and expected cases for gen3, but since that's
just another gem tested I also can't be bothered to fix things up.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-08-16 12:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-08  8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter
2017-08-08  8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter
2017-08-08  8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter
2017-08-14 14:13   ` Michel Thierry
2017-08-15 15:32     ` Chris Wilson
2017-08-16 12:43       ` Daniel Vetter
2017-08-08  8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.