* [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock @ 2017-08-08 8:08 Daniel Vetter 2017-08-08 8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Daniel Vetter @ 2017-08-08 8:08 UTC (permalink / raw) To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala ... using the biggest hammer we have. This is essentially a weaponized version of the timeout-based wedging Chris added in commit 36703e79a982c8ce5a8e43833291f2719e92d0d1 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Thu Jun 22 11:56:25 2017 +0100 drm/i915: Break modeset deadlocks on reset Because defense-in-depth is good it's good to still have both. Also note that with the locking change we can now restrict this a lot (old gpus and special testing only), so this doesn't kill the TDR benefits on at least anything remotely modern. And futuremore with a few tricks it should be possible to make a much more educated guess about whether an atomic commit is stuck waiting on the gpu (atomic_t counting the pending i915_sw_fence used by the atomic modeset code should do it), so we can improve this. But for now just start with something that is guaranteed to recover faster, for much better CI througput. This defacto reverts TDR on these platforms, but there's not really a single commit to specify as the sole offender. v2: Add a debug message to explain what's going on. We can't DRM_ERROR because that spams CI. And the timeout based fallback still prints a DRM_ERROR, in case something goes wrong. v3: Fix comment layout (Michel) Fixes: 4680816be336 ("drm/i915: Wait first for submission, before waiting for request completion") Fixes: 221fe7994554 ("drm/i915: Perform a direct reset of the GPU from the waiter") Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2) Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2) Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> --- drivers/gpu/drm/i915/intel_display.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index e4af56b5ff27..b2c220ba2575 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -3458,6 +3458,13 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) !gpu_reset_clobbers_display(dev_priv)) return; + /* We have a modeset vs reset deadlock, defensively unbreak it. + * + * FIXME: We can do a _lot_ better, this is just a first iteration. + */ + i915_gem_set_wedged(dev_priv); + DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); + /* * Need mode_config.mutex so that we don't * trample ongoing ->detect() and whatnot. -- 2.13.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit 2017-08-08 8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter @ 2017-08-08 8:08 ` Daniel Vetter 2017-08-08 8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter 2017-08-08 8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork 2 siblings, 0 replies; 7+ messages in thread From: Daniel Vetter @ 2017-08-08 8:08 UTC (permalink / raw) To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala Blocking in a worker is ok, that's what the unbound_wq is for. And it unifies the paths between the blocking and nonblocking commit, giving me just one path where I have to implement the deadlock avoidance trickery in the next patch. I first tried to implement the following patch without this rework, but force-completing i915_sw_fence creates some serious challenges around properly cleaning things up. So wasn't a feasible short-term approach. Another approach would be to simple keep track of all pending atomic commit work items and manually queue them from the reset code. With the caveat that double-queue in case we race with the i915_sw_fence must be avoided. Given all that, taking the cost of a double schedule in atomic for the short-term fix is the best approach, but can be changed in the future of course. v2: Amend commit message (Chris). v3: Add comment explaining why we do nothing in the sw_fence complete callback (Michel). Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2) Cc: Michel Thierry <michel.thierry@intel.com> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> --- drivers/gpu/drm/i915/intel_display.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index b2c220ba2575..da8d0d3b2bc2 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -12082,6 +12082,8 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state) unsigned crtc_vblank_mask = 0; int i; + i915_sw_fence_wait(&intel_state->commit_ready); + drm_atomic_helper_wait_for_dependencies(state); if (intel_state->modeset) @@ -12247,10 +12249,8 @@ intel_atomic_commit_ready(struct i915_sw_fence *fence, switch (notify) { case FENCE_COMPLETE: - if (state->base.commit_work.func) - queue_work(system_unbound_wq, &state->base.commit_work); + /* we do blocking waits in the worker, nothing to do here */ break; - case FENCE_FREE: { struct intel_atomic_helper *helper = @@ -12352,14 +12352,14 @@ static int intel_atomic_commit(struct drm_device *dev, } drm_atomic_state_get(state); - INIT_WORK(&state->commit_work, - nonblock ? intel_atomic_commit_work : NULL); + INIT_WORK(&state->commit_work, intel_atomic_commit_work); i915_sw_fence_commit(&intel_state->commit_ready); - if (!nonblock) { - i915_sw_fence_wait(&intel_state->commit_ready); + if (nonblock) + queue_work(system_unbound_wq, &state->commit_work); + else intel_atomic_commit_tail(state); - } + return 0; } -- 2.13.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock 2017-08-08 8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter 2017-08-08 8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter @ 2017-08-08 8:08 ` Daniel Vetter 2017-08-14 14:13 ` Michel Thierry 2017-08-08 8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork 2 siblings, 1 reply; 7+ messages in thread From: Daniel Vetter @ 2017-08-08 8:08 UTC (permalink / raw) To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala There's no reason to entirely wedge the gpu, for the minimal deadlock bugfix we only need to unbreak/decouple the atomic commit from the gpu reset. The simplest way to fix that is by replacing the unconditional fence wait a the top of commit_tail by a wait which completes either when the fences are done (normal case, or when a reset doesn't need to touch the display state). Or when the gpu reset needs to force-unblock all pending modeset states. The lesser source of deadlocks is when we try to pin a new framebuffer and run into a stall. There's a bunch of places this can happen, like eviction, changing the caching mode, acquiring a fence on older platforms. And we can't just break the depency loop and keep going, the only way would be to break out and restart. But the problem with that approach is that we must stall for the reset to complete before we grab any locks, and with the atomic infrastructure that's a bit tricky. The only place is the ioctl code, and we don't want to insert code into e.g. the BUSY ioctl. Hence for that problem just create a critical section, and if any code is in there, wedge the GPU. For the steady-state this should never be a problem. Note that in both cases TDR itself keeps working, so from a userspace pov this trickery isn't observable. Users themselvs might spot a short glitch while the rendering is catching up again, but that's still better than pre-TDR where we've thrown away all the rendering, including innocent batches. Also, this fixes the regression TDR introduced of making gpu resets deadlock-prone when we do need to touch the display. One thing I noticed is that gpu_error.flags seems to use both our own wait-queue in gpu_error.wait_queue, and the generic wait_on_bit facilities. Not entirely sure why this inconsistency exists, I just picked one style. A possible future avenue could be to insert the gpu reset in-between ongoing modeset changes, which would avoid the momentary glitch. But that's a lot more work to implement in the atomic commit machinery, and given that we only need this for pre-g4x hw, of questionable utility just for the sake of polishing gpu reset even more on those old boxes. It might be useful for other features though. v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/. v3: Really emabarrassing fixup, I checked the wrong bit and broke the unbreak/wakeup logic. v4: Also handle deadlocks in pin_to_display. v5: Review from Michel: - Fixup the BUILD_BUG_ON - Don't forget about the overlay Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2) Cc: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> --- drivers/gpu/drm/i915/i915_drv.h | 3 +++ drivers/gpu/drm/i915/i915_irq.c | 2 +- drivers/gpu/drm/i915/intel_display.c | 46 ++++++++++++++++++++++++++++++------ drivers/gpu/drm/i915/intel_overlay.c | 11 +++++++-- 4 files changed, 52 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 907603cba447..4e669b7738d9 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,8 @@ struct i915_gpu_error { /* Protected by the above dev->gpu_error.lock. */ struct i915_gpu_state *first_error; + atomic_t pending_fb_pin; + unsigned long missed_irq_rings; /** @@ -1587,6 +1589,7 @@ struct i915_gpu_error { unsigned long flags; #define I915_RESET_BACKOFF 0 #define I915_RESET_HANDOFF 1 +#define I915_RESET_MODESET 2 #define I915_WEDGED (BITS_PER_LONG - 1) #define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES) diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index cb9f98555c71..f181f19e8436 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -2731,7 +2731,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv, */ if (intel_has_reset_engine(dev_priv)) { for_each_engine_masked(engine, dev_priv, engine_mask, tmp) { - BUILD_BUG_ON(I915_RESET_HANDOFF >= I915_RESET_ENGINE); + BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, &dev_priv->gpu_error.flags)) continue; diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index da8d0d3b2bc2..b5631d97a317 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2157,6 +2157,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) */ intel_runtime_pm_get(dev_priv); + atomic_inc(&dev_priv->gpu_error.pending_fb_pin); + vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view); if (IS_ERR(vma)) goto err; @@ -2184,6 +2186,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) i915_vma_get(vma); err: + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); + intel_runtime_pm_put(dev_priv); return vma; } @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) !gpu_reset_clobbers_display(dev_priv)) return; - /* We have a modeset vs reset deadlock, defensively unbreak it. - * - * FIXME: We can do a _lot_ better, this is just a first iteration. - */ - i915_gem_set_wedged(dev_priv); - DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); + /* We have a modeset vs reset deadlock, defensively unbreak it. */ + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); + wake_up_all(&dev_priv->gpu_error.wait_queue); + + if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) { + DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n"); + i915_gem_set_wedged(dev_priv); + } /* * Need mode_config.mutex so that we don't @@ -3551,6 +3557,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv) drm_modeset_drop_locks(ctx); drm_modeset_acquire_fini(ctx); mutex_unlock(&dev->mode_config.mutex); + + clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); } static void intel_update_pipe_config(struct intel_crtc *crtc, @@ -12069,6 +12077,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work) intel_atomic_helper_free_state(dev_priv); } +static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state) +{ + struct wait_queue_entry wait_fence, wait_reset; + struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev); + + init_wait_entry(&wait_fence, 0); + init_wait_entry(&wait_reset, 0); + for (;;) { + prepare_to_wait(&intel_state->commit_ready.wait, + &wait_fence, TASK_UNINTERRUPTIBLE); + prepare_to_wait(&dev_priv->gpu_error.wait_queue, + &wait_reset, TASK_UNINTERRUPTIBLE); + + + if (i915_sw_fence_done(&intel_state->commit_ready) + || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags)) + break; + + schedule(); + } + finish_wait(&intel_state->commit_ready.wait, &wait_fence); + finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset); +} + static void intel_atomic_commit_tail(struct drm_atomic_state *state) { struct drm_device *dev = state->dev; @@ -12082,7 +12114,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state) unsigned crtc_vblank_mask = 0; int i; - i915_sw_fence_wait(&intel_state->commit_ready); + intel_atomic_commit_fence_wait(intel_state); drm_atomic_helper_wait_for_dependencies(state); diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index b96aed941b97..aace22e7ccac 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -799,9 +799,13 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, if (ret != 0) return ret; + atomic_inc(&dev_priv->gpu_error.pending_fb_pin); + vma = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL); - if (IS_ERR(vma)) - return PTR_ERR(vma); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto out_pin_section; + } ret = i915_vma_put_fence(vma); if (ret) @@ -886,6 +890,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, out_unpin: i915_gem_object_unpin_from_display_plane(vma); +out_pin_section: + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); + return ret; } -- 2.13.3 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock 2017-08-08 8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter @ 2017-08-14 14:13 ` Michel Thierry 2017-08-15 15:32 ` Chris Wilson 0 siblings, 1 reply; 7+ messages in thread From: Michel Thierry @ 2017-08-14 14:13 UTC (permalink / raw) To: Daniel Vetter, Intel Graphics Development; +Cc: Daniel Vetter, Mika Kuoppala On 8/8/2017 1:08 AM, Daniel Vetter wrote: > There's no reason to entirely wedge the gpu, for the minimal deadlock > bugfix we only need to unbreak/decouple the atomic commit from the gpu > reset. The simplest way to fix that is by replacing the > unconditional fence wait a the top of commit_tail by a wait which > completes either when the fences are done (normal case, or when a > reset doesn't need to touch the display state). Or when the gpu reset > needs to force-unblock all pending modeset states. > > The lesser source of deadlocks is when we try to pin a new framebuffer > and run into a stall. There's a bunch of places this can happen, like > eviction, changing the caching mode, acquiring a fence on older > platforms. And we can't just break the depency loop and keep going, > the only way would be to break out and restart. But the problem with > that approach is that we must stall for the reset to complete before > we grab any locks, and with the atomic infrastructure that's a bit > tricky. The only place is the ioctl code, and we don't want to insert > code into e.g. the BUSY ioctl. Hence for that problem just create a > critical section, and if any code is in there, wedge the GPU. For the > steady-state this should never be a problem. > > Note that in both cases TDR itself keeps working, so from a userspace > pov this trickery isn't observable. Users themselvs might spot a short > glitch while the rendering is catching up again, but that's still > better than pre-TDR where we've thrown away all the rendering, > including innocent batches. Also, this fixes the regression TDR > introduced of making gpu resets deadlock-prone when we do need to > touch the display. > > One thing I noticed is that gpu_error.flags seems to use both our own > wait-queue in gpu_error.wait_queue, and the generic wait_on_bit > facilities. Not entirely sure why this inconsistency exists, I just > picked one style. > > A possible future avenue could be to insert the gpu reset in-between > ongoing modeset changes, which would avoid the momentary glitch. But > that's a lot more work to implement in the atomic commit machinery, > and given that we only need this for pre-g4x hw, of questionable > utility just for the sake of polishing gpu reset even more on those > old boxes. It might be useful for other features though. > > v2: Rebase onto 4.13 with a s/wait_queue_t/struct wait_queue_entry/. > > v3: Really emabarrassing fixup, I checked the wrong bit and broke the > unbreak/wakeup logic. > > v4: Also handle deadlocks in pin_to_display. > > v5: Review from Michel: > - Fixup the BUILD_BUG_ON > - Don't forget about the overlay > Reviewed-by: Michel Thierry <michel.thierry@intel.com> > Cc: Michel Thierry <michel.thierry@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> (v2) > Cc: Michel Thierry <michel.thierry@intel.com> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> > --- > drivers/gpu/drm/i915/i915_drv.h | 3 +++ > drivers/gpu/drm/i915/i915_irq.c | 2 +- > drivers/gpu/drm/i915/intel_display.c | 46 ++++++++++++++++++++++++++++++------ > drivers/gpu/drm/i915/intel_overlay.c | 11 +++++++-- > 4 files changed, 52 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 907603cba447..4e669b7738d9 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1528,6 +1528,8 @@ struct i915_gpu_error { > /* Protected by the above dev->gpu_error.lock. */ > struct i915_gpu_state *first_error; > > + atomic_t pending_fb_pin; > + > unsigned long missed_irq_rings; > > /** > @@ -1587,6 +1589,7 @@ struct i915_gpu_error { > unsigned long flags; > #define I915_RESET_BACKOFF 0 > #define I915_RESET_HANDOFF 1 > +#define I915_RESET_MODESET 2 > #define I915_WEDGED (BITS_PER_LONG - 1) > #define I915_RESET_ENGINE (I915_WEDGED - I915_NUM_ENGINES) > > diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c > index cb9f98555c71..f181f19e8436 100644 > --- a/drivers/gpu/drm/i915/i915_irq.c > +++ b/drivers/gpu/drm/i915/i915_irq.c > @@ -2731,7 +2731,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv, > */ > if (intel_has_reset_engine(dev_priv)) { > for_each_engine_masked(engine, dev_priv, engine_mask, tmp) { > - BUILD_BUG_ON(I915_RESET_HANDOFF >= I915_RESET_ENGINE); > + BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); > if (test_and_set_bit(I915_RESET_ENGINE + engine->id, > &dev_priv->gpu_error.flags)) > continue; > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > index da8d0d3b2bc2..b5631d97a317 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -2157,6 +2157,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) > */ > intel_runtime_pm_get(dev_priv); > > + atomic_inc(&dev_priv->gpu_error.pending_fb_pin); > + > vma = i915_gem_object_pin_to_display_plane(obj, alignment, &view); > if (IS_ERR(vma)) > goto err; > @@ -2184,6 +2186,8 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int rotation) > > i915_vma_get(vma); > err: > + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); > + > intel_runtime_pm_put(dev_priv); > return vma; > } > @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) > !gpu_reset_clobbers_display(dev_priv)) > return; > > - /* We have a modeset vs reset deadlock, defensively unbreak it. > - * > - * FIXME: We can do a _lot_ better, this is just a first iteration. > - */ > - i915_gem_set_wedged(dev_priv); > - DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); > + /* We have a modeset vs reset deadlock, defensively unbreak it. */ > + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); > + wake_up_all(&dev_priv->gpu_error.wait_queue); > + > + if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) { > + DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n"); > + i915_gem_set_wedged(dev_priv); > + } > > /* > * Need mode_config.mutex so that we don't > @@ -3551,6 +3557,8 @@ void intel_finish_reset(struct drm_i915_private *dev_priv) > drm_modeset_drop_locks(ctx); > drm_modeset_acquire_fini(ctx); > mutex_unlock(&dev->mode_config.mutex); > + > + clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); > } > > static void intel_update_pipe_config(struct intel_crtc *crtc, > @@ -12069,6 +12077,30 @@ static void intel_atomic_helper_free_state_worker(struct work_struct *work) > intel_atomic_helper_free_state(dev_priv); > } > > +static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_state) > +{ > + struct wait_queue_entry wait_fence, wait_reset; > + struct drm_i915_private *dev_priv = to_i915(intel_state->base.dev); > + > + init_wait_entry(&wait_fence, 0); > + init_wait_entry(&wait_reset, 0); > + for (;;) { > + prepare_to_wait(&intel_state->commit_ready.wait, > + &wait_fence, TASK_UNINTERRUPTIBLE); > + prepare_to_wait(&dev_priv->gpu_error.wait_queue, > + &wait_reset, TASK_UNINTERRUPTIBLE); > + > + > + if (i915_sw_fence_done(&intel_state->commit_ready) > + || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags)) > + break; > + > + schedule(); > + } > + finish_wait(&intel_state->commit_ready.wait, &wait_fence); > + finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset); > +} > + > static void intel_atomic_commit_tail(struct drm_atomic_state *state) > { > struct drm_device *dev = state->dev; > @@ -12082,7 +12114,7 @@ static void intel_atomic_commit_tail(struct drm_atomic_state *state) > unsigned crtc_vblank_mask = 0; > int i; > > - i915_sw_fence_wait(&intel_state->commit_ready); > + intel_atomic_commit_fence_wait(intel_state); > > drm_atomic_helper_wait_for_dependencies(state); > > diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c > index b96aed941b97..aace22e7ccac 100644 > --- a/drivers/gpu/drm/i915/intel_overlay.c > +++ b/drivers/gpu/drm/i915/intel_overlay.c > @@ -799,9 +799,13 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, > if (ret != 0) > return ret; > > + atomic_inc(&dev_priv->gpu_error.pending_fb_pin); > + > vma = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL); > - if (IS_ERR(vma)) > - return PTR_ERR(vma); > + if (IS_ERR(vma)) { > + ret = PTR_ERR(vma); > + goto out_pin_section; > + } > > ret = i915_vma_put_fence(vma); > if (ret) > @@ -886,6 +890,9 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, > > out_unpin: > i915_gem_object_unpin_from_display_plane(vma); > +out_pin_section: > + atomic_dec(&dev_priv->gpu_error.pending_fb_pin); > + > return ret; > } > > _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock 2017-08-14 14:13 ` Michel Thierry @ 2017-08-15 15:32 ` Chris Wilson 2017-08-16 12:43 ` Daniel Vetter 0 siblings, 1 reply; 7+ messages in thread From: Chris Wilson @ 2017-08-15 15:32 UTC (permalink / raw) To: Michel Thierry, Daniel Vetter, Intel Graphics Development Cc: Daniel Vetter, Mika Kuoppala Quoting Michel Thierry (2017-08-14 15:13:59) > On 8/8/2017 1:08 AM, Daniel Vetter wrote: > > @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) > > !gpu_reset_clobbers_display(dev_priv)) > > return; > > > > - /* We have a modeset vs reset deadlock, defensively unbreak it. > > - * > > - * FIXME: We can do a _lot_ better, this is just a first iteration. > > - */ > > - i915_gem_set_wedged(dev_priv); > > - DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); > > + /* We have a modeset vs reset deadlock, defensively unbreak it. */ > > + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); > > + wake_up_all(&dev_priv->gpu_error.wait_queue); > > + > > + if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) { > > + DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n"); I still maintain that all discarding of user data should be at DRM_ERROR level. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock 2017-08-15 15:32 ` Chris Wilson @ 2017-08-16 12:43 ` Daniel Vetter 0 siblings, 0 replies; 7+ messages in thread From: Daniel Vetter @ 2017-08-16 12:43 UTC (permalink / raw) To: Chris Wilson; +Cc: Intel Graphics Development, Daniel Vetter, Mika Kuoppala On Tue, Aug 15, 2017 at 5:32 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote: > Quoting Michel Thierry (2017-08-14 15:13:59) >> On 8/8/2017 1:08 AM, Daniel Vetter wrote: >> > @@ -3458,12 +3462,14 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv) >> > !gpu_reset_clobbers_display(dev_priv)) >> > return; >> > >> > - /* We have a modeset vs reset deadlock, defensively unbreak it. >> > - * >> > - * FIXME: We can do a _lot_ better, this is just a first iteration. >> > - */ >> > - i915_gem_set_wedged(dev_priv); >> > - DRM_DEBUG_DRIVER("Wedging GPU to avoid deadlocks with pending modeset updates\n"); >> > + /* We have a modeset vs reset deadlock, defensively unbreak it. */ >> > + set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags); >> > + wake_up_all(&dev_priv->gpu_error.wait_queue); >> > + >> > + if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) { >> > + DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n"); > > I still maintain that all discarding of user data should be at DRM_ERROR level. Fully agreed, except we just argued yesterday and you had the firm stance that it's not ok. So that idea died with the "don't report this hang" igt infrastructure. Yes we still shout "reset timed out" into dmesg for expected hangs and expected cases for gen3, but since that's just another gem tested I also can't be bothered to fix things up. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 7+ messages in thread
* ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock 2017-08-08 8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter 2017-08-08 8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter 2017-08-08 8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter @ 2017-08-08 8:31 ` Patchwork 2 siblings, 0 replies; 7+ messages in thread From: Patchwork @ 2017-08-08 8:31 UTC (permalink / raw) To: Daniel Vetter; +Cc: intel-gfx == Series Details == Series: series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock URL : https://patchwork.freedesktop.org/series/28485/ State : success == Summary == Series 28485v1 Series without cover letter https://patchwork.freedesktop.org/api/1.0/series/28485/revisions/1/mbox/ Test gem_ringfill: Subgroup basic-default: skip -> PASS (fi-bsw-n3050) fdo#101915 Test kms_flip: Subgroup basic-flip-vs-modeset: skip -> PASS (fi-skl-x1585l) fdo#101781 Test kms_pipe_crc_basic: Subgroup hang-read-crc-pipe-a: pass -> DMESG-WARN (fi-pnv-d510) fdo#101597 Subgroup suspend-read-crc-pipe-b: dmesg-warn -> PASS (fi-byt-n2820) fdo#101705 fdo#101915 https://bugs.freedesktop.org/show_bug.cgi?id=101915 fdo#101781 https://bugs.freedesktop.org/show_bug.cgi?id=101781 fdo#101597 https://bugs.freedesktop.org/show_bug.cgi?id=101597 fdo#101705 https://bugs.freedesktop.org/show_bug.cgi?id=101705 fi-bdw-5557u total:279 pass:268 dwarn:0 dfail:0 fail:0 skip:11 time:443s fi-bdw-gvtdvm total:279 pass:265 dwarn:0 dfail:0 fail:0 skip:14 time:419s fi-blb-e6850 total:279 pass:224 dwarn:1 dfail:0 fail:0 skip:54 time:358s fi-bsw-n3050 total:279 pass:243 dwarn:0 dfail:0 fail:0 skip:36 time:493s fi-bxt-j4205 total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:493s fi-byt-j1900 total:279 pass:254 dwarn:1 dfail:0 fail:0 skip:24 time:523s fi-byt-n2820 total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:514s fi-glk-2a total:279 pass:260 dwarn:0 dfail:0 fail:0 skip:19 time:582s fi-hsw-4770 total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:430s fi-hsw-4770r total:279 pass:263 dwarn:0 dfail:0 fail:0 skip:16 time:404s fi-ilk-650 total:279 pass:229 dwarn:0 dfail:0 fail:0 skip:50 time:420s fi-ivb-3520m total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:510s fi-ivb-3770 total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:474s fi-kbl-7500u total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:456s fi-kbl-7560u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:567s fi-kbl-r total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:573s fi-pnv-d510 total:279 pass:222 dwarn:2 dfail:0 fail:0 skip:55 time:574s fi-skl-6260u total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:448s fi-skl-6700k total:279 pass:261 dwarn:0 dfail:0 fail:0 skip:18 time:643s fi-skl-6770hq total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:460s fi-skl-gvtdvm total:279 pass:266 dwarn:0 dfail:0 fail:0 skip:13 time:429s fi-skl-x1585l total:279 pass:269 dwarn:0 dfail:0 fail:0 skip:10 time:494s fi-snb-2520m total:279 pass:251 dwarn:0 dfail:0 fail:0 skip:28 time:552s fi-snb-2600 total:279 pass:250 dwarn:0 dfail:0 fail:0 skip:29 time:405s e5a4f43e82fc892a23620bd1c998a2b308b59577 drm-tip: 2017y-08m-07d-21h-11m-22s UTC integration manifest c1aed93f0b00 drm/i915: More surgically unbreak the modeset vs reset deadlock 6944f025d0e6 drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit c764ffcb0986 drm/i915: Avoid the gpu reset vs. modeset deadlock == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5337/ _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-16 12:43 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-08-08 8:08 [PATCH 1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Daniel Vetter 2017-08-08 8:08 ` [PATCH 2/3] drm/i915: Push i915_sw_fence_wait into the nonblocking atomic commit Daniel Vetter 2017-08-08 8:08 ` [PATCH 3/3] drm/i915: More surgically unbreak the modeset vs reset deadlock Daniel Vetter 2017-08-14 14:13 ` Michel Thierry 2017-08-15 15:32 ` Chris Wilson 2017-08-16 12:43 ` Daniel Vetter 2017-08-08 8:31 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915: Avoid the gpu reset vs. modeset deadlock Patchwork
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.