All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Beware temporary wedging when determining -EIO
@ 2019-02-19 13:31 Chris Wilson
  2019-02-19 13:38 ` [PATCH v2] " Chris Wilson
                   ` (17 more replies)
  0 siblings, 18 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 13:31 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then do a round-trip back to userspace with an -EAGAIN until
the reset attempt is resolved.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     |  5 +++--
 drivers/gpu/drm/i915/i915_request.c |  5 +++--
 drivers/gpu/drm/i915/i915_reset.c   | 14 ++++++++++++++
 drivers/gpu/drm/i915/i915_reset.h   |  2 ++
 4 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..1a730a005d17 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_reset_backoff(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ab4e1c01618..a0d91b24b12b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -561,8 +561,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_reset_backoff(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4df4c674223d..fa3eb2a840ff 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1334,6 +1334,20 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_reset_backoff(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+	unsigned long flags = READ_ONCE(error->flags);
+
+	if (likely(!test_bit(I915_WEDGED, &flags)))
+		return 0;
+
+	if (test_bit(I915_RESET_BACKOFF, &flags))
+		return -EAGAIN; /* reset still in progress; may recover */
+
+	return -EIO;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..2aeaad6114e9 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_reset_backoff(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
@ 2019-02-19 13:38 ` Chris Wilson
  2019-02-19 13:43   ` Chris Wilson
  2019-02-19 14:01 ` [PATCH v3] " Chris Wilson
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 13:38 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
Why EAGAIN when we have a waitqueue?
---
 drivers/gpu/drm/i915/i915_gem.c     |  5 +++--
 drivers/gpu/drm/i915/i915_request.c |  5 +++--
 drivers/gpu/drm/i915/i915_reset.c   | 15 +++++++++++++++
 drivers/gpu/drm/i915/i915_reset.h   |  2 ++
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..1a730a005d17 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_reset_backoff(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ab4e1c01618..a0d91b24b12b 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -561,8 +561,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_reset_backoff(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4df4c674223d..3c74b828f5be 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1334,6 +1334,21 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_reset_backoff(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (likely(!test_bit(I915_WEDGED, &error->flags)))
+		return 0;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return test_bit(I915_WEDGED, &error->flags) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..2aeaad6114e9 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_reset_backoff(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:38 ` [PATCH v2] " Chris Wilson
@ 2019-02-19 13:43   ` Chris Wilson
  2019-02-19 13:47     ` Mika Kuoppala
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 13:43 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2019-02-19 13:38:55)
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 4df4c674223d..3c74b828f5be 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -1334,6 +1334,21 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
>         srcu_read_unlock(&error->reset_backoff_srcu, tag);
>  }
>  
> +int i915_reset_backoff(struct drm_i915_private *i915)

Now this is just returning -EIO, we should rethink its name.

Maybe this should be the i915_terminally_wedged() and the existing
inline __i915_terminally_wedged(). Though less on the terminally part!
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:43   ` Chris Wilson
@ 2019-02-19 13:47     ` Mika Kuoppala
  2019-02-19 13:51       ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Mika Kuoppala @ 2019-02-19 13:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Chris Wilson (2019-02-19 13:38:55)
>> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
>> index 4df4c674223d..3c74b828f5be 100644
>> --- a/drivers/gpu/drm/i915/i915_reset.c
>> +++ b/drivers/gpu/drm/i915/i915_reset.c
>> @@ -1334,6 +1334,21 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
>>         srcu_read_unlock(&error->reset_backoff_srcu, tag);
>>  }
>>  
>> +int i915_reset_backoff(struct drm_i915_private *i915)
>
> Now this is just returning -EIO, we should rethink its name.
>
> Maybe this should be the i915_terminally_wedged() and the existing
> inline __i915_terminally_wedged(). Though less on the terminally part!

Oh yes please. Lets rethink the terminology and fix it up starting
all the way up from debugs entrypoints =)

-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:47     ` Mika Kuoppala
@ 2019-02-19 13:51       ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 13:51 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-02-19 13:47:33)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Quoting Chris Wilson (2019-02-19 13:38:55)
> >> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> >> index 4df4c674223d..3c74b828f5be 100644
> >> --- a/drivers/gpu/drm/i915/i915_reset.c
> >> +++ b/drivers/gpu/drm/i915/i915_reset.c
> >> @@ -1334,6 +1334,21 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
> >>         srcu_read_unlock(&error->reset_backoff_srcu, tag);
> >>  }
> >>  
> >> +int i915_reset_backoff(struct drm_i915_private *i915)
> >
> > Now this is just returning -EIO, we should rethink its name.
> >
> > Maybe this should be the i915_terminally_wedged() and the existing
> > inline __i915_terminally_wedged(). Though less on the terminally part!
> 
> Oh yes please. Lets rethink the terminology and fix it up starting
> all the way up from debugs entrypoints =)

I'm keeping the debugfs joke alive for another decade at least!
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
  2019-02-19 13:38 ` [PATCH v2] " Chris Wilson
@ 2019-02-19 14:01 ` Chris Wilson
  2019-02-19 14:11 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev2) Patchwork
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 14:01 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
Rename all^some the things.
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  6 ++---
 drivers/gpu/drm/i915/i915_drv.h               |  7 +++++-
 drivers/gpu/drm/i915/i915_gem.c               | 19 ++++++++-------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 19 +++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++----
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 ++--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 +++++++++----------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 ++--
 19 files changed, 68 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..ffcc98842f25 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,9 +3833,7 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
-
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	*val = i915_is_wedged(data);
 
 	return 0;
 }
@@ -3918,7 +3916,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_is_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c8d0489a1cd..3354b2726ca9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_is_wedged(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..0810718cbeac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_is_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_is_wedged(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_is_wedged(i915)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
@@ -4731,7 +4732,7 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+	if (i915_is_wedged(dev_priv)) {
 		ret = -EIO;
 		goto out;
 	}
@@ -5107,7 +5108,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_is_wedged(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ab4e1c01618..9bd252742913 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -561,8 +561,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4df4c674223d..d65e3ecd526f 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1027,7 +1027,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1248,7 +1248,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1334,6 +1334,21 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..b0f1a9256fd2 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_is_wedged(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9bfe9cb816cb 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..2cee1b4b3e48 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..e67ed445d23e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..81c761acc43e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_is_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..97ccd0f274b7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..413d0c16c380 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..0d8e759fdbf8 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..e231cabf7f1a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..3bfc36c8bc67 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_is_wedged(i915) ? -EIO : 0;
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..e512d4406b92 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_is_wedged(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_is_wedged(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			err = -EIO;
 	}
 
@@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..2a34db172903 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..77b76ab868ee 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_is_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev2)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
  2019-02-19 13:38 ` [PATCH v2] " Chris Wilson
  2019-02-19 14:01 ` [PATCH v3] " Chris Wilson
@ 2019-02-19 14:11 ` Patchwork
  2019-02-19 14:15 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 14:11 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev2)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5630 -> Patchwork_12255
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/2/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12255 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_pipe_crc_basic@read-crc-pipe-a:
    - fi-byt-clapper:     PASS -> FAIL [fdo#107362]

  * igt@prime_vgem@basic-fence-flip:
    - fi-ilk-650:         PASS -> FAIL [fdo#104008]

  
#### Possible fixes ####

  * igt@i915_module_load@reload:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-byt-clapper:     INCOMPLETE [fdo#102657] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102657]: https://bugs.freedesktop.org/show_bug.cgi?id=102657
  [fdo#104008]: https://bugs.freedesktop.org/show_bug.cgi?id=104008
  [fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109294]: https://bugs.freedesktop.org/show_bug.cgi?id=109294
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109527]: https://bugs.freedesktop.org/show_bug.cgi?id=109527
  [fdo#109528]: https://bugs.freedesktop.org/show_bug.cgi?id=109528
  [fdo#109530]: https://bugs.freedesktop.org/show_bug.cgi?id=109530


Participating hosts (44 -> 39)
------------------------------

  Additional (2): fi-icl-y fi-bdw-5557u 
  Missing    (7): fi-kbl-soraka fi-kbl-7567u fi-ilk-m540 fi-byt-squawks fi-icl-u2 fi-bsw-cyan fi-skl-6600u 


Build changes
-------------

    * Linux: CI_DRM_5630 -> Patchwork_12255

  CI_DRM_5630: 82d591391bfcd9cfe2eeac149c49a678b571cd62 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4837: 368e76156f752e6ed6ac32ed9f400567aef7d3fc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12255: c5a9314757f6d20e2600290395f03ef1048de849 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c5a9314757f6 drm/i915: Beware temporary wedging when determining -EIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12255/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev3)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (2 preceding siblings ...)
  2019-02-19 14:11 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev2) Patchwork
@ 2019-02-19 14:15 ` Patchwork
  2019-02-19 14:38 ` [PATCH v4] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 14:15 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev3)
URL   : https://patchwork.freedesktop.org/series/56898/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Beware temporary wedging when determining -EIO
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3572:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v4] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (3 preceding siblings ...)
  2019-02-19 14:15 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
@ 2019-02-19 14:38 ` Chris Wilson
  2019-02-19 14:40 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 14:38 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
Ah! intel_reset_finish() can still hit struct_mutex so we can't simply
wait for the reset to complete while holding struct_mutex ourselves.
#$#@#$@
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  6 ++---
 drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
 drivers/gpu/drm/i915/i915_gem.c               | 19 ++++++-------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 27 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++---
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 ++++++++---------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 +--
 19 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..ffcc98842f25 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,9 +3833,7 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
-
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	*val = i915_is_wedged(data);
 
 	return 0;
 }
@@ -3918,7 +3916,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_is_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c8d0489a1cd..3354b2726ca9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_is_wedged(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..0810718cbeac 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_is_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_is_wedged(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_is_wedged(i915)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
@@ -4731,7 +4732,7 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
+	if (i915_is_wedged(dev_priv)) {
 		ret = -EIO;
 		goto out;
 	}
@@ -5107,7 +5108,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_is_wedged(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ab4e1c01618..9bd252742913 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -561,8 +561,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 4df4c674223d..c474c9297333 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1027,7 +1027,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1248,7 +1248,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1334,6 +1334,29 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	/* Reset still in progress? Maybe we will recover? */
+	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+		return -EIO;
+
+	/* XXX intel_reset_finish() still takes struct_mutex!!! */
+	if (mutex_is_locked(&i915->drm.struct_mutex))
+		return -EAGAIN;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..b0f1a9256fd2 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_is_wedged(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9bfe9cb816cb 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..2cee1b4b3e48 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..e67ed445d23e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..81c761acc43e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_is_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..97ccd0f274b7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_is_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..413d0c16c380 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..0d8e759fdbf8 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..e231cabf7f1a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..3bfc36c8bc67 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_is_wedged(i915) ? -EIO : 0;
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..e512d4406b92 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_is_wedged(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_is_wedged(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO;
 
 	return err;
@@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_is_wedged(i915))
 			err = -EIO;
 	}
 
@@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..2a34db172903 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..77b76ab868ee 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_is_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_is_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev3)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (4 preceding siblings ...)
  2019-02-19 14:38 ` [PATCH v4] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
@ 2019-02-19 14:40 ` Patchwork
  2019-02-19 15:14 ` [PATCH v5] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 14:40 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev3)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5630 -> Patchwork_12256
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/3/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12256 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  * igt@prime_vgem@basic-fence-flip:
    - fi-skl-guc:         PASS -> FAIL [fdo#104008]

  
#### Possible fixes ####

  * igt@i915_module_load@reload:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  
#### Warnings ####

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-byt-clapper:     INCOMPLETE [fdo#102657] -> FAIL [fdo#103191] / [fdo#107362]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102657]: https://bugs.freedesktop.org/show_bug.cgi?id=102657
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#104008]: https://bugs.freedesktop.org/show_bug.cgi?id=104008
  [fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109294]: https://bugs.freedesktop.org/show_bug.cgi?id=109294
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109527]: https://bugs.freedesktop.org/show_bug.cgi?id=109527
  [fdo#109528]: https://bugs.freedesktop.org/show_bug.cgi?id=109528
  [fdo#109530]: https://bugs.freedesktop.org/show_bug.cgi?id=109530


Participating hosts (44 -> 42)
------------------------------

  Additional (2): fi-icl-y fi-bdw-5557u 
  Missing    (4): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5630 -> Patchwork_12256

  CI_DRM_5630: 82d591391bfcd9cfe2eeac149c49a678b571cd62 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4837: 368e76156f752e6ed6ac32ed9f400567aef7d3fc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12256: e9c7e90e15d36021e4ae12b4440e6f9bc3cd3a3c @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

e9c7e90e15d3 drm/i915: Beware temporary wedging when determining -EIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12256/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (5 preceding siblings ...)
  2019-02-19 14:40 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
@ 2019-02-19 15:14 ` Chris Wilson
  2019-02-19 15:34 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev5) Patchwork
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 15:14 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
Rename s/i915_is_wedged/i915_reset_failed/ and reconsider all callers.
If it looks like an ABI, it gets i915_terminally_wedged() otherwise use
i915_reset_failed() as befits.
---
 drivers/gpu/drm/i915/i915_debugfs.c           | 12 ++++++---
 drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
 drivers/gpu/drm/i915/i915_gem.c               | 22 +++++++--------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 27 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++---
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 25 +++++++++--------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 +--
 19 files changed, 83 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..8cee9e4fc31b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,11 +3833,15 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
+	int ret;
 
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	ret = i915_terminally_wedged(data);
+	if (ret == -EIO) {
+		*val = 1;
+		ret = 0;
+	}
 
-	return 0;
+	return ret;
 }
 
 static int
@@ -3918,7 +3922,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c8d0489a1cd..14c6f5e65b8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_reset_failed(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..b3d918d90c1f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_terminally_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_reset_failed(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_terminally_wedged(i915)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
@@ -4731,10 +4732,9 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
-		ret = -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -5107,7 +5107,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_reset_failed(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a61e3a4fc9dc..124b3e279c88 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,8 +559,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index c234feb5fdf5..0853cdcdce99 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1032,7 +1032,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1339,6 +1339,29 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	/* Reset still in progress? Maybe we will recover? */
+	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+		return -EIO;
+
+	/* XXX intel_reset_finish() still takes struct_mutex!!! */
+	if (mutex_is_locked(&i915->drm.struct_mutex))
+		return -EAGAIN;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..81b80f8fd9ea 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9be033b6f4d2 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..40607ba7dda6 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..27d8f853111b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..9c16aff87028 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_terminally_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..3a238b9628b3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..b270eab1cad1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..784982aed625 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_terminally_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..03cc8ab6a620 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..e0d3122fd35a 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_terminally_wedged(i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..f0822757f3f6 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_terminally_wedged(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_terminally_wedged(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_reset_failed(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1552,8 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
-			err = -EIO;
+		err = i915_terminally_wedged(i915);
 	}
 
 	i915_request_put(rq);
@@ -1591,7 +1590,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1664,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..0f7a5bf69646 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..e6ffc8ac22dc 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_terminally_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev5)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (6 preceding siblings ...)
  2019-02-19 15:14 ` [PATCH v5] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
@ 2019-02-19 15:34 ` Patchwork
  2019-02-19 16:03 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 15:34 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev5)
URL   : https://patchwork.freedesktop.org/series/56898/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Beware temporary wedging when determining -EIO
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3572:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915: Beware temporary wedging when determining -EIO (rev5)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (7 preceding siblings ...)
  2019-02-19 15:34 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev5) Patchwork
@ 2019-02-19 16:03 ` Patchwork
  2019-02-19 16:06   ` Chris Wilson
  2019-02-19 16:08 ` [PATCH v6] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Patchwork @ 2019-02-19 16:03 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev5)
URL   : https://patchwork.freedesktop.org/series/56898/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5631 -> Patchwork_12258
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12258 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12258, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/5/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12258:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_hangcheck:
    - fi-bdw-gvtdvm:      PASS -> TIMEOUT
    - fi-bwr-2160:        PASS -> TIMEOUT
    - fi-snb-2520m:       PASS -> TIMEOUT
    - fi-elk-e7500:       PASS -> TIMEOUT
    - fi-skl-gvtdvm:      PASS -> TIMEOUT
    - fi-bxt-j4205:       PASS -> TIMEOUT
    - fi-kbl-7500u:       PASS -> TIMEOUT
    - fi-bsw-kefka:       PASS -> TIMEOUT
    - fi-gdg-551:         PASS -> TIMEOUT
    - fi-byt-clapper:     PASS -> TIMEOUT
    - fi-bsw-n3050:       PASS -> TIMEOUT
    - fi-hsw-4770:        PASS -> TIMEOUT
    - fi-skl-6260u:       PASS -> TIMEOUT
    - fi-hsw-peppy:       PASS -> TIMEOUT

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-a:
    - fi-hsw-peppy:       PASS -> FAIL +2

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
    - fi-bwr-2160:        PASS -> FAIL

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live_hangcheck:
    - {fi-icl-u3}:        PASS -> TIMEOUT

  
Known issues
------------

  Here are the changes found in Patchwork_12258 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_module_load@reload:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         PASS -> FAIL [fdo#103182]

  
#### Possible fixes ####

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS +1

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718


Participating hosts (45 -> 40)
------------------------------

  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-pnv-d510 


Build changes
-------------

    * Linux: CI_DRM_5631 -> Patchwork_12258

  CI_DRM_5631: bf9a463be5b19fcc0a6d768e5da1f95d20d3a4a3 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4837: 368e76156f752e6ed6ac32ed9f400567aef7d3fc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12258: 31342d6c1fffe68544a3379a097862bc00be3cc2 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

31342d6c1fff drm/i915: Beware temporary wedging when determining -EIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12258/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for drm/i915: Beware temporary wedging when determining -EIO (rev5)
  2019-02-19 16:03 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2019-02-19 16:06   ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 16:06 UTC (permalink / raw)
  To: Patchwork, intel-gfx

Quoting Patchwork (2019-02-19 16:03:30)
> == Series Details ==
> 
> Series: drm/i915: Beware temporary wedging when determining -EIO (rev5)
> URL   : https://patchwork.freedesktop.org/series/56898/
> State : failure
> 
> == Summary ==
> 
> CI Bug Log - changes from CI_DRM_5631 -> Patchwork_12258
> ====================================================
> 
> Summary
> -------
> 
>   **FAILURE**
> 
>   Serious unknown changes coming with Patchwork_12258 absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_12258, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/5/mbox/
> 
> Possible new issues
> -------------------
> 
>   Here are the unknown changes that may have been introduced in Patchwork_12258:
> 
> ### IGT changes ###
> 
> #### Possible regressions ####
> 
>   * igt@i915_selftest@live_hangcheck:
>     - fi-bdw-gvtdvm:      PASS -> TIMEOUT
>     - fi-bwr-2160:        PASS -> TIMEOUT
>     - fi-snb-2520m:       PASS -> TIMEOUT
>     - fi-elk-e7500:       PASS -> TIMEOUT
>     - fi-skl-gvtdvm:      PASS -> TIMEOUT
>     - fi-bxt-j4205:       PASS -> TIMEOUT
>     - fi-kbl-7500u:       PASS -> TIMEOUT
>     - fi-bsw-kefka:       PASS -> TIMEOUT
>     - fi-gdg-551:         PASS -> TIMEOUT
>     - fi-byt-clapper:     PASS -> TIMEOUT
>     - fi-bsw-n3050:       PASS -> TIMEOUT
>     - fi-hsw-4770:        PASS -> TIMEOUT
>     - fi-skl-6260u:       PASS -> TIMEOUT
>     - fi-hsw-peppy:       PASS -> TIMEOUT

Morale of the story: don't wait while faking it.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v6] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (8 preceding siblings ...)
  2019-02-19 16:03 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2019-02-19 16:08 ` Chris Wilson
  2019-02-19 16:18 ` [PATCH v7] " Chris Wilson
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 16:08 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           | 12 ++++++---
 drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
 drivers/gpu/drm/i915/i915_gem.c               | 22 +++++++--------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 27 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++---
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 ++++++++---------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 +--
 19 files changed, 83 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..8cee9e4fc31b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,11 +3833,15 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
+	int ret;
 
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	ret = i915_terminally_wedged(data);
+	if (ret == -EIO) {
+		*val = 1;
+		ret = 0;
+	}
 
-	return 0;
+	return ret;
 }
 
 static int
@@ -3918,7 +3922,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c8d0489a1cd..14c6f5e65b8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_reset_failed(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..b3d918d90c1f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_terminally_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_reset_failed(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_terminally_wedged(i915)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
@@ -4731,10 +4732,9 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
-		ret = -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -5107,7 +5107,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_reset_failed(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a61e3a4fc9dc..124b3e279c88 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,8 +559,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index c234feb5fdf5..0853cdcdce99 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1032,7 +1032,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1339,6 +1339,29 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	/* Reset still in progress? Maybe we will recover? */
+	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+		return -EIO;
+
+	/* XXX intel_reset_finish() still takes struct_mutex!!! */
+	if (mutex_is_locked(&i915->drm.struct_mutex))
+		return -EAGAIN;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..81b80f8fd9ea 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9be033b6f4d2 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..40607ba7dda6 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..27d8f853111b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..9c16aff87028 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_terminally_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..3a238b9628b3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..b270eab1cad1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..784982aed625 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_terminally_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..03cc8ab6a620 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..e0d3122fd35a 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_terminally_wedged(i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..a77e4bf1ab55 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_reset_failed(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_reset_failed(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			err = -EIO;
 	}
 
@@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..0f7a5bf69646 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..e6ffc8ac22dc 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_terminally_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v7] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (9 preceding siblings ...)
  2019-02-19 16:08 ` [PATCH v6] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
@ 2019-02-19 16:18 ` Chris Wilson
  2019-02-20 14:44   ` Mika Kuoppala
  2019-02-19 16:28 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev7) Patchwork
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2019-02-19 16:18 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously, we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset. If
we suspect this is the case, that is we see a wedged driver and reset in
progress, then wait until the reset is resolved before reporting upon
the wedged status.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           | 16 ++++++++---
 drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
 drivers/gpu/drm/i915/i915_gem.c               | 22 +++++++--------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 27 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++---
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 ++++++++---------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 +--
 19 files changed, 87 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..54928d693c85 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,11 +3833,19 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
+	int ret = i915_terminally_wedged(data);
 
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	switch (ret) {
+	case -EIO:
+		*val = 1;
+		ret = 0;
+		break;
+	case 0:
+		*val = 0;
+		break;
+	}
 
-	return 0;
+	return ret;
 }
 
 static int
@@ -3918,7 +3926,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 5c8d0489a1cd..14c6f5e65b8d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_reset_failed(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..b3d918d90c1f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_terminally_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_reset_failed(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_terminally_wedged(i915)) {
 		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
@@ -4731,10 +4732,9 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
-		ret = -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -5107,7 +5107,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_reset_failed(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a61e3a4fc9dc..124b3e279c88 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,8 +559,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index c234feb5fdf5..0853cdcdce99 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1032,7 +1032,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1339,6 +1339,29 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	/* Reset still in progress? Maybe we will recover? */
+	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+		return -EIO;
+
+	/* XXX intel_reset_finish() still takes struct_mutex!!! */
+	if (mutex_is_locked(&i915->drm.struct_mutex))
+		return -EAGAIN;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..81b80f8fd9ea 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9be033b6f4d2 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..40607ba7dda6 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..27d8f853111b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..9c16aff87028 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_terminally_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..3a238b9628b3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..b270eab1cad1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..784982aed625 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_terminally_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..03cc8ab6a620 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..e0d3122fd35a 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_terminally_wedged(i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..a77e4bf1ab55 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_reset_failed(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_reset_failed(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			err = -EIO;
 	}
 
@@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..0f7a5bf69646 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..e6ffc8ac22dc 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_terminally_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev7)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (10 preceding siblings ...)
  2019-02-19 16:18 ` [PATCH v7] " Chris Wilson
@ 2019-02-19 16:28 ` Patchwork
  2019-02-19 16:48 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 16:28 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev7)
URL   : https://patchwork.freedesktop.org/series/56898/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Beware temporary wedging when determining -EIO
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3572:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev7)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (11 preceding siblings ...)
  2019-02-19 16:28 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev7) Patchwork
@ 2019-02-19 16:48 ` Patchwork
  2019-02-19 19:21 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 16:48 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev7)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5631 -> Patchwork_12259
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/7/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12259 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@userptr:
    - fi-kbl-8809g:       PASS -> DMESG-WARN [fdo#108965]

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-blb-e6850:       PASS -> INCOMPLETE [fdo#107718]

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a:
    - fi-byt-clapper:     PASS -> FAIL [fdo#107362]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  
#### Possible fixes ####

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         INCOMPLETE [fdo#103927] -> PASS

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-b:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS +1

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108965]: https://bugs.freedesktop.org/show_bug.cgi?id=108965
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109316]: https://bugs.freedesktop.org/show_bug.cgi?id=109316
  [fdo#109635 ]: https://bugs.freedesktop.org/show_bug.cgi?id=109635 


Participating hosts (45 -> 41)
------------------------------

  Additional (1): fi-icl-u2 
  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-icl-u3 


Build changes
-------------

    * Linux: CI_DRM_5631 -> Patchwork_12259

  CI_DRM_5631: bf9a463be5b19fcc0a6d768e5da1f95d20d3a4a3 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4837: 368e76156f752e6ed6ac32ed9f400567aef7d3fc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12259: ae733c24e7c433bfc4a238ae646463bef1a83564 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ae733c24e7c4 drm/i915: Beware temporary wedging when determining -EIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12259/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915: Beware temporary wedging when determining -EIO (rev7)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (12 preceding siblings ...)
  2019-02-19 16:48 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-19 19:21 ` Patchwork
  2019-02-20 14:56 ` [CI] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-19 19:21 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev7)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5631_full -> Patchwork_12259_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_12259_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_color@pipe-a-ctm-max:
    - shard-apl:          PASS -> FAIL [fdo#108147]

  * igt@kms_cursor_crc@cursor-256x256-suspend:
    - shard-apl:          PASS -> DMESG-FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-256x85-random:
    - shard-apl:          PASS -> FAIL [fdo#103232] +2

  * igt@kms_cursor_crc@cursor-alpha-opaque:
    - shard-apl:          PASS -> FAIL [fdo#109350]

  * igt@kms_flip@2x-flip-vs-wf_vblank-interruptible:
    - shard-hsw:          PASS -> DMESG-WARN [fdo#102614] +1

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-blt:
    - shard-glk:          PASS -> FAIL [fdo#103167] +1

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-move:
    - shard-iclb:         PASS -> FAIL [fdo#103167] +3

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - shard-iclb:         PASS -> INCOMPLETE [fdo#107713] +1

  * igt@kms_plane@pixel-format-pipe-a-planes-source-clamping:
    - shard-apl:          PASS -> FAIL [fdo#108948]

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
    - shard-apl:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-y:
    - shard-iclb:         PASS -> FAIL [fdo#103166] +2

  * igt@kms_rotation_crc@multiplane-rotation:
    - shard-kbl:          PASS -> FAIL [fdo#109016]

  * igt@kms_vblank@pipe-a-query-forked:
    - shard-kbl:          PASS -> DMESG-WARN [fdo#103558] / [fdo#105602] +23

  * igt@kms_vblank@pipe-b-query-forked-busy:
    - shard-glk:          NOTRUN -> INCOMPLETE [fdo#103359] / [k.org#198133]

  * igt@pm_rpm@cursor-dpms:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724] +2

  * igt@pm_rpm@universal-planes:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724] / [fdo#108654]

  
#### Possible fixes ####

  * igt@gem_exec_big:
    - shard-hsw:          TIMEOUT [fdo#107937] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-render-c:
    - shard-iclb:         DMESG-WARN [fdo#107956] -> PASS +1

  * igt@kms_color@pipe-a-legacy-gamma:
    - shard-apl:          FAIL [fdo#104782] / [fdo#108145] -> PASS
    - shard-glk:          FAIL [fdo#104782] / [fdo#108145] -> PASS

  * igt@kms_cursor_crc@cursor-64x21-sliding:
    - shard-apl:          FAIL [fdo#103232] -> PASS +4

  * igt@kms_flip@flip-vs-dpms-interruptible:
    - shard-glk:          INCOMPLETE [fdo#103359] / [k.org#198133] -> PASS

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-kbl:          FAIL [fdo#102887] / [fdo#105363] -> PASS

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-iclb:         INCOMPLETE [fdo#109507] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-apl:          FAIL [fdo#103167] -> PASS +9

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-onoff:
    - shard-iclb:         FAIL [fdo#103167] -> PASS

  * igt@kms_plane@pixel-format-pipe-b-planes:
    - shard-glk:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane@pixel-format-pipe-c-planes-source-clamping:
    - shard-apl:          FAIL [fdo#108948] -> PASS

  * igt@kms_plane@plane-position-covered-pipe-a-planes:
    - shard-iclb:         FAIL [fdo#103166] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
    - shard-apl:          FAIL [fdo#103166] -> PASS +2

  * igt@pm_rpm@gem-pread:
    - shard-iclb:         DMESG-WARN [fdo#107724] -> PASS +4

  * igt@pm_rps@waitboost:
    - shard-glk:          FAIL [fdo#102250] -> PASS

  * igt@tools_test@tools_test:
    - shard-glk:          {SKIP} [fdo#109271] -> PASS

  
#### Warnings ####

  * igt@kms_plane_alpha_blend@pipe-b-alpha-opaque-fb:
    - shard-kbl:          FAIL [fdo#108145] -> DMESG-FAIL [fdo#103558] / [fdo#105602] / [fdo#108145]

  * igt@kms_rotation_crc@multiplane-rotation-cropping-bottom:
    - shard-kbl:          DMESG-FAIL [fdo#105763] -> DMESG-WARN [fdo#103558] / [fdo#105602]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102250]: https://bugs.freedesktop.org/show_bug.cgi?id=102250
  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#102887]: https://bugs.freedesktop.org/show_bug.cgi?id=102887
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103558]: https://bugs.freedesktop.org/show_bug.cgi?id=103558
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105602]: https://bugs.freedesktop.org/show_bug.cgi?id=105602
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107937]: https://bugs.freedesktop.org/show_bug.cgi?id=107937
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108147]: https://bugs.freedesktop.org/show_bug.cgi?id=108147
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108756]: https://bugs.freedesktop.org/show_bug.cgi?id=108756
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#109016]: https://bugs.freedesktop.org/show_bug.cgi?id=109016
  [fdo#109225]: https://bugs.freedesktop.org/show_bug.cgi?id=109225
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109277]: https://bugs.freedesktop.org/show_bug.cgi?id=109277
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109281]: https://bugs.freedesktop.org/show_bug.cgi?id=109281
  [fdo#109350]: https://bugs.freedesktop.org/show_bug.cgi?id=109350
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109507]: https://bugs.freedesktop.org/show_bug.cgi?id=109507
  [fdo#109624]: https://bugs.freedesktop.org/show_bug.cgi?id=109624
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (7 -> 6)
------------------------------

  Missing    (1): shard-skl 


Build changes
-------------

    * Linux: CI_DRM_5631 -> Patchwork_12259

  CI_DRM_5631: bf9a463be5b19fcc0a6d768e5da1f95d20d3a4a3 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4837: 368e76156f752e6ed6ac32ed9f400567aef7d3fc @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12259: ae733c24e7c433bfc4a238ae646463bef1a83564 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12259/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v7] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 16:18 ` [PATCH v7] " Chris Wilson
@ 2019-02-20 14:44   ` Mika Kuoppala
  0 siblings, 0 replies; 24+ messages in thread
From: Mika Kuoppala @ 2019-02-20 14:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> At a few points in our uABI, we check to see if the driver is wedged and
> report -EIO back to the user in that case. However, as we perform the
> check and reset asynchronously, we may instead see the temporary wedging
> used to cancel inflight rendering to avoid a deadlock during reset. If

This could mention that it is the wedge on timeout which will
flash the -EIO to userspace.

> we suspect this is the case, that is we see a wedged driver and reset in
> progress, then wait until the reset is resolved before reporting upon
> the wedged status.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c           | 16 ++++++++---
>  drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
>  drivers/gpu/drm/i915/i915_gem.c               | 22 +++++++--------
>  drivers/gpu/drm/i915/i915_request.c           |  5 ++--
>  drivers/gpu/drm/i915/i915_reset.c             | 27 +++++++++++++++++--
>  drivers/gpu/drm/i915/i915_reset.h             |  2 ++
>  drivers/gpu/drm/i915/intel_engine_cs.c        |  8 +++---
>  drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
>  drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
>  drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
>  .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
>  .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
>  .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
>  drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
>  .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 ++++++++---------
>  drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
>  .../drm/i915/selftests/intel_workarounds.c    |  4 +--
>  19 files changed, 87 insertions(+), 50 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 2aeea977283f..54928d693c85 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3833,11 +3833,19 @@ static const struct file_operations i915_cur_wm_latency_fops = {
>  static int
>  i915_wedged_get(void *data, u64 *val)
>  {
> -	struct drm_i915_private *dev_priv = data;
> +	int ret = i915_terminally_wedged(data);
>  
> -	*val = i915_terminally_wedged(&dev_priv->gpu_error);
> +	switch (ret) {
> +	case -EIO:
> +		*val = 1;
> +		ret = 0;
> +		break;
> +	case 0:
> +		*val = 0;
> +		break;
> +	}

Hmm, joke is still there but it is better.

>  
> -	return 0;
> +	return ret;
>  }
>  
>  static int
> @@ -3918,7 +3926,7 @@ i915_drop_caches_set(void *data, u64 val)
>  		mutex_unlock(&i915->drm.struct_mutex);
>  	}
>  
> -	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
> +	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
>  		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
>  
>  	fs_reclaim_acquire(GFP_KERNEL);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 5c8d0489a1cd..14c6f5e65b8d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
>  struct i915_request *
>  i915_gem_find_active_request(struct intel_engine_cs *engine);
>  
> -static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
> +static inline bool __i915_wedged(struct i915_gpu_error *error)
>  {
>  	return unlikely(test_bit(I915_WEDGED, &error->flags));
>  }
>  
> +static inline bool i915_reset_failed(struct drm_i915_private *i915)
> +{
> +	return __i915_wedged(&i915->gpu_error);
> +}
> +
>  static inline u32 i915_reset_count(struct i915_gpu_error *error)
>  {
>  	return READ_ONCE(error->reset_count);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b421bc7a2e26..b3d918d90c1f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
>  		 * and so needs to be reported.
>  		 */
> -		if (!i915_terminally_wedged(&dev_priv->gpu_error))
> +		if (!i915_terminally_wedged(dev_priv))
>  			return VM_FAULT_SIGBUS;
>  		/* else: fall through */
>  	case -EAGAIN:
> @@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		return;
>  
>  	GEM_BUG_ON(i915->gt.active_requests);
> @@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
>  	long ret;
>  
>  	/* ABI: return -EIO if already wedged */
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> -		return -EIO;
> +	ret = i915_terminally_wedged(dev_priv);
> +	if (ret)
> +		return ret;
>  
>  	spin_lock(&file_priv->mm.lock);
>  	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
> @@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>  	 * back to defaults, recovering from whatever wedged state we left it
>  	 * in and so worth trying to use the device once more.
>  	 */
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		i915_gem_unset_wedged(i915);
>  
>  	/*
> @@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>  	 * state. Fortunately, the kernel_context is disposable and we do
>  	 * not rely on its state.
>  	 */
> -	if (!i915_terminally_wedged(&i915->gpu_error)) {
> +	if (!i915_reset_failed(i915)) {
>  		ret = i915_gem_switch_to_kernel_context(i915);
>  		if (ret)
>  			goto err_unlock;
> @@ -4625,7 +4626,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
>  	return;
>  
>  err_wedged:
> -	if (!i915_terminally_wedged(&i915->gpu_error)) {
> +	if (!i915_terminally_wedged(i915)) {
>  		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
>  		i915_gem_set_wedged(i915);
>  	}
> @@ -4731,10 +4732,9 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
>  	init_unused_rings(dev_priv);
>  
>  	BUG_ON(!dev_priv->kernel_context);
> -	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
> -		ret = -EIO;
> +	ret = i915_terminally_wedged(dev_priv);
> +	if (ret)
>  		goto out;
> -	}
>  
>  	ret = i915_ppgtt_init_hw(dev_priv);
>  	if (ret) {
> @@ -5107,7 +5107,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
>  		 * wedged. But we only want to do this where the GPU is angry,
>  		 * for all other failure, such as an allocation failure, bail.
>  		 */
> -		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
> +		if (!i915_reset_failed(dev_priv)) {
>  			i915_load_error(dev_priv,
>  					"Failed to initialize GPU, declaring it wedged!\n");
>  			i915_gem_set_wedged(dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index a61e3a4fc9dc..124b3e279c88 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -559,8 +559,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
>  	 * EIO if the GPU is already wedged.
>  	 */
> -	if (i915_terminally_wedged(&i915->gpu_error))
> -		return ERR_PTR(-EIO);
> +	ret = i915_terminally_wedged(i915);
> +	if (ret)
> +		return ERR_PTR(ret);
>  
>  	/*
>  	 * Pinning the contexts may generate requests in order to acquire
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index c234feb5fdf5..0853cdcdce99 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -1032,7 +1032,7 @@ void i915_reset(struct drm_i915_private *i915,
>  
>  finish:
>  	reset_finish(i915);
> -	if (!i915_terminally_wedged(error))
> +	if (!__i915_wedged(error))
>  		reset_restart(i915);
>  	return;
>  
> @@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
>  	 * Try engine reset when available. We fall back to full reset if
>  	 * single reset fails.
>  	 */
> -	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
> +	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
>  		for_each_engine_masked(engine, i915, engine_mask, tmp) {
>  			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
>  			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> @@ -1339,6 +1339,29 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
>  	srcu_read_unlock(&error->reset_backoff_srcu, tag);
>  }
>  
> +int i915_terminally_wedged(struct drm_i915_private *i915)
> +{
> +	struct i915_gpu_error *error = &i915->gpu_error;
> +

might_sleep();

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +	if (!__i915_wedged(error))
> +		return 0;
> +
> +	/* Reset still in progress? Maybe we will recover? */
> +	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
> +		return -EIO;
> +
> +	/* XXX intel_reset_finish() still takes struct_mutex!!! */
> +	if (mutex_is_locked(&i915->drm.struct_mutex))
> +		return -EAGAIN;
> +
> +	if (wait_event_interruptible(error->reset_queue,
> +				     !test_bit(I915_RESET_BACKOFF,
> +					       &error->flags)))
> +		return -EINTR;
> +
> +	return __i915_wedged(error) ? -EIO : 0;
> +}
> +
>  bool i915_reset_flush(struct drm_i915_private *i915)
>  {
>  	int err;
> diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
> index 893c5d1c2eb8..16f2389f656f 100644
> --- a/drivers/gpu/drm/i915/i915_reset.h
> +++ b/drivers/gpu/drm/i915/i915_reset.h
> @@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
>  int __must_check i915_reset_trylock(struct drm_i915_private *i915);
>  void i915_reset_unlock(struct drm_i915_private *i915, int tag);
>  
> +int i915_terminally_wedged(struct drm_i915_private *i915);
> +
>  bool intel_has_gpu_reset(struct drm_i915_private *i915);
>  bool intel_has_reset_engine(struct drm_i915_private *i915);
>  
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 2547e2e51db8..81b80f8fd9ea 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
>   */
>  bool intel_engine_is_idle(struct intel_engine_cs *engine)
>  {
> -	struct drm_i915_private *dev_priv = engine->i915;
> -
>  	/* More white lies, if wedged, hw state is inconsistent */
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +	if (i915_reset_failed(engine->i915))
>  		return true;
>  
>  	/* Any inflight/incomplete requests? */
> @@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
>  	 * If the driver is wedged, HW state may be very inconsistent and
>  	 * report that it is still busy, even though we have stopped using it.
>  	 */
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +	if (i915_reset_failed(dev_priv))
>  		return true;
>  
>  	for_each_engine(engine, dev_priv, id) {
> @@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  		va_end(ap);
>  	}
>  
> -	if (i915_terminally_wedged(&engine->i915->gpu_error))
> +	if (i915_reset_failed(engine->i915))
>  		drm_printf(m, "*** WEDGED ***\n");
>  
>  	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index a219c796e56d..9be033b6f4d2 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>  	if (!READ_ONCE(dev_priv->gt.awake))
>  		return;
>  
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +	if (i915_terminally_wedged(dev_priv))
>  		return;
>  
>  	/* As enabling the GPU requires fairly extensive mmio access,
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index a9a2fa35876f..40607ba7dda6 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
>  		return 0;
>  	}
>  
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +	if (i915_terminally_wedged(dev_priv))
>  		return 0;
>  
>  	file = mock_file(dev_priv);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
> index 337b1f98b923..27d8f853111b 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_active.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_active.c
> @@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(live_active_retire),
>  	};
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return 0;
>  
>  	return i915_subtests(tests, i915);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
> index fd89a5a33c1a..9c16aff87028 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
> @@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
>  
>  static bool needs_fence_registers(struct drm_i915_private *i915)
>  {
> -	return !i915_terminally_wedged(&i915->gpu_error);
> +	return !i915_terminally_wedged(i915);
>  }
>  
>  static bool needs_mi_store_dword(struct drm_i915_private *i915)
>  {
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return false;
>  
>  	return intel_engine_can_store_dword(i915->engine[RCS]);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index b7b97c57ad05..3a238b9628b3 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
>  		SUBTEST(igt_vm_isolation),
>  	};
>  
> -	if (i915_terminally_wedged(&dev_priv->gpu_error))
> +	if (i915_terminally_wedged(dev_priv))
>  		return 0;
>  
>  	return i915_subtests(tests, dev_priv);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> index 32dce7176f63..b270eab1cad1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
> @@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(igt_evict_contexts),
>  	};
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return 0;
>  
>  	return i915_subtests(tests, i915);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
> index 395ae878e0f7..784982aed625 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
> @@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
>  	for (loop = 0; loop < 3; loop++) {
>  		intel_wakeref_t wakeref;
>  
> -		if (i915_terminally_wedged(&i915->gpu_error))
> +		if (i915_terminally_wedged(i915))
>  			break;
>  
>  		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 6733dc5b6b4c..03cc8ab6a620 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
>  		SUBTEST(live_breadcrumbs_smoketest),
>  	};
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return 0;
>  
>  	return i915_subtests(tests, i915);
> diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> index af66e3d4e23a..e0d3122fd35a 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> @@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
>  		i915_gem_set_wedged(i915);
>  	}
>  
> -	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
> +	return i915_terminally_wedged(i915);
>  }
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index c32bc31192ae..a77e4bf1ab55 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
>  			timeout = i915_request_wait(rq,
>  						    I915_WAIT_LOCKED,
>  						    MAX_SCHEDULE_TIMEOUT);
> -		if (i915_terminally_wedged(&i915->gpu_error))
> +		if (i915_reset_failed(i915))
>  			timeout = -EIO;
>  
>  		i915_request_put(rq);
> @@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
>  
>  	igt_global_reset_unlock(i915);
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		err = -EIO;
>  
>  	return err;
> @@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
>  
>  	i915_gem_set_wedged(i915);
>  
> -	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
> +	GEM_BUG_ON(!i915_reset_failed(i915));
>  	i915_reset(i915, ALL_ENGINES, NULL);
>  
>  	intel_runtime_pm_put(i915, wakeref);
>  	igt_global_reset_unlock(i915);
>  
> -	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
> +	return i915_reset_failed(i915) ? -EIO : 0;
>  }
>  
>  static bool wait_for_idle(struct intel_engine_cs *engine)
> @@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
>  			break;
>  	}
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		err = -EIO;
>  
>  	if (active) {
> @@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
>  			break;
>  	}
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		err = -EIO;
>  
>  	if (flags & TEST_ACTIVE) {
> @@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
>  	mutex_unlock(&i915->drm.struct_mutex);
>  	igt_global_reset_unlock(i915);
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		return -EIO;
>  
>  	return err;
> @@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  unlock:
>  	mutex_unlock(&i915->drm.struct_mutex);
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		return -EIO;
>  
>  	return err;
> @@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
>  	mutex_unlock(&i915->drm.struct_mutex);
>  	igt_global_reset_unlock(i915);
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		return -EIO;
>  
>  	return err;
> @@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
>  			i915_request_wait(rq,
>  					  I915_WAIT_LOCKED,
>  					  MAX_SCHEDULE_TIMEOUT);
> -		if (i915_terminally_wedged(&i915->gpu_error))
> +		if (i915_reset_failed(i915))
>  			err = -EIO;
>  	}
>  
> @@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
>  
>  	/* Flush any requests before we get started and check basics */
>  	force_reset(i915);
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_reset_failed(i915))
>  		goto unlock;
>  
>  	if (intel_has_gpu_reset(i915)) {
> @@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>  	if (!intel_has_gpu_reset(i915))
>  		return 0;
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return -EIO; /* we're long past hope of a successful reset */
>  
>  	wakeref = intel_runtime_pm_get(i915);
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 58144e024751..0f7a5bf69646 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>  	if (!HAS_EXECLISTS(i915))
>  		return 0;
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return 0;
>  
>  	return i915_subtests(tests, i915);
> diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> index fb479a2c04fb..e6ffc8ac22dc 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> @@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
>  	err = 0;
>  	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
>  		err = i915_gem_object_set_to_cpu_domain(results, false);
> -	if (i915_terminally_wedged(&ctx->i915->gpu_error))
> +	if (i915_terminally_wedged(ctx->i915))
>  		err = -EIO;
>  	if (err)
>  		goto out_put;
> @@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
>  	};
>  	int err;
>  
> -	if (i915_terminally_wedged(&i915->gpu_error))
> +	if (i915_terminally_wedged(i915))
>  		return 0;
>  
>  	mutex_lock(&i915->drm.struct_mutex);
> -- 
> 2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [CI] drm/i915: Beware temporary wedging when determining -EIO
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (13 preceding siblings ...)
  2019-02-19 19:21 ` ✓ Fi.CI.IGT: " Patchwork
@ 2019-02-20 14:56 ` Chris Wilson
  2019-02-20 15:42 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev8) Patchwork
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2019-02-20 14:56 UTC (permalink / raw)
  To: intel-gfx

At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.

v2: might_sleep() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           | 17 +++++++----
 drivers/gpu/drm/i915/i915_drv.h               |  7 ++++-
 drivers/gpu/drm/i915/i915_gem.c               | 25 ++++++++--------
 drivers/gpu/drm/i915/i915_request.c           |  5 ++--
 drivers/gpu/drm/i915/i915_reset.c             | 29 +++++++++++++++++--
 drivers/gpu/drm/i915/i915_reset.h             |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c        |  8 ++---
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |  2 +-
 .../drm/i915/selftests/i915_gem_coherency.c   |  4 +--
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_object.c  |  2 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  | 24 +++++++--------
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 .../drm/i915/selftests/intel_workarounds.c    |  4 +--
 19 files changed, 91 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 2aeea977283f..37175414ce89 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3833,11 +3833,18 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	struct drm_i915_private *dev_priv = data;
-
-	*val = i915_terminally_wedged(&dev_priv->gpu_error);
+	int ret = i915_terminally_wedged(data);
 
-	return 0;
+	switch (ret) {
+	case -EIO:
+		*val = 1;
+		return 0;
+	case 0:
+		*val = 0;
+		return 0;
+	default:
+		return ret;
+	}
 }
 
 static int
@@ -3918,7 +3925,7 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(&i915->gpu_error))
+	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
 		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 58c9058da4b4..2a78fb3e6eee 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3020,11 +3020,16 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
+static inline bool __i915_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
+static inline bool i915_reset_failed(struct drm_i915_private *i915)
+{
+	return __i915_wedged(&i915->gpu_error);
+}
+
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b421bc7a2e26..cc88e7974422 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1928,7 +1928,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error))
+		if (!i915_terminally_wedged(dev_priv))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
@@ -2958,7 +2958,7 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return;
 
 	GEM_BUG_ON(i915->gt.active_requests);
@@ -3806,8 +3806,9 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
-		return -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
+		return ret;
 
 	spin_lock(&file_priv->mm.lock);
 	list_for_each_entry(request, &file_priv->mm.request_list, client_link) {
@@ -4460,7 +4461,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		i915_gem_unset_wedged(i915);
 
 	/*
@@ -4504,7 +4505,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
+	if (!i915_reset_failed(i915)) {
 		ret = i915_gem_switch_to_kernel_context(i915);
 		if (ret)
 			goto err_unlock;
@@ -4625,8 +4626,9 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
-		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
+	if (!i915_reset_failed(i915)) {
+		dev_err(i915->drm.dev,
+			"Failed to re-initialize GPU, declaring it wedged!\n");
 		i915_gem_set_wedged(i915);
 	}
 	goto out_unlock;
@@ -4731,10 +4733,9 @@ int i915_gem_init_hw(struct drm_i915_private *dev_priv)
 	init_unused_rings(dev_priv);
 
 	BUG_ON(!dev_priv->kernel_context);
-	if (i915_terminally_wedged(&dev_priv->gpu_error)) {
-		ret = -EIO;
+	ret = i915_terminally_wedged(dev_priv);
+	if (ret)
 		goto out;
-	}
 
 	ret = i915_ppgtt_init_hw(dev_priv);
 	if (ret) {
@@ -5107,7 +5108,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
+		if (!i915_reset_failed(dev_priv)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
 			i915_gem_set_wedged(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index a61e3a4fc9dc..124b3e279c88 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -559,8 +559,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return ERR_PTR(-EIO);
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ERR_PTR(ret);
 
 	/*
 	 * Pinning the contexts may generate requests in order to acquire
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index c234feb5fdf5..19ad56ba22a7 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1032,7 +1032,7 @@ void i915_reset(struct drm_i915_private *i915,
 
 finish:
 	reset_finish(i915);
-	if (!i915_terminally_wedged(error))
+	if (!__i915_wedged(error))
 		reset_restart(i915);
 	return;
 
@@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
+	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1339,6 +1339,31 @@ __releases(&i915->gpu_error.reset_backoff_srcu)
 	srcu_read_unlock(&error->reset_backoff_srcu, tag);
 }
 
+int i915_terminally_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	might_sleep();
+
+	if (!__i915_wedged(error))
+		return 0;
+
+	/* Reset still in progress? Maybe we will recover? */
+	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+		return -EIO;
+
+	/* XXX intel_reset_finish() still takes struct_mutex!!! */
+	if (mutex_is_locked(&i915->drm.struct_mutex))
+		return -EAGAIN;
+
+	if (wait_event_interruptible(error->reset_queue,
+				     !test_bit(I915_RESET_BACKOFF,
+					       &error->flags)))
+		return -EINTR;
+
+	return __i915_wedged(error) ? -EIO : 0;
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 893c5d1c2eb8..16f2389f656f 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -36,6 +36,8 @@ bool i915_reset_flush(struct drm_i915_private *i915);
 int __must_check i915_reset_trylock(struct drm_i915_private *i915);
 void i915_reset_unlock(struct drm_i915_private *i915, int tag);
 
+int i915_terminally_wedged(struct drm_i915_private *i915);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2547e2e51db8..81b80f8fd9ea 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1007,10 +1007,8 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
  */
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
-
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		return true;
 
 	/* Any inflight/incomplete requests? */
@@ -1054,7 +1052,7 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_reset_failed(dev_priv))
 		return true;
 
 	for_each_engine(engine, dev_priv, id) {
@@ -1496,7 +1494,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_terminally_wedged(&engine->i915->gpu_error))
+	if (i915_reset_failed(engine->i915))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..9be033b6f4d2 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -263,7 +263,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!READ_ONCE(dev_priv->gt.awake))
 		return;
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return;
 
 	/* As enabling the GPU requires fairly extensive mmio access,
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..40607ba7dda6 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1764,7 +1764,7 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 		return 0;
 	}
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	file = mock_file(dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 337b1f98b923..27d8f853111b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -150,7 +150,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
index fd89a5a33c1a..9c16aff87028 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_coherency.c
@@ -248,12 +248,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(&i915->gpu_error);
+	return !i915_terminally_wedged(i915);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return false;
 
 	return intel_engine_can_store_dword(i915->engine[RCS]);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index b7b97c57ad05..3a238b9628b3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1632,7 +1632,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(&dev_priv->gpu_error))
+	if (i915_terminally_wedged(dev_priv))
 		return 0;
 
 	return i915_subtests(tests, dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 32dce7176f63..b270eab1cad1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -547,7 +547,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_object.c b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
index 395ae878e0f7..784982aed625 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_object.c
@@ -583,7 +583,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 	for (loop = 0; loop < 3; loop++) {
 		intel_wakeref_t wakeref;
 
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_terminally_wedged(i915))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..03cc8ab6a620 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1246,7 +1246,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..e0d3122fd35a 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -29,5 +29,5 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 		i915_gem_set_wedged(i915);
 	}
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_terminally_wedged(i915);
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c32bc31192ae..a77e4bf1ab55 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -342,7 +342,7 @@ static int igt_hang_sanitycheck(void *arg)
 			timeout = i915_request_wait(rq,
 						    I915_WAIT_LOCKED,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -383,7 +383,7 @@ static int igt_global_reset(void *arg)
 
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	return err;
@@ -401,13 +401,13 @@ static int igt_wedged_reset(void *arg)
 
 	i915_gem_set_wedged(i915);
 
-	GEM_BUG_ON(!i915_terminally_wedged(&i915->gpu_error));
+	GEM_BUG_ON(!i915_reset_failed(i915));
 	i915_reset(i915, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(i915, wakeref);
 	igt_global_reset_unlock(i915);
 
-	return i915_terminally_wedged(&i915->gpu_error) ? -EIO : 0;
+	return i915_reset_failed(i915) ? -EIO : 0;
 }
 
 static bool wait_for_idle(struct intel_engine_cs *engine)
@@ -529,7 +529,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (active) {
@@ -843,7 +843,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -969,7 +969,7 @@ static int igt_reset_wait(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1166,7 +1166,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1372,7 +1372,7 @@ static int igt_reset_queue(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 	igt_global_reset_unlock(i915);
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		return -EIO;
 
 	return err;
@@ -1552,7 +1552,7 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 			i915_request_wait(rq,
 					  I915_WAIT_LOCKED,
 					  MAX_SCHEDULE_TIMEOUT);
-		if (i915_terminally_wedged(&i915->gpu_error))
+		if (i915_reset_failed(i915))
 			err = -EIO;
 	}
 
@@ -1591,7 +1591,7 @@ static int igt_atomic_reset(void *arg)
 
 	/* Flush any requests before we get started and check basics */
 	force_reset(i915);
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_reset_failed(i915))
 		goto unlock;
 
 	if (intel_has_gpu_reset(i915)) {
@@ -1665,7 +1665,7 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..0f7a5bf69646 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -895,7 +895,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index fb479a2c04fb..e6ffc8ac22dc 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -181,7 +181,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 	err = 0;
 	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
-	if (i915_terminally_wedged(&ctx->i915->gpu_error))
+	if (i915_terminally_wedged(ctx->i915))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -510,7 +510,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(&i915->gpu_error))
+	if (i915_terminally_wedged(i915))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev8)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (14 preceding siblings ...)
  2019-02-20 14:56 ` [CI] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
@ 2019-02-20 15:42 ` Patchwork
  2019-02-20 16:29 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-02-20 18:45 ` ✓ Fi.CI.IGT: " Patchwork
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-20 15:42 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev8)
URL   : https://patchwork.freedesktop.org/series/56898/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Beware temporary wedging when determining -EIO
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3572:16: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev8)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (15 preceding siblings ...)
  2019-02-20 15:42 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev8) Patchwork
@ 2019-02-20 16:29 ` Patchwork
  2019-02-20 18:45 ` ✓ Fi.CI.IGT: " Patchwork
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-20 16:29 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev8)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5636 -> Patchwork_12264
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56898/revisions/8/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12264 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-blb-e6850:       NOTRUN -> INCOMPLETE [fdo#107718]

  * igt@kms_busy@basic-flip-b:
    - fi-gdg-551:         PASS -> FAIL [fdo#103182]

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
    - fi-hsw-4770:        PASS -> SKIP [fdo#109271] +1

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@prime_vgem@basic-fence-flip:
    - fi-ilk-650:         FAIL [fdo#104008] -> PASS

  
#### Warnings ####

  * igt@i915_selftest@live_contexts:
    - fi-icl-u2:          INCOMPLETE [fdo#108569] -> DMESG-FAIL [fdo#108569]

  
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#104008]: https://bugs.freedesktop.org/show_bug.cgi?id=104008
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271


Participating hosts (46 -> 41)
------------------------------

  Missing    (5): fi-kbl-soraka fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5636 -> Patchwork_12264

  CI_DRM_5636: b33b7e4ffb889d3d3e03ad9b64d0fe15dd2184b4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4842: 54e0e8b14f128919a0dbeb4d4f7b4fbbe30b5f60 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12264: fbed268514495c7420bca22ba206ccdc1cf637a6 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

fbed26851449 drm/i915: Beware temporary wedging when determining -EIO

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12264/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915: Beware temporary wedging when determining -EIO (rev8)
  2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
                   ` (16 preceding siblings ...)
  2019-02-20 16:29 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-20 18:45 ` Patchwork
  17 siblings, 0 replies; 24+ messages in thread
From: Patchwork @ 2019-02-20 18:45 UTC (permalink / raw)
  To: intel-gfx

== Series Details ==

Series: drm/i915: Beware temporary wedging when determining -EIO (rev8)
URL   : https://patchwork.freedesktop.org/series/56898/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5636_full -> Patchwork_12264_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_12264_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_isolation@vcs0-dirty-create:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109281] +1

  * igt@gem_ctx_isolation@vcs1-dirty-create:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109276] / [fdo#109281]

  * igt@gem_exec_schedule@preempt-queue-bsd1:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109276] +8

  * igt@gem_mocs_settings@mocs-settings-bsd1:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] +2

  * igt@gem_mocs_settings@mocs-settings-render:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109287] +1

  * igt@gem_stolen@stolen-pread:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109277] +1

  * igt@i915_pm_rc6_residency@rc6-accuracy:
    - shard-snb:          PASS -> SKIP [fdo#109271]

  * igt@i915_pm_rpm@fences:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#108654]

  * igt@i915_pm_rpm@legacy-planes:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724]

  * igt@i915_selftest@live_workarounds:
    - shard-iclb:         PASS -> DMESG-FAIL [fdo#108954]

  * igt@kms_atomic_transition@3x-modeset-transitions-fencing:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_busy@extended-modeset-hang-newfb-render-f:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109278] +3

  * igt@kms_chamelium@dp-hpd-storm:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109284] +5

  * igt@kms_color@pipe-b-ctm-0-75:
    - shard-iclb:         NOTRUN -> DMESG-WARN [fdo#109624]

  * igt@kms_color@pipe-c-degamma:
    - shard-apl:          PASS -> FAIL [fdo#104782]

  * igt@kms_cursor_crc@cursor-256x256-dpms:
    - shard-apl:          PASS -> FAIL [fdo#103232] +2

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions-varying-size:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109274] +5

  * igt@kms_force_connector_basic@force-load-detect:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109285]

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-glk:          PASS -> FAIL [fdo#103167] +1

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109280] +14

  * igt@kms_frontbuffer_tracking@psr-2p-scndscrn-pri-shrfb-draw-mmap-gtt:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] +19

  * igt@kms_plane@pixel-format-pipe-b-planes-source-clamping:
    - shard-glk:          PASS -> FAIL [fdo#108948]

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-x:
    - shard-kbl:          NOTRUN -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-y:
    - shard-glk:          PASS -> FAIL [fdo#103166]

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-yf:
    - shard-apl:          PASS -> FAIL [fdo#103166] +1

  * igt@kms_psr@psr2_dpms:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109441] +1

  * igt@kms_setmode@basic:
    - shard-apl:          PASS -> FAIL [fdo#99912]

  * igt@kms_vblank@pipe-c-ts-continuation-modeset-hang:
    - shard-apl:          PASS -> FAIL [fdo#104894] +1

  * igt@prime_nv_api@nv_self_import:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109291]

  * igt@testdisplay:
    - shard-apl:          PASS -> INCOMPLETE [fdo#103927]

  
#### Possible fixes ####

  * igt@i915_pm_rpm@debugfs-forcewake-user:
    - shard-iclb:         DMESG-WARN [fdo#107724] -> PASS +3

  * igt@kms_cursor_crc@cursor-256x256-random:
    - shard-apl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_legacy@2x-long-cursor-vs-flip-atomic:
    - shard-hsw:          FAIL [fdo#105767] -> PASS

  * igt@kms_fbcon_fbt@fbc:
    - shard-iclb:         FAIL [fdo#103833] / [fdo#105681] -> PASS

  * igt@kms_flip@2x-plain-flip-fb-recreate-interruptible:
    - shard-glk:          FAIL [fdo#100368] -> PASS

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-iclb:         INCOMPLETE [fdo#107713] -> PASS +1

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-none:
    - shard-glk:          FAIL [fdo#103166] -> PASS +1

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-y:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@kms_rotation_crc@multiplane-rotation-cropping-bottom:
    - shard-kbl:          DMESG-FAIL [fdo#105763] -> PASS

  * igt@kms_vblank@pipe-b-ts-continuation-modeset:
    - shard-apl:          FAIL [fdo#104894] -> PASS +2

  
  [fdo#100368]: https://bugs.freedesktop.org/show_bug.cgi?id=100368
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103833]: https://bugs.freedesktop.org/show_bug.cgi?id=103833
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#104894]: https://bugs.freedesktop.org/show_bug.cgi?id=104894
  [fdo#105681]: https://bugs.freedesktop.org/show_bug.cgi?id=105681
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#105767]: https://bugs.freedesktop.org/show_bug.cgi?id=105767
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#108954]: https://bugs.freedesktop.org/show_bug.cgi?id=108954
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109277]: https://bugs.freedesktop.org/show_bug.cgi?id=109277
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109281]: https://bugs.freedesktop.org/show_bug.cgi?id=109281
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109287]: https://bugs.freedesktop.org/show_bug.cgi?id=109287
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109624]: https://bugs.freedesktop.org/show_bug.cgi?id=109624
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (7 -> 6)
------------------------------

  Missing    (1): shard-skl 


Build changes
-------------

    * Linux: CI_DRM_5636 -> Patchwork_12264

  CI_DRM_5636: b33b7e4ffb889d3d3e03ad9b64d0fe15dd2184b4 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4842: 54e0e8b14f128919a0dbeb4d4f7b4fbbe30b5f60 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12264: fbed268514495c7420bca22ba206ccdc1cf637a6 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12264/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2019-02-20 18:45 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-19 13:31 [PATCH] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
2019-02-19 13:38 ` [PATCH v2] " Chris Wilson
2019-02-19 13:43   ` Chris Wilson
2019-02-19 13:47     ` Mika Kuoppala
2019-02-19 13:51       ` Chris Wilson
2019-02-19 14:01 ` [PATCH v3] " Chris Wilson
2019-02-19 14:11 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev2) Patchwork
2019-02-19 14:15 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
2019-02-19 14:38 ` [PATCH v4] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
2019-02-19 14:40 ` ✓ Fi.CI.BAT: success for drm/i915: Beware temporary wedging when determining -EIO (rev3) Patchwork
2019-02-19 15:14 ` [PATCH v5] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
2019-02-19 15:34 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev5) Patchwork
2019-02-19 16:03 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-02-19 16:06   ` Chris Wilson
2019-02-19 16:08 ` [PATCH v6] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
2019-02-19 16:18 ` [PATCH v7] " Chris Wilson
2019-02-20 14:44   ` Mika Kuoppala
2019-02-19 16:28 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev7) Patchwork
2019-02-19 16:48 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-19 19:21 ` ✓ Fi.CI.IGT: " Patchwork
2019-02-20 14:56 ` [CI] drm/i915: Beware temporary wedging when determining -EIO Chris Wilson
2019-02-20 15:42 ` ✗ Fi.CI.SPARSE: warning for drm/i915: Beware temporary wedging when determining -EIO (rev8) Patchwork
2019-02-20 16:29 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-20 18:45 ` ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.