All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/8] drm/i915: Order assert forcewake test
@ 2019-07-05  7:45 Chris Wilson
  2019-07-05  7:45 ` [PATCH 2/8] drm/i915: Teach execbuffer to take the engine wakeref not GT Chris Wilson
                   ` (11 more replies)
  0 siblings, 12 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:45 UTC (permalink / raw)
  To: intel-gfx

Read the current value before computing the expected to ensure that if
the timer does complete early (against our will), it should not cause a
false positive.

v2: The local irq disable did not prevent the timer from running.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_uncore.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 2042c94c9cc9..bb9e0da30e94 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -758,19 +758,18 @@ void assert_forcewakes_active(struct intel_uncore *uncore,
 	 * Check that the caller has an explicit wakeref and we don't mistake
 	 * it for the auto wakeref.
 	 */
-	local_irq_disable();
 	for_each_fw_domain_masked(domain, fw_domains, uncore, tmp) {
+		unsigned int actual = READ_ONCE(domain->wake_count);
 		unsigned int expect = 1;
 
 		if (hrtimer_active(&domain->timer) && READ_ONCE(domain->active))
 			expect++; /* pending automatic release */
 
-		if (WARN(domain->wake_count < expect,
+		if (WARN(actual < expect,
 			 "Expected domain %d to be held awake by caller, count=%d\n",
-			 domain->id, domain->wake_count))
+			 domain->id, actual))
 			break;
 	}
-	local_irq_enable();
 }
 
 /* We give fast paths for the really cool registers */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/8] drm/i915: Teach execbuffer to take the engine wakeref not GT
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
@ 2019-07-05  7:45 ` Chris Wilson
  2019-07-05  7:45 ` [PATCH 3/8] drm/i915/gt: Track timeline activeness in enter/exit Chris Wilson
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:45 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we would like to couple into the engine wakeref to
free the batch pool on idling. The caveat here is that we therefore want
to track the engine wakeref more precisely and to hold it instead of the
broader GT wakeref as we process the ioctl.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 36 ++++++++++++-------
 drivers/gpu/drm/i915/gt/intel_context.h       |  7 ++++
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 1c5dfbfad71b..f43eaaa5db5f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2143,13 +2143,35 @@ static int eb_pin_context(struct i915_execbuffer *eb, struct intel_context *ce)
 	if (err)
 		return err;
 
+	/*
+	 * Take a local wakeref for preparing to dispatch the execbuf as
+	 * we expect to access the hardware fairly frequently in the
+	 * process. Upon first dispatch, we acquire another prolonged
+	 * wakeref that we hold until the GPU has been idle for at least
+	 * 100ms.
+	 */
+	err = intel_context_timeline_lock(ce);
+	if (err)
+		goto err_unpin;
+
+	intel_context_enter(ce);
+	intel_context_timeline_unlock(ce);
+
 	eb->engine = ce->engine;
 	eb->context = ce;
 	return 0;
+
+err_unpin:
+	intel_context_unpin(ce);
+	return err;
 }
 
 static void eb_unpin_context(struct i915_execbuffer *eb)
 {
+	__intel_context_timeline_lock(eb->context);
+	intel_context_exit(eb->context);
+	intel_context_timeline_unlock(eb->context);
+
 	intel_context_unpin(eb->context);
 }
 
@@ -2430,18 +2452,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	/*
-	 * Take a local wakeref for preparing to dispatch the execbuf as
-	 * we expect to access the hardware fairly frequently in the
-	 * process. Upon first dispatch, we acquire another prolonged
-	 * wakeref that we hold until the GPU has been idle for at least
-	 * 100ms.
-	 */
-	intel_gt_pm_get(&eb.i915->gt);
-
 	err = i915_mutex_lock_interruptible(dev);
 	if (err)
-		goto err_rpm;
+		goto err_context;
 
 	err = eb_select_engine(&eb, file, args);
 	if (unlikely(err))
@@ -2606,8 +2619,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	eb_unpin_context(&eb);
 err_unlock:
 	mutex_unlock(&dev->struct_mutex);
-err_rpm:
-	intel_gt_pm_put(&eb.i915->gt);
+err_context:
 	i915_gem_context_put(eb.gem_context);
 err_destroy:
 	eb_destroy(&eb);
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 40cd8320fcc3..065ba4ac4e87 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -126,6 +126,13 @@ static inline void intel_context_put(struct intel_context *ce)
 	kref_put(&ce->ref, ce->ops->destroy);
 }
 
+static inline void
+__intel_context_timeline_lock(struct intel_context *ce)
+	__acquires(&ce->ring->timeline->mutex)
+{
+	mutex_lock(&ce->ring->timeline->mutex);
+}
+
 static inline int __must_check
 intel_context_timeline_lock(struct intel_context *ce)
 	__acquires(&ce->ring->timeline->mutex)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/8] drm/i915/gt: Track timeline activeness in enter/exit
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
  2019-07-05  7:45 ` [PATCH 2/8] drm/i915: Teach execbuffer to take the engine wakeref not GT Chris Wilson
@ 2019-07-05  7:45 ` Chris Wilson
  2019-07-05  7:46 ` [PATCH 4/8] drm/i915/gt: Convert timeline tracking to spinlock Chris Wilson
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:45 UTC (permalink / raw)
  To: intel-gfx

Lift moving the timeline to/from the active_list on enter/exit in order
to shorten the active tracking span in comparison to the existing
pin/unpin.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  1 -
 drivers/gpu/drm/i915/gt/intel_context.c       |  2 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  4 +
 drivers/gpu/drm/i915/gt/intel_timeline.c      | 98 +++++++------------
 drivers/gpu/drm/i915/gt/intel_timeline.h      |  3 +-
 .../gpu/drm/i915/gt/intel_timeline_types.h    |  1 +
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |  2 -
 8 files changed, 46 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 4d774376f5b8..93d188526457 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -38,7 +38,6 @@ static void i915_gem_park(struct drm_i915_private *i915)
 		i915_gem_batch_pool_fini(&engine->batch_pool);
 	}
 
-	intel_timelines_park(i915);
 	i915_vma_parked(i915);
 
 	i915_globals_park();
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 1110fc8f657a..0111a18c1f02 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -232,10 +232,12 @@ int __init i915_global_context_init(void)
 void intel_context_enter_engine(struct intel_context *ce)
 {
 	intel_engine_pm_get(ce->engine);
+	intel_timeline_enter(ce->ring->timeline);
 }
 
 void intel_context_exit_engine(struct intel_context *ce)
 {
+	intel_timeline_exit(ce->ring->timeline);
 	intel_engine_pm_put(ce->engine);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 84e432abe8e0..9751a02d86bc 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -88,6 +88,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 
 	/* Check again on the next retirement. */
 	engine->wakeref_serial = engine->serial + 1;
+	intel_timeline_enter(rq->timeline);
 
 	i915_request_add_barriers(rq);
 	__i915_request_commit(rq);
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index e1ae1399c72b..473c9ae8818c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -3188,6 +3188,8 @@ static void virtual_context_enter(struct intel_context *ce)
 
 	for (n = 0; n < ve->num_siblings; n++)
 		intel_engine_pm_get(ve->siblings[n]);
+
+	intel_timeline_enter(ce->ring->timeline);
 }
 
 static void virtual_context_exit(struct intel_context *ce)
@@ -3195,6 +3197,8 @@ static void virtual_context_exit(struct intel_context *ce)
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 	unsigned int n;
 
+	intel_timeline_exit(ce->ring->timeline);
+
 	for (n = 0; n < ve->num_siblings; n++)
 		intel_engine_pm_put(ve->siblings[n]);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 6daa9eb59e19..4af0b9801d91 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -278,64 +278,11 @@ void intel_timelines_init(struct drm_i915_private *i915)
 	timelines_init(&i915->gt);
 }
 
-static void timeline_add_to_active(struct intel_timeline *tl)
-{
-	struct intel_gt_timelines *gt = &tl->gt->timelines;
-
-	mutex_lock(&gt->mutex);
-	list_add(&tl->link, &gt->active_list);
-	mutex_unlock(&gt->mutex);
-}
-
-static void timeline_remove_from_active(struct intel_timeline *tl)
-{
-	struct intel_gt_timelines *gt = &tl->gt->timelines;
-
-	mutex_lock(&gt->mutex);
-	list_del(&tl->link);
-	mutex_unlock(&gt->mutex);
-}
-
-static void timelines_park(struct intel_gt *gt)
-{
-	struct intel_gt_timelines *timelines = &gt->timelines;
-	struct intel_timeline *timeline;
-
-	mutex_lock(&timelines->mutex);
-	list_for_each_entry(timeline, &timelines->active_list, link) {
-		/*
-		 * All known fences are completed so we can scrap
-		 * the current sync point tracking and start afresh,
-		 * any attempt to wait upon a previous sync point
-		 * will be skipped as the fence was signaled.
-		 */
-		i915_syncmap_free(&timeline->sync);
-	}
-	mutex_unlock(&timelines->mutex);
-}
-
-/**
- * intel_timelines_park - called when the driver idles
- * @i915: the drm_i915_private device
- *
- * When the driver is completely idle, we know that all of our sync points
- * have been signaled and our tracking is then entirely redundant. Any request
- * to wait upon an older sync point will be completed instantly as we know
- * the fence is signaled and therefore we will not even look them up in the
- * sync point map.
- */
-void intel_timelines_park(struct drm_i915_private *i915)
-{
-	timelines_park(&i915->gt);
-}
-
 void intel_timeline_fini(struct intel_timeline *timeline)
 {
 	GEM_BUG_ON(timeline->pin_count);
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
-	i915_syncmap_free(&timeline->sync);
-
 	if (timeline->hwsp_cacheline)
 		cacheline_free(timeline->hwsp_cacheline);
 	else
@@ -370,6 +317,7 @@ int intel_timeline_pin(struct intel_timeline *tl)
 	if (tl->pin_count++)
 		return 0;
 	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->active_count);
 
 	err = i915_vma_pin(tl->hwsp_ggtt, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (err)
@@ -380,7 +328,6 @@ int intel_timeline_pin(struct intel_timeline *tl)
 		offset_in_page(tl->hwsp_offset);
 
 	cacheline_acquire(tl->hwsp_cacheline);
-	timeline_add_to_active(tl);
 
 	return 0;
 
@@ -389,6 +336,40 @@ int intel_timeline_pin(struct intel_timeline *tl)
 	return err;
 }
 
+void intel_timeline_enter(struct intel_timeline *tl)
+{
+	struct intel_gt_timelines *timelines = &tl->gt->timelines;
+
+	GEM_BUG_ON(!tl->pin_count);
+	if (tl->active_count++)
+		return;
+	GEM_BUG_ON(!tl->active_count); /* overflow? */
+
+	mutex_lock(&timelines->mutex);
+	list_add(&tl->link, &timelines->active_list);
+	mutex_unlock(&timelines->mutex);
+}
+
+void intel_timeline_exit(struct intel_timeline *tl)
+{
+	struct intel_gt_timelines *timelines = &tl->gt->timelines;
+
+	GEM_BUG_ON(!tl->active_count);
+	if (--tl->active_count)
+		return;
+
+	mutex_lock(&timelines->mutex);
+	list_del(&tl->link);
+	mutex_unlock(&timelines->mutex);
+
+	/*
+	 * Since this timeline is idle, all bariers upon which we were waiting
+	 * must also be complete and so we can discard the last used barriers
+	 * without loss of information.
+	 */
+	i915_syncmap_free(&tl->sync);
+}
+
 static u32 timeline_advance(struct intel_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -546,16 +527,9 @@ void intel_timeline_unpin(struct intel_timeline *tl)
 	if (--tl->pin_count)
 		return;
 
-	timeline_remove_from_active(tl);
+	GEM_BUG_ON(tl->active_count);
 	cacheline_release(tl->hwsp_cacheline);
 
-	/*
-	 * Since this timeline is idle, all bariers upon which we were waiting
-	 * must also be complete and so we can discard the last used barriers
-	 * without loss of information.
-	 */
-	i915_syncmap_free(&tl->sync);
-
 	__i915_vma_unpin(tl->hwsp_ggtt);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h
index e08cebf64833..f583af1ba18d 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.h
@@ -77,9 +77,11 @@ static inline bool intel_timeline_sync_is_later(struct intel_timeline *tl,
 }
 
 int intel_timeline_pin(struct intel_timeline *tl);
+void intel_timeline_enter(struct intel_timeline *tl);
 int intel_timeline_get_seqno(struct intel_timeline *tl,
 			     struct i915_request *rq,
 			     u32 *seqno);
+void intel_timeline_exit(struct intel_timeline *tl);
 void intel_timeline_unpin(struct intel_timeline *tl);
 
 int intel_timeline_read_hwsp(struct i915_request *from,
@@ -87,7 +89,6 @@ int intel_timeline_read_hwsp(struct i915_request *from,
 			     u32 *hwsp_offset);
 
 void intel_timelines_init(struct drm_i915_private *i915);
-void intel_timelines_park(struct drm_i915_private *i915);
 void intel_timelines_fini(struct drm_i915_private *i915);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
index 9a71aea7a338..b820ee76b7f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
@@ -58,6 +58,7 @@ struct intel_timeline {
 	 */
 	struct i915_syncmap *sync;
 
+	unsigned int active_count;
 	struct list_head link;
 	struct intel_gt *gt;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index eae3b1963bf7..9f3100135590 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -815,8 +815,6 @@ static int live_hwsp_recycle(void *arg)
 
 			if (err)
 				goto out;
-
-			intel_timelines_park(i915); /* Encourage recycling! */
 		} while (!__igt_timeout(end_time, NULL));
 	}
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/8] drm/i915/gt: Convert timeline tracking to spinlock
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
  2019-07-05  7:45 ` [PATCH 2/8] drm/i915: Teach execbuffer to take the engine wakeref not GT Chris Wilson
  2019-07-05  7:45 ` [PATCH 3/8] drm/i915/gt: Track timeline activeness in enter/exit Chris Wilson
@ 2019-07-05  7:46 ` Chris Wilson
  2019-07-05  7:46 ` [PATCH 5/8] drm/i915/gt: Guard timeline pinning with its own mutex Chris Wilson
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:46 UTC (permalink / raw)
  To: intel-gfx

Convert the list manipulation of active to use spinlocks so that we can
perform the updates from underneath a quick interrupt callback.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_gt_types.h |  2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c    | 13 ++++++++++---
 drivers/gpu/drm/i915/gt/intel_timeline.c | 12 +++++-------
 drivers/gpu/drm/i915/i915_gem.c          | 20 ++++++++++----------
 4 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index c03e56628ee2..cfd41e6c54e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -26,7 +26,7 @@ struct intel_gt {
 	struct i915_ggtt *ggtt;
 
 	struct intel_gt_timelines {
-		struct mutex mutex; /* protects list */
+		spinlock_t lock; /* protects active_list */
 		struct list_head active_list;
 
 		/* Pack multiple timelines' seqnos into the same page */
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index adfdb908587f..72002c0f9698 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -858,6 +858,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
+	struct intel_gt_timelines *timelines = &i915->gt.timelines;
 	struct intel_timeline *tl;
 
 	if (!test_bit(I915_WEDGED, &error->flags))
@@ -878,14 +879,16 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 *
 	 * No more can be submitted until we reset the wedged bit.
 	 */
-	mutex_lock(&i915->gt.timelines.mutex);
-	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
+	spin_lock(&timelines->lock);
+	list_for_each_entry(tl, &timelines->active_list, link) {
 		struct i915_request *rq;
 
 		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
+		spin_unlock(&timelines->lock);
+
 		/*
 		 * All internal dependencies (i915_requests) will have
 		 * been flushed by the set-wedge, but we may be stuck waiting
@@ -895,8 +898,12 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 		 */
 		dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT);
 		i915_request_put(rq);
+
+		/* Restart iteration after droping lock */
+		spin_lock(&timelines->lock);
+		tl = list_entry(&timelines->active_list, typeof(*tl), link);
 	}
-	mutex_unlock(&i915->gt.timelines.mutex);
+	spin_unlock(&timelines->lock);
 
 	intel_gt_sanitize(&i915->gt, false);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 4af0b9801d91..355dfc52c804 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -266,7 +266,7 @@ static void timelines_init(struct intel_gt *gt)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 
-	mutex_init(&timelines->mutex);
+	spin_lock_init(&timelines->lock);
 	INIT_LIST_HEAD(&timelines->active_list);
 
 	spin_lock_init(&timelines->hwsp_lock);
@@ -345,9 +345,9 @@ void intel_timeline_enter(struct intel_timeline *tl)
 		return;
 	GEM_BUG_ON(!tl->active_count); /* overflow? */
 
-	mutex_lock(&timelines->mutex);
+	spin_lock(&timelines->lock);
 	list_add(&tl->link, &timelines->active_list);
-	mutex_unlock(&timelines->mutex);
+	spin_unlock(&timelines->lock);
 }
 
 void intel_timeline_exit(struct intel_timeline *tl)
@@ -358,9 +358,9 @@ void intel_timeline_exit(struct intel_timeline *tl)
 	if (--tl->active_count)
 		return;
 
-	mutex_lock(&timelines->mutex);
+	spin_lock(&timelines->lock);
 	list_del(&tl->link);
-	mutex_unlock(&timelines->mutex);
+	spin_unlock(&timelines->lock);
 
 	/*
 	 * Since this timeline is idle, all bariers upon which we were waiting
@@ -548,8 +548,6 @@ static void timelines_fini(struct intel_gt *gt)
 
 	GEM_BUG_ON(!list_empty(&timelines->active_list));
 	GEM_BUG_ON(!list_empty(&timelines->hwsp_free_list));
-
-	mutex_destroy(&timelines->mutex);
 }
 
 void intel_timelines_fini(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7ade42b8ec99..b6f3baa74da4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -909,20 +909,20 @@ static int wait_for_engines(struct drm_i915_private *i915)
 
 static long
 wait_for_timelines(struct drm_i915_private *i915,
-		   unsigned int flags, long timeout)
+		   unsigned int wait, long timeout)
 {
-	struct intel_gt_timelines *gt = &i915->gt.timelines;
+	struct intel_gt_timelines *timelines = &i915->gt.timelines;
 	struct intel_timeline *tl;
 
-	mutex_lock(&gt->mutex);
-	list_for_each_entry(tl, &gt->active_list, link) {
+	spin_lock(&timelines->lock);
+	list_for_each_entry(tl, &timelines->active_list, link) {
 		struct i915_request *rq;
 
 		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
-		mutex_unlock(&gt->mutex);
+		spin_unlock(&timelines->lock);
 
 		/*
 		 * "Race-to-idle".
@@ -933,19 +933,19 @@ wait_for_timelines(struct drm_i915_private *i915,
 		 * want to complete as quickly as possible to avoid prolonged
 		 * stalls, so allow the gpu to boost to maximum clocks.
 		 */
-		if (flags & I915_WAIT_FOR_IDLE_BOOST)
+		if (wait & I915_WAIT_FOR_IDLE_BOOST)
 			gen6_rps_boost(rq);
 
-		timeout = i915_request_wait(rq, flags, timeout);
+		timeout = i915_request_wait(rq, wait, timeout);
 		i915_request_put(rq);
 		if (timeout < 0)
 			return timeout;
 
 		/* restart after reacquiring the lock */
-		mutex_lock(&gt->mutex);
-		tl = list_entry(&gt->active_list, typeof(*tl), link);
+		spin_lock(&timelines->lock);
+		tl = list_entry(&timelines->active_list, typeof(*tl), link);
 	}
-	mutex_unlock(&gt->mutex);
+	spin_unlock(&timelines->lock);
 
 	return timeout;
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/8] drm/i915/gt: Guard timeline pinning with its own mutex
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (2 preceding siblings ...)
  2019-07-05  7:46 ` [PATCH 4/8] drm/i915/gt: Convert timeline tracking to spinlock Chris Wilson
@ 2019-07-05  7:46 ` Chris Wilson
  2019-07-05  7:46 ` [PATCH 6/8] drm/i915: Protect request retirement with timeline->mutex Chris Wilson
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:46 UTC (permalink / raw)
  To: intel-gfx

In preparation for removing struct_mutex from around context retirement,
we need to make timeline pinning safe. Since multiple engines/contexts
can share a single timeline, it needs to be protected by a mutex.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_timeline.c      | 27 +++++++++----------
 .../gpu/drm/i915/gt/intel_timeline_types.h    |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  6 ++---
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 355dfc52c804..7b476cd55dac 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -211,9 +211,9 @@ int intel_timeline_init(struct intel_timeline *timeline,
 	void *vaddr;
 
 	kref_init(&timeline->kref);
+	atomic_set(&timeline->pin_count, 0);
 
 	timeline->gt = gt;
-	timeline->pin_count = 0;
 
 	timeline->has_initial_breadcrumb = !hwsp;
 	timeline->hwsp_cacheline = NULL;
@@ -280,7 +280,7 @@ void intel_timelines_init(struct drm_i915_private *i915)
 
 void intel_timeline_fini(struct intel_timeline *timeline)
 {
-	GEM_BUG_ON(timeline->pin_count);
+	GEM_BUG_ON(atomic_read(&timeline->pin_count));
 	GEM_BUG_ON(!list_empty(&timeline->requests));
 
 	if (timeline->hwsp_cacheline)
@@ -314,33 +314,31 @@ int intel_timeline_pin(struct intel_timeline *tl)
 {
 	int err;
 
-	if (tl->pin_count++)
+	if (atomic_add_unless(&tl->pin_count, 1, 0))
 		return 0;
-	GEM_BUG_ON(!tl->pin_count);
-	GEM_BUG_ON(tl->active_count);
 
 	err = i915_vma_pin(tl->hwsp_ggtt, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (err)
-		goto unpin;
+		return err;
 
 	tl->hwsp_offset =
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
 	cacheline_acquire(tl->hwsp_cacheline);
+	if (atomic_fetch_inc(&tl->pin_count)) {
+		cacheline_release(tl->hwsp_cacheline);
+		__i915_vma_unpin(tl->hwsp_ggtt);
+	}
 
 	return 0;
-
-unpin:
-	tl->pin_count = 0;
-	return err;
 }
 
 void intel_timeline_enter(struct intel_timeline *tl)
 {
 	struct intel_gt_timelines *timelines = &tl->gt->timelines;
 
-	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(!atomic_read(&tl->pin_count));
 	if (tl->active_count++)
 		return;
 	GEM_BUG_ON(!tl->active_count); /* overflow? */
@@ -372,7 +370,7 @@ void intel_timeline_exit(struct intel_timeline *tl)
 
 static u32 timeline_advance(struct intel_timeline *tl)
 {
-	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(!atomic_read(&tl->pin_count));
 	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
 
 	return tl->seqno += 1 + tl->has_initial_breadcrumb;
@@ -523,11 +521,10 @@ int intel_timeline_read_hwsp(struct i915_request *from,
 
 void intel_timeline_unpin(struct intel_timeline *tl)
 {
-	GEM_BUG_ON(!tl->pin_count);
-	if (--tl->pin_count)
+	GEM_BUG_ON(!atomic_read(&tl->pin_count));
+	if (!atomic_dec_and_test(&tl->pin_count))
 		return;
 
-	GEM_BUG_ON(tl->active_count);
 	cacheline_release(tl->hwsp_cacheline);
 
 	__i915_vma_unpin(tl->hwsp_ggtt);
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline_types.h b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
index b820ee76b7f5..8dd14a2b8781 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline_types.h
@@ -25,7 +25,7 @@ struct intel_timeline {
 
 	struct mutex mutex; /* protects the flow of requests */
 
-	unsigned int pin_count;
+	atomic_t pin_count;
 	const u32 *hwsp_seqno;
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 490ebd121f4c..a48b36d31e65 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -38,13 +38,13 @@ struct mock_ring {
 
 static void mock_timeline_pin(struct intel_timeline *tl)
 {
-	tl->pin_count++;
+	atomic_inc(&tl->pin_count);
 }
 
 static void mock_timeline_unpin(struct intel_timeline *tl)
 {
-	GEM_BUG_ON(!tl->pin_count);
-	tl->pin_count--;
+	GEM_BUG_ON(!atomic_read(&tl->pin_count));
+	atomic_dec(&tl->pin_count);
 }
 
 static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/8] drm/i915: Protect request retirement with timeline->mutex
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (3 preceding siblings ...)
  2019-07-05  7:46 ` [PATCH 5/8] drm/i915/gt: Guard timeline pinning with its own mutex Chris Wilson
@ 2019-07-05  7:46 ` Chris Wilson
  2019-07-05  7:46 ` [PATCH 7/8] drm/i915: Replace struct_mutex for batch pool serialisation Chris Wilson
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:46 UTC (permalink / raw)
  To: intel-gfx

Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 192 ++++++++++--------
 drivers/gpu/drm/i915/gt/intel_context.h       |  25 +--
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |   1 -
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   2 -
 drivers/gpu/drm/i915/gt/intel_gt.c            |   1 -
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |   2 -
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   1 +
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |  13 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |   1 -
 drivers/gpu/drm/i915/i915_request.c           | 151 +++++++-------
 drivers/gpu/drm/i915/i915_request.h           |   3 -
 11 files changed, 203 insertions(+), 189 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index f43eaaa5db5f..80c9c57a302f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -739,63 +739,6 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	return 0;
 }
 
-static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
-{
-	struct i915_request *rq;
-
-	/*
-	 * Completely unscientific finger-in-the-air estimates for suitable
-	 * maximum user request size (to avoid blocking) and then backoff.
-	 */
-	if (intel_ring_update_space(ring) >= PAGE_SIZE)
-		return NULL;
-
-	/*
-	 * Find a request that after waiting upon, there will be at least half
-	 * the ring available. The hysteresis allows us to compete for the
-	 * shared ring and should mean that we sleep less often prior to
-	 * claiming our resources, but not so long that the ring completely
-	 * drains before we can submit our next request.
-	 */
-	list_for_each_entry(rq, &ring->request_list, ring_link) {
-		if (__intel_ring_space(rq->postfix,
-				       ring->emit, ring->size) > ring->size / 2)
-			break;
-	}
-	if (&rq->ring_link == &ring->request_list)
-		return NULL; /* weird, we will check again later for real */
-
-	return i915_request_get(rq);
-}
-
-static int eb_wait_for_ring(const struct i915_execbuffer *eb)
-{
-	struct i915_request *rq;
-	int ret = 0;
-
-	/*
-	 * Apply a light amount of backpressure to prevent excessive hogs
-	 * from blocking waiting for space whilst holding struct_mutex and
-	 * keeping all of their resources pinned.
-	 */
-
-	rq = __eb_wait_for_ring(eb->context->ring);
-	if (rq) {
-		mutex_unlock(&eb->i915->drm.struct_mutex);
-
-		if (i915_request_wait(rq,
-				      I915_WAIT_INTERRUPTIBLE,
-				      MAX_SCHEDULE_TIMEOUT) < 0)
-			ret = -EINTR;
-
-		i915_request_put(rq);
-
-		mutex_lock(&eb->i915->drm.struct_mutex);
-	}
-
-	return ret;
-}
-
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
@@ -2122,10 +2065,75 @@ static const enum intel_engine_id user_ring_map[] = {
 	[I915_EXEC_VEBOX]	= VECS0
 };
 
-static int eb_pin_context(struct i915_execbuffer *eb, struct intel_context *ce)
+static struct i915_request *eb_throttle(struct intel_context *ce)
+{
+	struct intel_ring *ring = ce->ring;
+	struct intel_timeline *tl = ring->timeline;
+	struct i915_request *rq;
+
+	/*
+	 * Completely unscientific finger-in-the-air estimates for suitable
+	 * maximum user request size (to avoid blocking) and then backoff.
+	 */
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request that after waiting upon, there will be at least half
+	 * the ring available. The hysteresis allows us to compete for the
+	 * shared ring and should mean that we sleep less often prior to
+	 * claiming our resources, but not so long that the ring completely
+	 * drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &tl->requests, link) {
+		if (rq->ring != ring)
+			continue;
+
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->link == &tl->requests)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int
+__eb_pin_context(struct i915_execbuffer *eb, struct intel_context *ce)
 {
 	int err;
 
+	if (likely(atomic_inc_not_zero(&ce->pin_count)))
+		return 0;
+
+	err = mutex_lock_interruptible(&eb->i915->drm.struct_mutex);
+	if (err)
+		return err;
+
+	err = __intel_context_do_pin(ce);
+	mutex_unlock(&eb->i915->drm.struct_mutex);
+
+	return err;
+}
+
+static void
+__eb_unpin_context(struct i915_execbuffer *eb, struct intel_context *ce)
+{
+	if (likely(atomic_add_unless(&ce->pin_count, -1, 1)))
+		return;
+
+	mutex_lock(&eb->i915->drm.struct_mutex);
+	intel_context_unpin(ce);
+	mutex_unlock(&eb->i915->drm.struct_mutex);
+}
+
+static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
+{
+	struct intel_timeline *tl;
+	struct i915_request *rq;
+	int err;
+
 	/*
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
@@ -2139,7 +2147,7 @@ static int eb_pin_context(struct i915_execbuffer *eb, struct intel_context *ce)
 	 * GGTT space, so do this first before we reserve a seqno for
 	 * ourselves.
 	 */
-	err = intel_context_pin(ce);
+	err = __eb_pin_context(eb, ce);
 	if (err)
 		return err;
 
@@ -2150,29 +2158,52 @@ static int eb_pin_context(struct i915_execbuffer *eb, struct intel_context *ce)
 	 * wakeref that we hold until the GPU has been idle for at least
 	 * 100ms.
 	 */
-	err = intel_context_timeline_lock(ce);
-	if (err)
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl)) {
+		err = PTR_ERR(tl);
 		goto err_unpin;
+	}
 
 	intel_context_enter(ce);
-	intel_context_timeline_unlock(ce);
+	rq = eb_throttle(ce);
+
+	intel_context_timeline_unlock(tl);
+
+	if (rq) {
+		if (i915_request_wait(rq,
+				      I915_WAIT_INTERRUPTIBLE,
+				      MAX_SCHEDULE_TIMEOUT) < 0) {
+			i915_request_put(rq);
+			err = -EINTR;
+			goto err_exit;
+		}
+
+		i915_request_put(rq);
+	}
 
 	eb->engine = ce->engine;
 	eb->context = ce;
 	return 0;
 
+err_exit:
+	mutex_lock(&tl->mutex);
+	intel_context_exit(ce);
+	intel_context_timeline_unlock(tl);
 err_unpin:
-	intel_context_unpin(ce);
+	__eb_unpin_context(eb, ce);
 	return err;
 }
 
-static void eb_unpin_context(struct i915_execbuffer *eb)
+static void eb_unpin_engine(struct i915_execbuffer *eb)
 {
-	__intel_context_timeline_lock(eb->context);
-	intel_context_exit(eb->context);
-	intel_context_timeline_unlock(eb->context);
+	struct intel_context *ce = eb->context;
+	struct intel_timeline *tl = ce->ring->timeline;
+
+	mutex_lock(&tl->mutex);
+	intel_context_exit(ce);
+	intel_context_timeline_unlock(tl);
 
-	intel_context_unpin(eb->context);
+	__eb_unpin_context(eb, ce);
 }
 
 static unsigned int
@@ -2217,9 +2248,9 @@ eb_select_legacy_ring(struct i915_execbuffer *eb,
 }
 
 static int
-eb_select_engine(struct i915_execbuffer *eb,
-		 struct drm_file *file,
-		 struct drm_i915_gem_execbuffer2 *args)
+eb_pin_engine(struct i915_execbuffer *eb,
+	      struct drm_file *file,
+	      struct drm_i915_gem_execbuffer2 *args)
 {
 	struct intel_context *ce;
 	unsigned int idx;
@@ -2234,7 +2265,7 @@ eb_select_engine(struct i915_execbuffer *eb,
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	err = eb_pin_context(eb, ce);
+	err = __eb_pin_engine(eb, ce);
 	intel_context_put(ce);
 
 	return err;
@@ -2452,16 +2483,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	err = i915_mutex_lock_interruptible(dev);
-	if (err)
-		goto err_context;
-
-	err = eb_select_engine(&eb, file, args);
+	err = eb_pin_engine(&eb, file, args);
 	if (unlikely(err))
-		goto err_unlock;
+		goto err_context;
 
-	err = eb_wait_for_ring(&eb); /* may temporarily drop struct_mutex */
-	if (unlikely(err))
+	err = i915_mutex_lock_interruptible(dev);
+	if (err)
 		goto err_engine;
 
 	err = eb_relocate(&eb);
@@ -2615,10 +2642,9 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_vma:
 	if (eb.exec)
 		eb_release_vmas(&eb);
-err_engine:
-	eb_unpin_context(&eb);
-err_unlock:
 	mutex_unlock(&dev->struct_mutex);
+err_engine:
+	eb_unpin_engine(&eb);
 err_context:
 	i915_gem_context_put(eb.gem_context);
 err_destroy:
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 065ba4ac4e87..38b60cbf2592 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -12,6 +12,7 @@
 #include "i915_active.h"
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
+#include "intel_timeline_types.h"
 
 void intel_context_init(struct intel_context *ce,
 			struct i915_gem_context *ctx,
@@ -126,24 +127,24 @@ static inline void intel_context_put(struct intel_context *ce)
 	kref_put(&ce->ref, ce->ops->destroy);
 }
 
-static inline void
-__intel_context_timeline_lock(struct intel_context *ce)
-	__acquires(&ce->ring->timeline->mutex)
-{
-	mutex_lock(&ce->ring->timeline->mutex);
-}
-
-static inline int __must_check
+static inline struct intel_timeline *__must_check
 intel_context_timeline_lock(struct intel_context *ce)
 	__acquires(&ce->ring->timeline->mutex)
 {
-	return mutex_lock_interruptible(&ce->ring->timeline->mutex);
+	struct intel_timeline *tl = ce->ring->timeline;
+	int err;
+
+	err = mutex_lock_interruptible(&tl->mutex);
+	if (err)
+		return ERR_PTR(err);
+
+	return tl;
 }
 
-static inline void intel_context_timeline_unlock(struct intel_context *ce)
-	__releases(&ce->ring->timeline->mutex)
+static inline void intel_context_timeline_unlock(struct intel_timeline *tl)
+	__releases(&tl->mutex)
 {
-	mutex_unlock(&ce->ring->timeline->mutex);
+	mutex_unlock(&tl->mutex);
 }
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index df5932f5f578..2a10daf1d379 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -750,7 +750,6 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 				engine->status_page.vma))
 		goto out_frame;
 
-	INIT_LIST_HEAD(&frame->ring.request_list);
 	frame->ring.timeline = &frame->timeline;
 	frame->ring.vaddr = frame->cs;
 	frame->ring.size = sizeof(frame->cs);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 7e056114344e..0dde7e04b102 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -69,8 +69,6 @@ struct intel_ring {
 	void *vaddr;
 
 	struct intel_timeline *timeline;
-	struct list_head request_list;
-	struct list_head active_link;
 
 	/*
 	 * As we have two types of rings, one global to the engine used
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8cca6b22b386..46d24d9d62ac 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -14,7 +14,6 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
 	gt->i915 = i915;
 	gt->uncore = &i915->uncore;
 
-	INIT_LIST_HEAD(&gt->active_rings);
 	INIT_LIST_HEAD(&gt->closed_vma);
 
 	spin_lock_init(&gt->closed_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index cfd41e6c54e1..f43ea830b1e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -34,8 +34,6 @@ struct intel_gt {
 		struct list_head hwsp_free_list;
 	} timelines;
 
-	struct list_head active_rings;
-
 	struct intel_wakeref wakeref;
 
 	struct list_head closed_vma;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 473c9ae8818c..4f5195d1bc8b 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1535,6 +1535,7 @@ static void execlists_context_unpin(struct intel_context *ce)
 {
 	i915_gem_context_unpin_hw_id(ce->gem_context);
 	i915_gem_object_unpin_map(ce->state->obj);
+	intel_ring_reset(ce->ring, ce->ring->tail);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index f804ec35037d..c46ce676d28f 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -1226,7 +1226,7 @@ void intel_ring_unpin(struct intel_ring *ring)
 	GEM_TRACE("ring:%llx unpin\n", ring->timeline->fence_context);
 
 	/* Discard any unused bytes beyond that submitted to hw. */
-	intel_ring_reset(ring, ring->tail);
+	intel_ring_reset(ring, ring->emit);
 
 	GEM_BUG_ON(!ring->vma);
 	i915_vma_unset_ggtt_write(ring->vma);
@@ -1292,7 +1292,6 @@ intel_engine_create_ring(struct intel_engine_cs *engine,
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&ring->ref);
-	INIT_LIST_HEAD(&ring->request_list);
 	ring->timeline = intel_timeline_get(timeline);
 
 	ring->size = size;
@@ -1816,21 +1815,25 @@ static int ring_request_alloc(struct i915_request *request)
 
 static noinline int wait_for_space(struct intel_ring *ring, unsigned int bytes)
 {
+	struct intel_timeline *tl = ring->timeline;
 	struct i915_request *target;
 	long timeout;
 
 	if (intel_ring_update_space(ring) >= bytes)
 		return 0;
 
-	GEM_BUG_ON(list_empty(&ring->request_list));
-	list_for_each_entry(target, &ring->request_list, ring_link) {
+	GEM_BUG_ON(list_empty(&tl->requests));
+	list_for_each_entry(target, &tl->requests, link) {
+		if (target->ring != ring)
+			continue;
+
 		/* Would completion of this request free enough space? */
 		if (bytes <= __intel_ring_space(target->postfix,
 						ring->emit, ring->size))
 			break;
 	}
 
-	if (WARN_ON(&target->ring_link == &ring->request_list))
+	if (GEM_WARN_ON(&target->link == &tl->requests))
 		return -ENOSPC;
 
 	timeout = i915_request_wait(target,
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index a48b36d31e65..5bcb461b8372 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -68,7 +68,6 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 	ring->base.timeline = &ring->timeline;
 	atomic_set(&ring->base.pin_count, 1);
 
-	INIT_LIST_HEAD(&ring->base.request_list);
 	intel_ring_update_space(&ring->base);
 
 	return &ring->base;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5ff87c4a0cd5..2aabed8ed594 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -180,40 +180,6 @@ i915_request_remove_from_client(struct i915_request *request)
 	spin_unlock(&file_priv->mm.lock);
 }
 
-static void advance_ring(struct i915_request *request)
-{
-	struct intel_ring *ring = request->ring;
-	unsigned int tail;
-
-	/*
-	 * We know the GPU must have read the request to have
-	 * sent us the seqno + interrupt, so use the position
-	 * of tail of the request to update the last known position
-	 * of the GPU head.
-	 *
-	 * Note this requires that we are always called in request
-	 * completion order.
-	 */
-	GEM_BUG_ON(!list_is_first(&request->ring_link, &ring->request_list));
-	if (list_is_last(&request->ring_link, &ring->request_list)) {
-		/*
-		 * We may race here with execlists resubmitting this request
-		 * as we retire it. The resubmission will move the ring->tail
-		 * forwards (to request->wa_tail). We either read the
-		 * current value that was written to hw, or the value that
-		 * is just about to be. Either works, if we miss the last two
-		 * noops - they are safe to be replayed on a reset.
-		 */
-		tail = READ_ONCE(request->tail);
-		list_del(&ring->active_link);
-	} else {
-		tail = request->postfix;
-	}
-	list_del_init(&request->ring_link);
-
-	ring->head = tail;
-}
-
 static void free_capture_list(struct i915_request *request)
 {
 	struct i915_capture_list *capture;
@@ -231,7 +197,7 @@ static bool i915_request_retire(struct i915_request *rq)
 {
 	struct i915_active_request *active, *next;
 
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+	lockdep_assert_held(&rq->timeline->mutex);
 	if (!i915_request_completed(rq))
 		return false;
 
@@ -243,7 +209,17 @@ static bool i915_request_retire(struct i915_request *rq)
 	GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
 	trace_i915_request_retire(rq);
 
-	advance_ring(rq);
+	/*
+	 * We know the GPU must have read the request to have
+	 * sent us the seqno + interrupt, so use the position
+	 * of tail of the request to update the last known position
+	 * of the GPU head.
+	 *
+	 * Note this requires that we are always called in request
+	 * completion order.
+	 */
+	GEM_BUG_ON(!list_is_first(&rq->link, &rq->timeline->requests));
+	rq->ring->head = rq->postfix;
 
 	/*
 	 * Walk through the active list, calling retire on each. This allows
@@ -320,7 +296,7 @@ static bool i915_request_retire(struct i915_request *rq)
 
 void i915_request_retire_upto(struct i915_request *rq)
 {
-	struct intel_ring *ring = rq->ring;
+	struct intel_timeline * const tl = rq->timeline;
 	struct i915_request *tmp;
 
 	GEM_TRACE("%s fence %llx:%lld, current %d\n",
@@ -328,15 +304,11 @@ void i915_request_retire_upto(struct i915_request *rq)
 		  rq->fence.context, rq->fence.seqno,
 		  hwsp_seqno(rq));
 
-	lockdep_assert_held(&rq->i915->drm.struct_mutex);
+	lockdep_assert_held(&tl->mutex);
 	GEM_BUG_ON(!i915_request_completed(rq));
 
-	if (list_empty(&rq->ring_link))
-		return;
-
 	do {
-		tmp = list_first_entry(&ring->request_list,
-				       typeof(*tmp), ring_link);
+		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
@@ -563,29 +535,28 @@ semaphore_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	return NOTIFY_DONE;
 }
 
-static void ring_retire_requests(struct intel_ring *ring)
+static void retire_requests(struct intel_timeline *tl)
 {
 	struct i915_request *rq, *rn;
 
-	list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link)
+	list_for_each_entry_safe(rq, rn, &tl->requests, link)
 		if (!i915_request_retire(rq))
 			break;
 }
 
 static noinline struct i915_request *
-request_alloc_slow(struct intel_context *ce, gfp_t gfp)
+request_alloc_slow(struct intel_timeline *tl, gfp_t gfp)
 {
-	struct intel_ring *ring = ce->ring;
 	struct i915_request *rq;
 
-	if (list_empty(&ring->request_list))
+	if (list_empty(&tl->requests))
 		goto out;
 
 	if (!gfpflags_allow_blocking(gfp))
 		goto out;
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
+	rq = list_first_entry(&tl->requests, typeof(*rq), link);
 	i915_request_retire(rq);
 
 	rq = kmem_cache_alloc(global.slab_requests,
@@ -594,11 +565,11 @@ request_alloc_slow(struct intel_context *ce, gfp_t gfp)
 		return rq;
 
 	/* Ratelimit ourselves to prevent oom from malicious clients */
-	rq = list_last_entry(&ring->request_list, typeof(*rq), ring_link);
+	rq = list_last_entry(&tl->requests, typeof(*rq), link);
 	cond_synchronize_rcu(rq->rcustate);
 
 	/* Retire our old requests in the hope that we free some */
-	ring_retire_requests(ring);
+	retire_requests(tl);
 
 out:
 	return kmem_cache_alloc(global.slab_requests, gfp);
@@ -649,7 +620,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	rq = kmem_cache_alloc(global.slab_requests,
 			      gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 	if (unlikely(!rq)) {
-		rq = request_alloc_slow(ce, gfp);
+		rq = request_alloc_slow(tl, gfp);
 		if (!rq) {
 			ret = -ENOMEM;
 			goto err_unreserve;
@@ -741,15 +712,15 @@ struct i915_request *
 i915_request_create(struct intel_context *ce)
 {
 	struct i915_request *rq;
-	int err;
+	struct intel_timeline *tl;
 
-	err = intel_context_timeline_lock(ce);
-	if (err)
-		return ERR_PTR(err);
+	tl = intel_context_timeline_lock(ce);
+	if (IS_ERR(tl))
+		return ERR_CAST(tl);
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ce->ring->request_list))
+	rq = list_first_entry(&tl->requests, typeof(*rq), link);
+	if (!list_is_last(&rq->link, &tl->requests))
 		i915_request_retire(rq);
 
 	intel_context_enter(ce);
@@ -759,22 +730,22 @@ i915_request_create(struct intel_context *ce)
 		goto err_unlock;
 
 	/* Check that we do not interrupt ourselves with a new request */
-	rq->cookie = lockdep_pin_lock(&ce->ring->timeline->mutex);
+	rq->cookie = lockdep_pin_lock(&tl->mutex);
 
 	return rq;
 
 err_unlock:
-	intel_context_timeline_unlock(ce);
+	intel_context_timeline_unlock(tl);
 	return rq;
 }
 
 static int
 i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
 {
-	if (list_is_first(&signal->ring_link, &signal->ring->request_list))
+	if (list_is_first(&signal->link, &signal->ring->timeline->requests))
 		return 0;
 
-	signal = list_prev_entry(signal, ring_link);
+	signal = list_prev_entry(signal, link);
 	if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
 		return 0;
 
@@ -1167,6 +1138,7 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 	 */
 	GEM_BUG_ON(rq->reserved_space > ring->space);
 	rq->reserved_space = 0;
+	rq->emitted_jiffies = jiffies;
 
 	/*
 	 * Record the position of the start of the breadcrumb so that
@@ -1180,11 +1152,6 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 
 	prev = __i915_request_add_to_timeline(rq);
 
-	list_add_tail(&rq->ring_link, &ring->request_list);
-	if (list_is_first(&rq->ring_link, &ring->request_list))
-		list_add(&ring->active_link, &rq->i915->gt.active_rings);
-	rq->emitted_jiffies = jiffies;
-
 	/*
 	 * Let the backend know a new request has arrived that may need
 	 * to adjust the existing execution schedule due to a high priority
@@ -1237,10 +1204,11 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 
 void i915_request_add(struct i915_request *rq)
 {
+	struct intel_timeline * const tl = rq->timeline;
 	struct i915_request *prev;
 
-	lockdep_assert_held(&rq->timeline->mutex);
-	lockdep_unpin_lock(&rq->timeline->mutex, rq->cookie);
+	lockdep_assert_held(&tl->mutex);
+	lockdep_unpin_lock(&tl->mutex, rq->cookie);
 
 	trace_i915_request_add(rq);
 
@@ -1263,10 +1231,10 @@ void i915_request_add(struct i915_request *rq)
 	 * work on behalf of others -- but instead we should benefit from
 	 * improved resource management. (Well, that's the theory at least.)
 	 */
-	if (prev && i915_request_completed(prev))
+	if (prev && i915_request_completed(prev) && prev->timeline == tl)
 		i915_request_retire_upto(prev);
 
-	mutex_unlock(&rq->timeline->mutex);
+	mutex_unlock(&tl->mutex);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)
@@ -1487,18 +1455,43 @@ long i915_request_wait(struct i915_request *rq,
 
 bool i915_retire_requests(struct drm_i915_private *i915)
 {
-	struct intel_ring *ring, *tmp;
+	struct intel_gt_timelines *timelines = &i915->gt.timelines;
+	struct intel_timeline *tl, *tn;
+	LIST_HEAD(free);
+
+	spin_lock(&timelines->lock);
+	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
+		if (!mutex_trylock(&tl->mutex))
+			continue;
+
+		intel_timeline_get(tl);
+		GEM_BUG_ON(!tl->active_count);
+		tl->active_count++; /* pin the list element */
+		spin_unlock(&timelines->lock);
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
+		retire_requests(tl);
 
-	list_for_each_entry_safe(ring, tmp,
-				 &i915->gt.active_rings, active_link) {
-		intel_ring_get(ring); /* last rq holds reference! */
-		ring_retire_requests(ring);
-		intel_ring_put(ring);
+		spin_lock(&timelines->lock);
+
+		/* Restart iteration after dropping lock */
+		list_safe_reset_next(tl, tn, link);
+		if (!--tl->active_count)
+			list_del(&tl->link);
+
+		mutex_unlock(&tl->mutex);
+
+		/* Defer the final release to after the spinlock */
+		if (refcount_dec_and_test(&tl->kref.refcount)) {
+			GEM_BUG_ON(tl->active_count);
+			list_add(&tl->link, &free);
+		}
 	}
+	spin_unlock(&timelines->lock);
+
+	list_for_each_entry_safe(tl, tn, &free, link)
+		__intel_timeline_free(&tl->kref);
 
-	return !list_empty(&i915->gt.active_rings);
+	return !list_empty(&timelines->active_list);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index b58ceef92e20..a6b1e5f43949 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -221,9 +221,6 @@ struct i915_request {
 	/** timeline->request entry for this request */
 	struct list_head link;
 
-	/** ring->request_list entry for this request */
-	struct list_head ring_link;
-
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_link;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/8] drm/i915: Replace struct_mutex for batch pool serialisation
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (4 preceding siblings ...)
  2019-07-05  7:46 ` [PATCH 6/8] drm/i915: Protect request retirement with timeline->mutex Chris Wilson
@ 2019-07-05  7:46 ` Chris Wilson
  2019-07-05  7:46 ` [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets Chris Wilson
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Matthew Auld

Switch to tracking activity via i915_active on individual nodes, only
keeping a list of retired objects in the cache, and reaping the cache
when the engine itself idles.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  58 +++---
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |   1 -
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 -
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   1 -
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  11 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   2 +
 drivers/gpu/drm/i915/gt/intel_engine_pool.c   | 166 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_engine_pool.h   |  34 ++++
 .../gpu/drm/i915/gt/intel_engine_pool_types.h |  29 +++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   6 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |   3 +
 drivers/gpu/drm/i915/i915_debugfs.c           |  68 -------
 drivers/gpu/drm/i915/i915_gem_batch_pool.c    | 132 --------------
 drivers/gpu/drm/i915/i915_gem_batch_pool.h    |  26 ---
 16 files changed, 279 insertions(+), 265 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_pool.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_pool.h
 create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_pool_types.h
 delete mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c
 delete mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 5266dbeab01f..1ae546df284a 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -72,6 +72,7 @@ obj-y += gt/
 gt-y += \
 	gt/intel_breadcrumbs.o \
 	gt/intel_context.o \
+	gt/intel_engine_pool.o \
 	gt/intel_engine_cs.o \
 	gt/intel_engine_pm.o \
 	gt/intel_gt.o \
@@ -125,7 +126,6 @@ i915-y += \
 	  $(gem-y) \
 	  i915_active.o \
 	  i915_cmd_parser.o \
-	  i915_gem_batch_pool.o \
 	  i915_gem_evict.o \
 	  i915_gem_fence_reg.o \
 	  i915_gem_gtt.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 80c9c57a302f..0ea2d49bc8b9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -16,6 +16,7 @@
 
 #include "gem/i915_gem_ioctls.h"
 #include "gt/intel_context.h"
+#include "gt/intel_engine_pool.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 
@@ -1145,25 +1146,26 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 			     unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
-	struct drm_i915_gem_object *obj;
+	struct intel_engine_pool_node *pool;
 	struct i915_request *rq;
 	struct i915_vma *batch;
 	u32 *cmd;
 	int err;
 
-	obj = i915_gem_batch_pool_get(&eb->engine->batch_pool, PAGE_SIZE);
-	if (IS_ERR(obj))
-		return PTR_ERR(obj);
+	pool = intel_engine_pool_get(&eb->engine->pool, PAGE_SIZE);
+	if (IS_ERR(pool))
+		return PTR_ERR(pool);
 
-	cmd = i915_gem_object_pin_map(obj,
+	cmd = i915_gem_object_pin_map(pool->obj,
 				      cache->has_llc ?
 				      I915_MAP_FORCE_WB :
 				      I915_MAP_FORCE_WC);
-	i915_gem_object_unpin_pages(obj);
-	if (IS_ERR(cmd))
-		return PTR_ERR(cmd);
+	if (IS_ERR(cmd)) {
+		err = PTR_ERR(cmd);
+		goto out_pool;
+	}
 
-	batch = i915_vma_instance(obj, vma->vm, NULL);
+	batch = i915_vma_instance(pool->obj, vma->vm, NULL);
 	if (IS_ERR(batch)) {
 		err = PTR_ERR(batch);
 		goto err_unmap;
@@ -1179,6 +1181,10 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 		goto err_unpin;
 	}
 
+	err = intel_engine_pool_mark_active(pool, rq);
+	if (err)
+		goto err_request;
+
 	err = reloc_move_to_gpu(rq, vma);
 	if (err)
 		goto err_request;
@@ -1204,7 +1210,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	cache->rq_size = 0;
 
 	/* Return with batch mapping (cmd) still pinned */
-	return 0;
+	goto out_pool;
 
 skip_request:
 	i915_request_skip(rq, err);
@@ -1213,7 +1219,9 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 err_unpin:
 	i915_vma_unpin(batch);
 err_unmap:
-	i915_gem_object_unpin_map(obj);
+	i915_gem_object_unpin_map(pool->obj);
+out_pool:
+	intel_engine_pool_put(pool);
 	return err;
 }
 
@@ -1957,18 +1965,17 @@ static int i915_reset_gen7_sol_offsets(struct i915_request *rq)
 
 static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 {
-	struct drm_i915_gem_object *shadow_batch_obj;
+	struct intel_engine_pool_node *pool;
 	struct i915_vma *vma;
 	int err;
 
-	shadow_batch_obj = i915_gem_batch_pool_get(&eb->engine->batch_pool,
-						   PAGE_ALIGN(eb->batch_len));
-	if (IS_ERR(shadow_batch_obj))
-		return ERR_CAST(shadow_batch_obj);
+	pool = intel_engine_pool_get(&eb->engine->pool, eb->batch_len);
+	if (IS_ERR(pool))
+		return ERR_CAST(pool);
 
 	err = intel_engine_cmd_parser(eb->engine,
 				      eb->batch->obj,
-				      shadow_batch_obj,
+				      pool->obj,
 				      eb->batch_start_offset,
 				      eb->batch_len,
 				      is_master);
@@ -1977,12 +1984,12 @@ static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 			vma = NULL;
 		else
 			vma = ERR_PTR(err);
-		goto out;
+		goto err;
 	}
 
-	vma = i915_gem_object_ggtt_pin(shadow_batch_obj, NULL, 0, 0, 0);
+	vma = i915_gem_object_ggtt_pin(pool->obj, NULL, 0, 0, 0);
 	if (IS_ERR(vma))
-		goto out;
+		goto err;
 
 	eb->vma[eb->buffer_count] = i915_vma_get(vma);
 	eb->flags[eb->buffer_count] =
@@ -1990,8 +1997,11 @@ static struct i915_vma *eb_parse(struct i915_execbuffer *eb, bool is_master)
 	vma->exec_flags = &eb->flags[eb->buffer_count];
 	eb->buffer_count++;
 
-out:
-	i915_gem_object_unpin_pages(shadow_batch_obj);
+	vma->private = pool;
+	return vma;
+
+err:
+	intel_engine_pool_put(pool);
 	return vma;
 }
 
@@ -2615,6 +2625,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	 * to explicitly hold another reference here.
 	 */
 	eb.request->batch = eb.batch;
+	if (eb.batch->private)
+		intel_engine_pool_mark_active(eb.batch->private, eb.request);
 
 	trace_i915_request_queue(eb.request, eb.batch_flags);
 	err = eb_submit(&eb);
@@ -2639,6 +2651,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_batch_unpin:
 	if (eb.batch_flags & I915_DISPATCH_SECURE)
 		i915_vma_unpin(eb.batch);
+	if (eb.batch->private)
+		intel_engine_pool_put(eb.batch->private);
 err_vma:
 	if (eb.exec)
 		eb_release_vmas(&eb);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index d5197a2a106f..aded95375096 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -64,7 +64,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->vma.list);
 
 	INIT_LIST_HEAD(&obj->lut_list);
-	INIT_LIST_HEAD(&obj->batch_pool_link);
 
 	init_rcu_head(&obj->rcu);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 34b51fad02de..d474c6ac4100 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -114,7 +114,6 @@ struct drm_i915_gem_object {
 	unsigned int userfault_count;
 	struct list_head userfault_link;
 
-	struct list_head batch_pool_link;
 	I915_SELFTEST_DECLARE(struct list_head st_link);
 
 	/*
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 93d188526457..bf085b0cb7c6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -33,10 +33,8 @@ static void i915_gem_park(struct drm_i915_private *i915)
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, i915, id)
 		call_idle_barriers(engine); /* cleanup after wedging */
-		i915_gem_batch_pool_fini(&engine->batch_pool);
-	}
 
 	i915_vma_parked(i915);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 0331e9ac2485..faaa164267f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -9,7 +9,6 @@
 #include <linux/random.h>
 #include <linux/seqlock.h>
 
-#include "i915_gem_batch_pool.h"
 #include "i915_pmu.h"
 #include "i915_reg.h"
 #include "i915_request.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 2a10daf1d379..a8f5ed1d75be 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -32,6 +32,7 @@
 
 #include "intel_engine.h"
 #include "intel_engine_pm.h"
+#include "intel_engine_pool.h"
 #include "intel_context.h"
 #include "intel_lrc.h"
 #include "intel_reset.h"
@@ -498,11 +499,6 @@ int intel_engines_init(struct drm_i915_private *i915)
 	return err;
 }
 
-static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)
-{
-	i915_gem_batch_pool_init(&engine->batch_pool, engine);
-}
-
 void intel_engine_init_execlists(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -628,10 +624,11 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
 	intel_engine_init_breadcrumbs(engine);
 	intel_engine_init_execlists(engine);
 	intel_engine_init_hangcheck(engine);
-	intel_engine_init_batch_pool(engine);
 	intel_engine_init_cmd_parser(engine);
 	intel_engine_init__pm(engine);
 
+	intel_engine_pool_init(&engine->pool);
+
 	/* Use the whole device by default */
 	engine->sseu =
 		intel_sseu_from_device_info(&RUNTIME_INFO(engine->i915)->sseu);
@@ -885,9 +882,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 
 	cleanup_status_page(engine);
 
+	intel_engine_pool_fini(&engine->pool);
 	intel_engine_fini_breadcrumbs(engine);
 	intel_engine_cleanup_cmd_parser(engine);
-	i915_gem_batch_pool_fini(&engine->batch_pool);
 
 	if (engine->default_state)
 		i915_gem_object_put(engine->default_state);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 9751a02d86bc..fe9f9eaffe88 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -7,6 +7,7 @@
 #include "i915_drv.h"
 
 #include "intel_engine.h"
+#include "intel_engine_pool.h"
 #include "intel_engine_pm.h"
 #include "intel_gt_pm.h"
 
@@ -116,6 +117,7 @@ static int __engine_park(struct intel_wakeref *wf)
 	GEM_TRACE("%s\n", engine->name);
 
 	intel_engine_disarm_breadcrumbs(engine);
+	intel_engine_pool_park(&engine->pool);
 
 	/* Must be reset upon idling, or we may miss the busy wakeup. */
 	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
new file mode 100644
index 000000000000..32688ca379ef
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
@@ -0,0 +1,166 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2014-2018 Intel Corporation
+ */
+
+#include "gem/i915_gem_object.h"
+
+#include "i915_drv.h"
+#include "intel_engine_pm.h"
+#include "intel_engine_pool.h"
+
+static struct intel_engine_cs *to_engine(struct intel_engine_pool *pool)
+{
+	return container_of(pool, struct intel_engine_cs, pool);
+}
+
+static struct list_head *
+bucket_for_size(struct intel_engine_pool *pool, size_t sz)
+{
+	int n;
+
+	/*
+	 * Compute a power-of-two bucket, but throw everything greater than
+	 * 16KiB into the same bucket: i.e. the buckets hold objects of
+	 * (1 page, 2 pages, 4 pages, 8+ pages).
+	 */
+	n = fls(sz >> PAGE_SHIFT) - 1;
+	if (n >= ARRAY_SIZE(pool->cache_list))
+		n = ARRAY_SIZE(pool->cache_list) - 1;
+
+	return &pool->cache_list[n];
+}
+
+static void node_free(struct intel_engine_pool_node *node)
+{
+	i915_gem_object_put(node->obj);
+	i915_active_fini(&node->active);
+	kfree(node);
+}
+
+static int pool_active(struct i915_active *ref)
+{
+	struct intel_engine_pool_node *node =
+		container_of(ref, typeof(*node), active);
+	struct reservation_object *resv = node->obj->base.resv;
+
+	if (reservation_object_trylock(resv)) {
+		reservation_object_add_excl_fence(resv, NULL);
+		reservation_object_unlock(resv);
+	}
+
+	return i915_gem_object_pin_pages(node->obj);
+}
+
+static void pool_retire(struct i915_active *ref)
+{
+	struct intel_engine_pool_node *node =
+		container_of(ref, typeof(*node), active);
+	struct intel_engine_pool *pool = node->pool;
+	struct list_head *list = bucket_for_size(pool, node->obj->base.size);
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_pm_is_awake(to_engine(pool)));
+
+	i915_gem_object_unpin_pages(node->obj);
+
+	spin_lock_irqsave(&pool->lock, flags);
+	list_add(&node->link, list);
+	spin_unlock_irqrestore(&pool->lock, flags);
+}
+
+static struct intel_engine_pool_node *
+node_create(struct intel_engine_pool *pool, size_t sz)
+{
+	struct intel_engine_cs *engine = to_engine(pool);
+	struct intel_engine_pool_node *node;
+	struct drm_i915_gem_object *obj;
+
+	node = kmalloc(sizeof(*node),
+		       GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+	if (!node)
+		return ERR_PTR(-ENOMEM);
+
+	node->pool = pool;
+	i915_active_init(engine->i915, &node->active, pool_active, pool_retire);
+
+	obj = i915_gem_object_create_internal(engine->i915, sz);
+	if (IS_ERR(obj)) {
+		i915_active_fini(&node->active);
+		kfree(node);
+		return ERR_CAST(obj);
+	}
+
+	node->obj = obj;
+	return node;
+}
+
+struct intel_engine_pool_node *
+intel_engine_pool_get(struct intel_engine_pool *pool, size_t size)
+{
+	struct intel_engine_pool_node *node;
+	struct list_head *list;
+	unsigned long flags;
+	int ret;
+
+	GEM_BUG_ON(!intel_engine_pm_is_awake(to_engine(pool)));
+
+	size = PAGE_ALIGN(size);
+	list = bucket_for_size(pool, size);
+
+	spin_lock_irqsave(&pool->lock, flags);
+	list_for_each_entry(node, list, link) {
+		if (node->obj->base.size < size)
+			continue;
+		list_del(&node->link);
+		break;
+	}
+	spin_unlock_irqrestore(&pool->lock, flags);
+
+	if (&node->link == list) {
+		node = node_create(pool, size);
+		if (IS_ERR(node))
+			return node;
+	}
+
+	ret = i915_active_acquire(&node->active);
+	if (ret) {
+		node_free(node);
+		return ERR_PTR(ret);
+	}
+
+	return node;
+}
+
+void intel_engine_pool_init(struct intel_engine_pool *pool)
+{
+	int n;
+
+	spin_lock_init(&pool->lock);
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
+		INIT_LIST_HEAD(&pool->cache_list[n]);
+}
+
+void intel_engine_pool_park(struct intel_engine_pool *pool)
+{
+	int n;
+
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) {
+		struct list_head *list = &pool->cache_list[n];
+		struct intel_engine_pool_node *node, *nn;
+
+		list_for_each_entry_safe(node, nn, list, link)
+			node_free(node);
+
+		INIT_LIST_HEAD(list);
+	}
+}
+
+void intel_engine_pool_fini(struct intel_engine_pool *pool)
+{
+	int n;
+
+	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
+		GEM_BUG_ON(!list_empty(&pool->cache_list[n]));
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.h b/drivers/gpu/drm/i915/gt/intel_engine_pool.h
new file mode 100644
index 000000000000..f7a0a660c1c9
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.h
@@ -0,0 +1,34 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2014-2018 Intel Corporation
+ */
+
+#ifndef INTEL_ENGINE_POOL_H
+#define INTEL_ENGINE_POOL_H
+
+#include "intel_engine_pool_types.h"
+#include "i915_active.h"
+#include "i915_request.h"
+
+struct intel_engine_pool_node *
+intel_engine_pool_get(struct intel_engine_pool *pool, size_t size);
+
+static inline int
+intel_engine_pool_mark_active(struct intel_engine_pool_node *node,
+			      struct i915_request *rq)
+{
+	return i915_active_ref(&node->active, rq->fence.context, rq);
+}
+
+static inline void
+intel_engine_pool_put(struct intel_engine_pool_node *node)
+{
+	i915_active_release(&node->active);
+}
+
+void intel_engine_pool_init(struct intel_engine_pool *pool);
+void intel_engine_pool_park(struct intel_engine_pool *pool);
+void intel_engine_pool_fini(struct intel_engine_pool *pool);
+
+#endif /* INTEL_ENGINE_POOL_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool_types.h b/drivers/gpu/drm/i915/gt/intel_engine_pool_types.h
new file mode 100644
index 000000000000..e31ee361b76f
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool_types.h
@@ -0,0 +1,29 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2014-2018 Intel Corporation
+ */
+
+#ifndef INTEL_ENGINE_POOL_TYPES_H
+#define INTEL_ENGINE_POOL_TYPES_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+#include "i915_active_types.h"
+
+struct drm_i915_gem_object;
+
+struct intel_engine_pool {
+	spinlock_t lock;
+	struct list_head cache_list[4];
+};
+
+struct intel_engine_pool_node {
+	struct i915_active active;
+	struct drm_i915_gem_object *obj;
+	struct list_head link;
+	struct intel_engine_pool *pool;
+};
+
+#endif /* INTEL_ENGINE_POOL_TYPES_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 0dde7e04b102..6d2f3e11da1c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -16,12 +16,12 @@
 #include <linux/types.h>
 
 #include "i915_gem.h"
-#include "i915_gem_batch_pool.h"
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
 #include "i915_selftest.h"
-#include "gt/intel_timeline_types.h"
+#include "intel_engine_pool_types.h"
 #include "intel_sseu.h"
+#include "intel_timeline_types.h"
 #include "intel_wakeref.h"
 #include "intel_workarounds_types.h"
 
@@ -353,7 +353,7 @@ struct intel_engine_cs {
 	 * when the command parser is enabled. Prevents the client from
 	 * modifying the batch contents after software parsing.
 	 */
-	struct i915_gem_batch_pool batch_pool;
+	struct intel_engine_pool pool;
 
 	struct intel_hw_status_page status_page;
 	struct i915_ctx_workarounds wa_ctx;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 5bcb461b8372..b94d57bf2c48 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -27,6 +27,7 @@
 #include "i915_drv.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
+#include "intel_engine_pool.h"
 
 #include "mock_engine.h"
 #include "selftests/mock_request.h"
@@ -291,6 +292,8 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	intel_engine_init_execlists(engine);
 	intel_engine_init__pm(engine);
 
+	intel_engine_pool_init(&engine->pool);
+
 	engine->kernel_context =
 		i915_gem_context_get_engine(i915->kernel_context, engine->id);
 	if (IS_ERR(engine->kernel_context))
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3e4f58f19362..ce1b6568515e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -295,27 +295,6 @@ static int per_file_stats(int id, void *ptr, void *data)
 			   stats.closed); \
 } while (0)
 
-static void print_batch_pool_stats(struct seq_file *m,
-				   struct drm_i915_private *dev_priv)
-{
-	struct drm_i915_gem_object *obj;
-	struct intel_engine_cs *engine;
-	struct file_stats stats = {};
-	enum intel_engine_id id;
-	int j;
-
-	for_each_engine(engine, dev_priv, id) {
-		for (j = 0; j < ARRAY_SIZE(engine->batch_pool.cache_list); j++) {
-			list_for_each_entry(obj,
-					    &engine->batch_pool.cache_list[j],
-					    batch_pool_link)
-				per_file_stats(0, obj, &stats);
-		}
-	}
-
-	print_file_stats(m, "[k]batch pool", stats);
-}
-
 static void print_context_stats(struct seq_file *m,
 				struct drm_i915_private *i915)
 {
@@ -373,58 +352,12 @@ static int i915_gem_object_info(struct seq_file *m, void *data)
 	if (ret)
 		return ret;
 
-	print_batch_pool_stats(m, i915);
 	print_context_stats(m, i915);
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	return 0;
 }
 
-static int i915_gem_batch_pool_info(struct seq_file *m, void *data)
-{
-	struct drm_i915_private *dev_priv = node_to_i915(m->private);
-	struct drm_device *dev = &dev_priv->drm;
-	struct drm_i915_gem_object *obj;
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int total = 0;
-	int ret, j;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	for_each_engine(engine, dev_priv, id) {
-		for (j = 0; j < ARRAY_SIZE(engine->batch_pool.cache_list); j++) {
-			int count;
-
-			count = 0;
-			list_for_each_entry(obj,
-					    &engine->batch_pool.cache_list[j],
-					    batch_pool_link)
-				count++;
-			seq_printf(m, "%s cache[%d]: %d objects\n",
-				   engine->name, j, count);
-
-			list_for_each_entry(obj,
-					    &engine->batch_pool.cache_list[j],
-					    batch_pool_link) {
-				seq_puts(m, "   ");
-				describe_obj(m, obj);
-				seq_putc(m, '\n');
-			}
-
-			total += count;
-		}
-	}
-
-	seq_printf(m, "total: %d\n", total);
-
-	mutex_unlock(&dev->struct_mutex);
-
-	return 0;
-}
-
 static void gen8_display_interrupt_info(struct seq_file *m)
 {
 	struct drm_i915_private *dev_priv = node_to_i915(m->private);
@@ -4371,7 +4304,6 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_gem_objects", i915_gem_object_info, 0},
 	{"i915_gem_fence_regs", i915_gem_fence_regs_info, 0},
 	{"i915_gem_interrupt", i915_interrupt_info, 0},
-	{"i915_gem_batch_pool", i915_gem_batch_pool_info, 0},
 	{"i915_guc_info", i915_guc_info, 0},
 	{"i915_guc_load_status", i915_guc_load_status_info, 0},
 	{"i915_guc_log_dump", i915_guc_log_dump, 0},
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
deleted file mode 100644
index b17f23991253..000000000000
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ /dev/null
@@ -1,132 +0,0 @@
-/*
- * SPDX-License-Identifier: MIT
- *
- * Copyright © 2014-2018 Intel Corporation
- */
-
-#include "i915_gem_batch_pool.h"
-#include "i915_drv.h"
-
-/**
- * DOC: batch pool
- *
- * In order to submit batch buffers as 'secure', the software command parser
- * must ensure that a batch buffer cannot be modified after parsing. It does
- * this by copying the user provided batch buffer contents to a kernel owned
- * buffer from which the hardware will actually execute, and by carefully
- * managing the address space bindings for such buffers.
- *
- * The batch pool framework provides a mechanism for the driver to manage a
- * set of scratch buffers to use for this purpose. The framework can be
- * extended to support other uses cases should they arise.
- */
-
-/**
- * i915_gem_batch_pool_init() - initialize a batch buffer pool
- * @pool: the batch buffer pool
- * @engine: the associated request submission engine
- */
-void i915_gem_batch_pool_init(struct i915_gem_batch_pool *pool,
-			      struct intel_engine_cs *engine)
-{
-	int n;
-
-	pool->engine = engine;
-
-	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
-		INIT_LIST_HEAD(&pool->cache_list[n]);
-}
-
-/**
- * i915_gem_batch_pool_fini() - clean up a batch buffer pool
- * @pool: the pool to clean up
- *
- * Note: Callers must hold the struct_mutex.
- */
-void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool)
-{
-	int n;
-
-	lockdep_assert_held(&pool->engine->i915->drm.struct_mutex);
-
-	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++) {
-		struct drm_i915_gem_object *obj, *next;
-
-		list_for_each_entry_safe(obj, next,
-					 &pool->cache_list[n],
-					 batch_pool_link)
-			i915_gem_object_put(obj);
-
-		INIT_LIST_HEAD(&pool->cache_list[n]);
-	}
-}
-
-/**
- * i915_gem_batch_pool_get() - allocate a buffer from the pool
- * @pool: the batch buffer pool
- * @size: the minimum desired size of the returned buffer
- *
- * Returns an inactive buffer from @pool with at least @size bytes,
- * with the pages pinned. The caller must i915_gem_object_unpin_pages()
- * on the returned object.
- *
- * Note: Callers must hold the struct_mutex
- *
- * Return: the buffer object or an error pointer
- */
-struct drm_i915_gem_object *
-i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
-			size_t size)
-{
-	struct drm_i915_gem_object *obj;
-	struct list_head *list;
-	int n, ret;
-
-	lockdep_assert_held(&pool->engine->i915->drm.struct_mutex);
-
-	/* Compute a power-of-two bucket, but throw everything greater than
-	 * 16KiB into the same bucket: i.e. the the buckets hold objects of
-	 * (1 page, 2 pages, 4 pages, 8+ pages).
-	 */
-	n = fls(size >> PAGE_SHIFT) - 1;
-	if (n >= ARRAY_SIZE(pool->cache_list))
-		n = ARRAY_SIZE(pool->cache_list) - 1;
-	list = &pool->cache_list[n];
-
-	list_for_each_entry(obj, list, batch_pool_link) {
-		struct reservation_object *resv = obj->base.resv;
-
-		/* The batches are strictly LRU ordered */
-		if (!reservation_object_test_signaled_rcu(resv, true))
-			break;
-
-		/*
-		 * The object is now idle, clear the array of shared
-		 * fences before we add a new request. Although, we
-		 * remain on the same engine, we may be on a different
-		 * timeline and so may continually grow the array,
-		 * trapping a reference to all the old fences, rather
-		 * than replace the existing fence.
-		 */
-		if (rcu_access_pointer(resv->fence)) {
-			reservation_object_lock(resv, NULL);
-			reservation_object_add_excl_fence(resv, NULL);
-			reservation_object_unlock(resv);
-		}
-
-		if (obj->base.size >= size)
-			goto found;
-	}
-
-	obj = i915_gem_object_create_internal(pool->engine->i915, size);
-	if (IS_ERR(obj))
-		return obj;
-
-found:
-	ret = i915_gem_object_pin_pages(obj);
-	if (ret)
-		return ERR_PTR(ret);
-
-	list_move_tail(&obj->batch_pool_link, list);
-	return obj;
-}
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.h b/drivers/gpu/drm/i915/i915_gem_batch_pool.h
deleted file mode 100644
index feeeeeaa54d8..000000000000
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.h
+++ /dev/null
@@ -1,26 +0,0 @@
-/*
- * SPDX-License-Identifier: MIT
- *
- * Copyright © 2014-2018 Intel Corporation
- */
-
-#ifndef I915_GEM_BATCH_POOL_H
-#define I915_GEM_BATCH_POOL_H
-
-#include <linux/types.h>
-
-struct drm_i915_gem_object;
-struct intel_engine_cs;
-
-struct i915_gem_batch_pool {
-	struct intel_engine_cs *engine;
-	struct list_head cache_list[4];
-};
-
-void i915_gem_batch_pool_init(struct i915_gem_batch_pool *pool,
-			      struct intel_engine_cs *engine);
-void i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool);
-struct drm_i915_gem_object *
-i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, size_t size);
-
-#endif /* I915_GEM_BATCH_POOL_H */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (5 preceding siblings ...)
  2019-07-05  7:46 ` [PATCH 7/8] drm/i915: Replace struct_mutex for batch pool serialisation Chris Wilson
@ 2019-07-05  7:46 ` Chris Wilson
  2019-07-08 22:48   ` Daniele Ceraolo Spurio
  2019-07-05  9:53 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Order assert forcewake test Patchwork
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 16+ messages in thread
From: Chris Wilson @ 2019-07-05  7:46 UTC (permalink / raw)
  To: intel-gfx

Having taken the first step in encapsulating the functionality by moving
the related files under gt/, the next step is to start encapsulating by
passing around the relevant structs rather than the global
drm_i915_private. In this step, we pass intel_gt to intel_reset.c

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  21 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  25 +-
 drivers/gpu/drm/i915/gem/i915_gem_throttle.c  |   2 +-
 .../gpu/drm/i915/gem/selftests/huge_pages.c   |  20 +-
 .../i915/gem/selftests/i915_gem_client_blt.c  |   4 +-
 .../i915/gem/selftests/i915_gem_coherency.c   |   6 +-
 .../drm/i915/gem/selftests/i915_gem_context.c |  17 +-
 .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
 .../i915/gem/selftests/i915_gem_object_blt.c  |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   8 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  16 +-
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            |   7 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  19 +-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  12 +
 drivers/gpu/drm/i915/gt/intel_hangcheck.c     |  67 +--
 drivers/gpu/drm/i915/gt/intel_lrc.c           |   2 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         | 435 +++++++++---------
 drivers/gpu/drm/i915/gt/intel_reset.h         |  73 +--
 drivers/gpu/drm/i915/gt/intel_reset_types.h   |  50 ++
 drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 107 +++--
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  36 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      |  50 +-
 drivers/gpu/drm/i915/gt/selftest_timeline.c   |   3 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  33 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  63 +--
 drivers/gpu/drm/i915/i915_drv.c               |   5 +-
 drivers/gpu/drm/i915/i915_drv.h               |  35 +-
 drivers/gpu/drm/i915/i915_gem.c               |  31 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |  52 +--
 drivers/gpu/drm/i915/i915_request.c           |   5 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   2 +-
 drivers/gpu/drm/i915/intel_uc.c               |   2 +-
 drivers/gpu/drm/i915/selftests/i915_active.c  |   3 +-
 drivers/gpu/drm/i915/selftests/i915_gem.c     |   3 +-
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |   3 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |   4 +-
 .../gpu/drm/i915/selftests/i915_selftest.c    |   2 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |   5 +-
 drivers/gpu/drm/i915/selftests/igt_reset.c    |  38 +-
 drivers/gpu/drm/i915/selftests/igt_reset.h    |  10 +-
 drivers/gpu/drm/i915/selftests/igt_wedge_me.h |  58 ---
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   5 -
 48 files changed, 665 insertions(+), 709 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_reset_types.h
 delete mode 100644 drivers/gpu/drm/i915/selftests/igt_wedge_me.h

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 919f5ac844c8..4dd6856bf722 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -4249,12 +4249,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
 		return;
 
 	/* We have a modeset vs reset deadlock, defensively unbreak it. */
-	set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
-	wake_up_all(&dev_priv->gpu_error.wait_queue);
+	set_bit(I915_RESET_MODESET, &dev_priv->gt.reset.flags);
+	wake_up_bit(&dev_priv->gt.reset.flags, I915_RESET_MODESET);
 
 	if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
 		DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
-		i915_gem_set_wedged(dev_priv);
+		intel_gt_set_wedged(&dev_priv->gt);
 	}
 
 	/*
@@ -4300,7 +4300,7 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 	int ret;
 
 	/* reset doesn't touch the display */
-	if (!test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
+	if (!test_bit(I915_RESET_MODESET, &dev_priv->gt.reset.flags))
 		return;
 
 	state = fetch_and_zero(&dev_priv->modeset_restore_state);
@@ -4340,7 +4340,7 @@ void intel_finish_reset(struct drm_i915_private *dev_priv)
 	drm_modeset_acquire_fini(ctx);
 	mutex_unlock(&dev->mode_config.mutex);
 
-	clear_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
+	clear_bit_unlock(I915_RESET_MODESET, &dev_priv->gt.reset.flags);
 }
 
 static void icl_set_pipe_chicken(struct intel_crtc *crtc)
@@ -13827,18 +13827,21 @@ static void intel_atomic_commit_fence_wait(struct intel_atomic_state *intel_stat
 	for (;;) {
 		prepare_to_wait(&intel_state->commit_ready.wait,
 				&wait_fence, TASK_UNINTERRUPTIBLE);
-		prepare_to_wait(&dev_priv->gpu_error.wait_queue,
+		prepare_to_wait(bit_waitqueue(&dev_priv->gt.reset.flags,
+					      I915_RESET_MODESET),
 				&wait_reset, TASK_UNINTERRUPTIBLE);
 
 
-		if (i915_sw_fence_done(&intel_state->commit_ready)
-		    || test_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags))
+		if (i915_sw_fence_done(&intel_state->commit_ready) ||
+		    test_bit(I915_RESET_MODESET, &dev_priv->gt.reset.flags))
 			break;
 
 		schedule();
 	}
 	finish_wait(&intel_state->commit_ready.wait, &wait_fence);
-	finish_wait(&dev_priv->gpu_error.wait_queue, &wait_reset);
+	finish_wait(bit_waitqueue(&dev_priv->gt.reset.flags,
+				  I915_RESET_MODESET),
+		    &wait_reset);
 }
 
 static void intel_atomic_cleanup_work(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e367dce2a696..dac86884c45d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -2144,7 +2144,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (args->flags & I915_CONTEXT_CREATE_FLAGS_UNKNOWN)
 		return -EINVAL;
 
-	ret = i915_terminally_wedged(i915);
+	ret = intel_gt_terminally_wedged(&i915->gt);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0ea2d49bc8b9..3506aafada6c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2148,7 +2148,7 @@ static int __eb_pin_engine(struct i915_execbuffer *eb, struct intel_context *ce)
 	 * ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
-	err = i915_terminally_wedged(eb->i915);
+	err = intel_gt_terminally_wedged(ce->engine->gt);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 391621ee3cbb..a564c1e4231b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -7,6 +7,8 @@
 #include <linux/mman.h>
 #include <linux/sizes.h>
 
+#include "gt/intel_gt.h"
+
 #include "i915_drv.h"
 #include "i915_gem_gtt.h"
 #include "i915_gem_ioctls.h"
@@ -246,7 +248,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 	wakeref = intel_runtime_pm_get(rpm);
 
-	srcu = i915_reset_trylock(i915);
+	srcu = intel_gt_reset_trylock(ggtt->vm.gt);
 	if (srcu < 0) {
 		ret = srcu;
 		goto err_rpm;
@@ -326,7 +328,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 err_unlock:
 	mutex_unlock(&dev->struct_mutex);
 err_reset:
-	i915_reset_unlock(i915, srcu);
+	intel_gt_reset_unlock(ggtt->vm.gt, srcu);
 err_rpm:
 	intel_runtime_pm_put(rpm, wakeref);
 	i915_gem_object_unpin_pages(obj);
@@ -339,7 +341,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		 * fail). But any other -EIO isn't ours (e.g. swap in failure)
 		 * and so needs to be reported.
 		 */
-		if (!i915_terminally_wedged(i915))
+		if (!intel_gt_is_wedged(ggtt->vm.gt))
 			return VM_FAULT_SIGBUS;
 		/* else: fall through */
 	case -EAGAIN:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index bf085b0cb7c6..8e2eeaec06cb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -5,6 +5,7 @@
  */
 
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_pm.h"
 
 #include "i915_drv.h"
@@ -103,18 +104,18 @@ static int pm_notifier(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+static bool switch_to_kernel_context_sync(struct intel_gt *gt)
 {
-	bool result = !i915_terminally_wedged(i915);
+	bool result = !intel_gt_is_wedged(gt);
 
 	do {
-		if (i915_gem_wait_for_idle(i915,
+		if (i915_gem_wait_for_idle(gt->i915,
 					   I915_WAIT_LOCKED |
 					   I915_WAIT_FOR_IDLE_BOOST,
 					   I915_GEM_IDLE_TIMEOUT) == -ETIME) {
 			/* XXX hide warning from gem_eio */
 			if (i915_modparams.reset) {
-				dev_err(i915->drm.dev,
+				dev_err(gt->i915->drm.dev,
 					"Failed to idle engines, declaring wedged!\n");
 				GEM_TRACE_DUMP();
 			}
@@ -123,18 +124,18 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
 			 * Forcibly cancel outstanding work and leave
 			 * the gpu quiet.
 			 */
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(gt);
 			result = false;
 		}
-	} while (i915_retire_requests(i915) && result);
+	} while (i915_retire_requests(gt->i915) && result);
 
-	GEM_BUG_ON(i915->gt.awake);
+	GEM_BUG_ON(gt->awake);
 	return result;
 }
 
 bool i915_gem_load_power_context(struct drm_i915_private *i915)
 {
-	return switch_to_kernel_context_sync(i915);
+	return switch_to_kernel_context_sync(&i915->gt);
 }
 
 void i915_gem_suspend(struct drm_i915_private *i915)
@@ -155,7 +156,7 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	switch_to_kernel_context_sync(i915);
+	switch_to_kernel_context_sync(&i915->gt);
 
 	mutex_unlock(&i915->drm.struct_mutex);
 
@@ -166,7 +167,7 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	GEM_BUG_ON(i915->gt.awake);
 	flush_work(&i915->gem.idle_work);
 
-	cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
+	cancel_delayed_work_sync(&i915->gt.hangcheck.work);
 
 	i915_gem_drain_freed_objects(i915);
 
@@ -274,10 +275,10 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	if (!i915_reset_failed(i915)) {
+	if (!intel_gt_is_wedged(&i915->gt)) {
 		dev_err(i915->drm.dev,
 			"Failed to re-initialize GPU, declaring it wedged!\n");
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 	}
 	goto out_unlock;
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_throttle.c b/drivers/gpu/drm/i915/gem/i915_gem_throttle.c
index adb3074d9ce2..1e372420771b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_throttle.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_throttle.c
@@ -41,7 +41,7 @@ i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
 	long ret;
 
 	/* ABI: return -EIO if already wedged */
-	ret = i915_terminally_wedged(to_i915(dev));
+	ret = intel_gt_terminally_wedged(&to_i915(dev)->gt);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
index 86eed4c3ae2b..6cbd4a668c9a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c
@@ -1753,7 +1753,7 @@ int i915_gem_huge_page_mock_selftests(void)
 	return err;
 }
 
-int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
+int i915_gem_huge_page_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_shrink_thp),
@@ -1768,22 +1768,22 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 	intel_wakeref_t wakeref;
 	int err;
 
-	if (!HAS_PPGTT(dev_priv)) {
+	if (!HAS_PPGTT(i915)) {
 		pr_info("PPGTT not supported, skipping live-selftests\n");
 		return 0;
 	}
 
-	if (i915_terminally_wedged(dev_priv))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
-	file = mock_file(dev_priv);
+	file = mock_file(i915);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	wakeref = intel_runtime_pm_get(&dev_priv->runtime_pm);
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 
-	ctx = live_context(dev_priv, file);
+	ctx = live_context(i915, file);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1795,10 +1795,10 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv)
 	err = i915_subtests(tests, ctx);
 
 out_unlock:
-	intel_runtime_pm_put(&dev_priv->runtime_pm, wakeref);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
 
-	mock_file_free(dev_priv, file);
+	mock_file_free(i915, file);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index fa79233093eb..275c28926067 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
@@ -5,6 +5,8 @@
 
 #include "i915_selftest.h"
 
+#include "gt/intel_gt.h"
+
 #include "selftests/igt_flush_test.h"
 #include "selftests/mock_drm.h"
 #include "mock_context.h"
@@ -101,7 +103,7 @@ int i915_gem_client_blt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_client_fill),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	if (!HAS_ENGINE(i915, BCS0))
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
index 861f32be7d46..a1a4b53cdc4a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_coherency.c
@@ -6,6 +6,8 @@
 
 #include <linux/prime_numbers.h>
 
+#include "gt/intel_gt.h"
+
 #include "i915_selftest.h"
 #include "selftests/i915_random.h"
 
@@ -242,12 +244,12 @@ static bool always_valid(struct drm_i915_private *i915)
 
 static bool needs_fence_registers(struct drm_i915_private *i915)
 {
-	return !i915_terminally_wedged(i915);
+	return !intel_gt_is_wedged(&i915->gt);
 }
 
 static bool needs_mi_store_dword(struct drm_i915_private *i915)
 {
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return false;
 
 	if (!HAS_ENGINE(i915, RCS0))
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index 3abe15a08b6d..2e927f4566d3 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -7,6 +7,7 @@
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_reset.h"
 #include "i915_selftest.h"
 
@@ -83,7 +84,7 @@ static int live_nop_switch(void *arg)
 		}
 		if (i915_request_wait(rq, 0, HZ / 5) < 0) {
 			pr_err("Failed to populated %d contexts\n", nctx);
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto out_unlock;
 		}
@@ -127,7 +128,7 @@ static int live_nop_switch(void *arg)
 			if (i915_request_wait(rq, 0, HZ / 5) < 0) {
 				pr_err("Switching between %ld contexts timed out\n",
 				       prime);
-				i915_gem_set_wedged(i915);
+				intel_gt_set_wedged(&i915->gt);
 				break;
 			}
 
@@ -956,7 +957,7 @@ __sseu_finish(struct drm_i915_private *i915,
 	int ret = 0;
 
 	if (flags & TEST_RESET) {
-		ret = i915_reset_engine(ce->engine, "sseu");
+		ret = intel_engine_reset(ce->engine, "sseu");
 		if (ret)
 			goto out;
 	}
@@ -1059,7 +1060,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 		return PTR_ERR(file);
 
 	if (flags & TEST_RESET)
-		igt_global_reset_lock(i915);
+		igt_global_reset_lock(&i915->gt);
 
 	mutex_lock(&i915->drm.struct_mutex);
 
@@ -1120,7 +1121,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	if (flags & TEST_RESET)
-		igt_global_reset_unlock(i915);
+		igt_global_reset_unlock(&i915->gt);
 
 	mock_file_free(i915, file);
 
@@ -1722,7 +1723,7 @@ int i915_gem_context_mock_selftests(void)
 	return err;
 }
 
-int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
+int i915_gem_context_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(live_nop_switch),
@@ -1733,8 +1734,8 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_vm_isolation),
 	};
 
-	if (i915_terminally_wedged(dev_priv))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
-	return i915_live_subtests(tests, dev_priv);
+	return i915_live_subtests(tests, i915);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index b95fdc2b6bfc..097181c55947 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -466,7 +466,7 @@ static int igt_mmap_offset_exhaustion(void *arg)
 
 	/* Now fill with busy dead objects that we expect to reap */
 	for (loop = 0; loop < 3; loop++) {
-		if (i915_terminally_wedged(i915))
+		if (intel_gt_is_wedged(&i915->gt))
 			break;
 
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
index 11d37238c62c..19843acc84d3 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
@@ -3,6 +3,8 @@
  * Copyright © 2019 Intel Corporation
  */
 
+#include "gt/intel_gt.h"
+
 #include "i915_selftest.h"
 
 #include "selftests/igt_flush_test.h"
@@ -95,7 +97,7 @@ int i915_gem_object_blt_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_fill_blt),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	if (!HAS_ENGINE(i915, BCS0))
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index faaa164267f4..a4db4dd22b4f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -410,8 +410,8 @@ gen8_emit_ggtt_write(u32 *cs, u32 value, u32 gtt_offset, u32 flags)
 	return cs;
 }
 
-static inline void intel_engine_reset(struct intel_engine_cs *engine,
-				      bool stalled)
+static inline void __intel_engine_reset(struct intel_engine_cs *engine,
+					bool stalled)
 {
 	if (engine->reset.reset)
 		engine->reset.reset(engine, stalled);
@@ -419,9 +419,9 @@ static inline void intel_engine_reset(struct intel_engine_cs *engine,
 }
 
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
-bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
+bool intel_engines_are_idle(struct intel_gt *gt);
 
-void intel_engines_reset_default_submission(struct drm_i915_private *i915);
+void intel_engines_reset_default_submission(struct intel_gt *gt);
 unsigned int intel_engines_has_context_isolation(struct drm_i915_private *i915);
 
 bool intel_engine_can_store_dword(struct intel_engine_cs *engine);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index a8f5ed1d75be..8f3365bfa0c1 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1153,7 +1153,7 @@ static bool ring_is_idle(struct intel_engine_cs *engine)
 bool intel_engine_is_idle(struct intel_engine_cs *engine)
 {
 	/* More white lies, if wedged, hw state is inconsistent */
-	if (i915_reset_failed(engine->i915))
+	if (intel_gt_is_wedged(engine->gt))
 		return true;
 
 	if (!intel_engine_pm_is_awake(engine))
@@ -1189,7 +1189,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	return ring_is_idle(engine);
 }
 
-bool intel_engines_are_idle(struct drm_i915_private *i915)
+bool intel_engines_are_idle(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -1198,14 +1198,14 @@ bool intel_engines_are_idle(struct drm_i915_private *i915)
 	 * If the driver is wedged, HW state may be very inconsistent and
 	 * report that it is still busy, even though we have stopped using it.
 	 */
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(gt))
 		return true;
 
 	/* Already parked (and passed an idleness test); must still be idle */
-	if (!READ_ONCE(i915->gt.awake))
+	if (!READ_ONCE(gt->awake))
 		return true;
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		if (!intel_engine_is_idle(engine))
 			return false;
 	}
@@ -1213,12 +1213,12 @@ bool intel_engines_are_idle(struct drm_i915_private *i915)
 	return true;
 }
 
-void intel_engines_reset_default_submission(struct drm_i915_private *i915)
+void intel_engines_reset_default_submission(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, i915, id)
+	for_each_engine(engine, gt->i915, id)
 		engine->set_default_submission(engine);
 }
 
@@ -1495,7 +1495,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		va_end(ap);
 	}
 
-	if (i915_reset_failed(engine->i915))
+	if (intel_gt_is_wedged(engine->gt))
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index fe9f9eaffe88..76c828d11b05 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -9,6 +9,7 @@
 #include "intel_engine.h"
 #include "intel_engine_pool.h"
 #include "intel_engine_pm.h"
+#include "intel_gt.h"
 #include "intel_gt_pm.h"
 
 static int __engine_unpark(struct intel_wakeref *wf)
@@ -67,7 +68,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 		return true;
 
 	/* GPU is pointing to the void, as good as in the kernel context. */
-	if (i915_reset_failed(engine->i915))
+	if (intel_gt_is_wedged(engine->gt))
 		return true;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 46d24d9d62ac..90ed79285ff8 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -18,6 +18,8 @@ void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)
 
 	spin_lock_init(&gt->closed_lock);
 
+	intel_gt_init_hangcheck(gt);
+	intel_gt_init_reset(gt);
 	intel_gt_pm_init_early(gt);
 }
 
@@ -240,3 +242,8 @@ void intel_gt_fini_scratch(struct intel_gt *gt)
 {
 	i915_vma_unpin_and_release(&gt->scratch, 0);
 }
+
+void intel_gt_cleanup_early(struct intel_gt *gt)
+{
+	intel_gt_fini_reset(gt);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index cf3c6cecc8ee..f30d04eceb1a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -8,12 +8,15 @@
 
 #include "intel_engine_types.h"
 #include "intel_gt_types.h"
+#include "intel_reset.h"
 
 struct drm_i915_private;
 
 void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915);
 void intel_gt_init_hw(struct drm_i915_private *i915);
 
+void intel_gt_cleanup_early(struct intel_gt *gt);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
@@ -21,6 +24,8 @@ void intel_gt_clear_error_registers(struct intel_gt *gt,
 void intel_gt_flush_ggtt_writes(struct intel_gt *gt);
 void intel_gt_chipset_flush(struct intel_gt *gt);
 
+void intel_gt_init_hangcheck(struct intel_gt *gt);
+
 int intel_gt_init_scratch(struct intel_gt *gt, unsigned int size);
 void intel_gt_fini_scratch(struct intel_gt *gt);
 
@@ -29,4 +34,11 @@ static inline u32 intel_gt_scratch_offset(const struct intel_gt *gt)
 	return i915_ggtt_offset(gt->scratch);
 }
 
+static inline bool intel_gt_is_wedged(struct intel_gt *gt)
+{
+	return __intel_reset_failed(&gt->reset);
+}
+
+void intel_gt_queue_hangcheck(struct intel_gt *gt);
+
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 36ba80e6a0b7..9047e28e75ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -5,7 +5,9 @@
  */
 
 #include "i915_drv.h"
+#include "i915_params.h"
 #include "intel_engine_pm.h"
+#include "intel_gt.h"
 #include "intel_gt_pm.h"
 #include "intel_pm.h"
 #include "intel_wakeref.h"
@@ -17,6 +19,7 @@ static void pm_notify(struct drm_i915_private *i915, int state)
 
 static int intel_gt_unpark(struct intel_wakeref *wf)
 {
+	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
 	struct drm_i915_private *i915 =
 		container_of(wf, typeof(*i915), gt.wakeref);
 
@@ -33,8 +36,8 @@ static int intel_gt_unpark(struct intel_wakeref *wf)
 	 * Work around it by grabbing a GT IRQ power domain whilst there is any
 	 * GT activity, preventing any DC state transitions.
 	 */
-	i915->gt.awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);
-	GEM_BUG_ON(!i915->gt.awake);
+	gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);
+	GEM_BUG_ON(!gt->awake);
 
 	intel_enable_gt_powersave(i915);
 
@@ -44,7 +47,7 @@ static int intel_gt_unpark(struct intel_wakeref *wf)
 
 	i915_pmu_gt_unparked(i915);
 
-	i915_queue_hangcheck(i915);
+	intel_gt_queue_hangcheck(gt);
 
 	pm_notify(i915, INTEL_GT_UNPARK);
 
@@ -91,12 +94,12 @@ void intel_gt_pm_init_early(struct intel_gt *gt)
 	BLOCKING_INIT_NOTIFIER_HEAD(&gt->pm_notifications);
 }
 
-static bool reset_engines(struct drm_i915_private *i915)
+static bool reset_engines(struct intel_gt *gt)
 {
-	if (INTEL_INFO(i915)->gpu_reset_clobbers_display)
+	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
 		return false;
 
-	return intel_gpu_reset(i915, ALL_ENGINES) == 0;
+	return intel_gpu_reset(gt, ALL_ENGINES) == 0;
 }
 
 /**
@@ -116,11 +119,11 @@ void intel_gt_sanitize(struct intel_gt *gt, bool force)
 
 	GEM_TRACE("\n");
 
-	if (!reset_engines(gt->i915) && !force)
+	if (!reset_engines(gt) && !force)
 		return;
 
 	for_each_engine(engine, gt->i915, id)
-		intel_engine_reset(engine, false);
+		__intel_engine_reset(engine, false);
 }
 
 int intel_gt_resume(struct intel_gt *gt)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index f43ea830b1e8..70d723e801bf 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -14,12 +14,21 @@
 #include <linux/types.h>
 
 #include "i915_vma.h"
+#include "intel_reset_types.h"
 #include "intel_wakeref.h"
 
 struct drm_i915_private;
 struct i915_ggtt;
 struct intel_uncore;
 
+struct intel_hangcheck {
+	/* For hangcheck timer */
+#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
+#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
+
+	struct delayed_work work;
+};
+
 struct intel_gt {
 	struct drm_i915_private *i915;
 	struct intel_uncore *uncore;
@@ -39,6 +48,9 @@ struct intel_gt {
 	struct list_head closed_vma;
 	spinlock_t closed_lock; /* guards the list of closed_vma */
 
+	struct intel_hangcheck hangcheck;
+	struct intel_reset reset;
+
 	/**
 	 * Is the GPU currently considered idle, or busy executing
 	 * userspace requests? Whilst idle, we allow runtime power
diff --git a/drivers/gpu/drm/i915/gt/intel_hangcheck.c b/drivers/gpu/drm/i915/gt/intel_hangcheck.c
index 797d8ef0969c..88177f47b4d8 100644
--- a/drivers/gpu/drm/i915/gt/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/intel_hangcheck.c
@@ -22,8 +22,10 @@
  *
  */
 
-#include "intel_reset.h"
 #include "i915_drv.h"
+#include "intel_engine.h"
+#include "intel_gt.h"
+#include "intel_reset.h"
 
 struct hangcheck {
 	u64 acthd;
@@ -100,7 +102,6 @@ head_stuck(struct intel_engine_cs *engine, u64 acthd)
 static enum intel_engine_hangcheck_action
 engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 {
-	struct drm_i915_private *dev_priv = engine->i915;
 	enum intel_engine_hangcheck_action ha;
 	u32 tmp;
 
@@ -108,7 +109,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	if (ha != ENGINE_DEAD)
 		return ha;
 
-	if (IS_GEN(dev_priv, 2))
+	if (IS_GEN(engine->i915, 2))
 		return ENGINE_DEAD;
 
 	/* Is the chip hanging on a WAIT_FOR_EVENT?
@@ -118,8 +119,8 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	 */
 	tmp = ENGINE_READ(engine, RING_CTL);
 	if (tmp & RING_WAIT) {
-		i915_handle_error(dev_priv, engine->mask, 0,
-				  "stuck wait on %s", engine->name);
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "stuck wait on %s", engine->name);
 		ENGINE_WRITE(engine, RING_CTL, tmp);
 		return ENGINE_WAIT_KICK;
 	}
@@ -219,7 +220,7 @@ static void hangcheck_accumulate_sample(struct intel_engine_cs *engine,
 				 I915_ENGINE_WEDGED_TIMEOUT);
 }
 
-static void hangcheck_declare_hang(struct drm_i915_private *i915,
+static void hangcheck_declare_hang(struct intel_gt *gt,
 				   intel_engine_mask_t hung,
 				   intel_engine_mask_t stuck)
 {
@@ -235,12 +236,12 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
 		hung &= ~stuck;
 	len = scnprintf(msg, sizeof(msg),
 			"%s on ", stuck == hung ? "no progress" : "hang");
-	for_each_engine_masked(engine, i915, hung, tmp)
+	for_each_engine_masked(engine, gt->i915, hung, tmp)
 		len += scnprintf(msg + len, sizeof(msg) - len,
 				 "%s, ", engine->name);
 	msg[len-2] = '\0';
 
-	return i915_handle_error(i915, hung, I915_ERROR_CAPTURE, "%s", msg);
+	return intel_gt_handle_error(gt, hung, I915_ERROR_CAPTURE, "%s", msg);
 }
 
 /*
@@ -251,11 +252,10 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
  * we kick the ring. If we see no progress on three subsequent calls
  * we assume chip is wedged and try to fix it by resetting the chip.
  */
-static void i915_hangcheck_elapsed(struct work_struct *work)
+static void hangcheck_elapsed(struct work_struct *work)
 {
-	struct drm_i915_private *dev_priv =
-		container_of(work, typeof(*dev_priv),
-			     gpu_error.hangcheck_work.work);
+	struct intel_gt *gt =
+		container_of(work, typeof(*gt), hangcheck.work.work);
 	intel_engine_mask_t hung = 0, stuck = 0, wedged = 0;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -264,13 +264,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (!i915_modparams.enable_hangcheck)
 		return;
 
-	if (!READ_ONCE(dev_priv->gt.awake))
+	if (!READ_ONCE(gt->awake))
 		return;
 
-	if (i915_terminally_wedged(dev_priv))
+	if (intel_gt_is_wedged(gt))
 		return;
 
-	wakeref = intel_runtime_pm_get_if_in_use(&dev_priv->runtime_pm);
+	wakeref = intel_runtime_pm_get_if_in_use(&gt->i915->runtime_pm);
 	if (!wakeref)
 		return;
 
@@ -278,9 +278,9 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	 * periodically arm the mmio checker to see if we are triggering
 	 * any invalid access.
 	 */
-	intel_uncore_arm_unclaimed_mmio_detection(&dev_priv->uncore);
+	intel_uncore_arm_unclaimed_mmio_detection(gt->uncore);
 
-	for_each_engine(engine, dev_priv, id) {
+	for_each_engine(engine, gt->i915, id) {
 		struct hangcheck hc;
 
 		intel_engine_signal_breadcrumbs(engine);
@@ -302,7 +302,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
 		struct drm_printer p = drm_debug_printer("hangcheck");
 
-		for_each_engine(engine, dev_priv, id) {
+		for_each_engine(engine, gt->i915, id) {
 			if (intel_engine_is_idle(engine))
 				continue;
 
@@ -311,20 +311,36 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 	}
 
 	if (wedged) {
-		dev_err(dev_priv->drm.dev,
+		dev_err(gt->i915->drm.dev,
 			"GPU recovery timed out,"
 			" cancelling all in-flight rendering.\n");
 		GEM_TRACE_DUMP();
-		i915_gem_set_wedged(dev_priv);
+		intel_gt_set_wedged(gt);
 	}
 
 	if (hung)
-		hangcheck_declare_hang(dev_priv, hung, stuck);
+		hangcheck_declare_hang(gt, hung, stuck);
 
-	intel_runtime_pm_put(&dev_priv->runtime_pm, wakeref);
+	intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
 
 	/* Reset timer in case GPU hangs without another request being added */
-	i915_queue_hangcheck(dev_priv);
+	intel_gt_queue_hangcheck(gt);
+}
+
+void intel_gt_queue_hangcheck(struct intel_gt *gt)
+{
+	unsigned long delay;
+
+	if (unlikely(!i915_modparams.enable_hangcheck))
+		return;
+
+	/* Don't continually defer the hangcheck so that it is always run at
+	 * least once after work has been scheduled on any ring. Otherwise,
+	 * we will ignore a hung ring if a second ring is kept busy.
+	 */
+
+	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
+	queue_delayed_work(system_long_wq, &gt->hangcheck.work, delay);
 }
 
 void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
@@ -333,10 +349,9 @@ void intel_engine_init_hangcheck(struct intel_engine_cs *engine)
 	engine->hangcheck.action_timestamp = jiffies;
 }
 
-void intel_hangcheck_init(struct drm_i915_private *i915)
+void intel_gt_init_hangcheck(struct intel_gt *gt)
 {
-	INIT_DELAYED_WORK(&i915->gpu_error.hangcheck_work,
-			  i915_hangcheck_elapsed);
+	INIT_DELAYED_WORK(&gt->hangcheck.work, hangcheck_elapsed);
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 4f5195d1bc8b..a3aff55a7db6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2269,7 +2269,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * and have to at least restore the RING register in the context
 	 * image back to the expected values to skip over the guilty request.
 	 */
-	i915_reset_request(rq, stalled);
+	__i915_request_reset(rq, stalled);
 	if (!stalled)
 		goto out_replay;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72002c0f9698..67c06231b004 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -114,7 +114,7 @@ static void context_mark_innocent(struct i915_gem_context *ctx)
 	atomic_inc(&ctx->active_count);
 }
 
-void i915_reset_request(struct i915_request *rq, bool guilty)
+void __i915_request_reset(struct i915_request *rq, bool guilty)
 {
 	GEM_TRACE("%s rq=%llx:%lld, guilty? %s\n",
 		  rq->engine->name,
@@ -164,16 +164,15 @@ static void gen3_stop_engine(struct intel_engine_cs *engine)
 			  intel_uncore_read_fw(uncore, RING_HEAD(base)));
 }
 
-static void i915_stop_engines(struct drm_i915_private *i915,
-			      intel_engine_mask_t engine_mask)
+static void stop_engines(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 {
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
 
-	if (INTEL_GEN(i915) < 3)
+	if (INTEL_GEN(gt->i915) < 3)
 		return;
 
-	for_each_engine_masked(engine, i915, engine_mask, tmp)
+	for_each_engine_masked(engine, gt->i915, engine_mask, tmp)
 		gen3_stop_engine(engine);
 }
 
@@ -185,11 +184,11 @@ static bool i915_in_reset(struct pci_dev *pdev)
 	return gdrst & GRDOM_RESET_STATUS;
 }
 
-static int i915_do_reset(struct drm_i915_private *i915,
+static int i915_do_reset(struct intel_gt *gt,
 			 intel_engine_mask_t engine_mask,
 			 unsigned int retry)
 {
-	struct pci_dev *pdev = i915->drm.pdev;
+	struct pci_dev *pdev = gt->i915->drm.pdev;
 	int err;
 
 	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
@@ -214,22 +213,22 @@ static bool g4x_reset_complete(struct pci_dev *pdev)
 	return (gdrst & GRDOM_RESET_ENABLE) == 0;
 }
 
-static int g33_do_reset(struct drm_i915_private *i915,
+static int g33_do_reset(struct intel_gt *gt,
 			intel_engine_mask_t engine_mask,
 			unsigned int retry)
 {
-	struct pci_dev *pdev = i915->drm.pdev;
+	struct pci_dev *pdev = gt->i915->drm.pdev;
 
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
 	return wait_for_atomic(g4x_reset_complete(pdev), 50);
 }
 
-static int g4x_do_reset(struct drm_i915_private *i915,
+static int g4x_do_reset(struct intel_gt *gt,
 			intel_engine_mask_t engine_mask,
 			unsigned int retry)
 {
-	struct pci_dev *pdev = i915->drm.pdev;
-	struct intel_uncore *uncore = &i915->uncore;
+	struct pci_dev *pdev = gt->i915->drm.pdev;
+	struct intel_uncore *uncore = gt->uncore;
 	int ret;
 
 	/* WaVcpClkGateDisableForMediaReset:ctg,elk */
@@ -261,11 +260,11 @@ static int g4x_do_reset(struct drm_i915_private *i915,
 	return ret;
 }
 
-static int ironlake_do_reset(struct drm_i915_private *i915,
+static int ironlake_do_reset(struct intel_gt *gt,
 			     intel_engine_mask_t engine_mask,
 			     unsigned int retry)
 {
-	struct intel_uncore *uncore = &i915->uncore;
+	struct intel_uncore *uncore = gt->uncore;
 	int ret;
 
 	intel_uncore_write_fw(uncore, ILK_GDSR,
@@ -297,10 +296,9 @@ static int ironlake_do_reset(struct drm_i915_private *i915,
 }
 
 /* Reset the hardware domains (GENX_GRDOM_*) specified by mask */
-static int gen6_hw_domain_reset(struct drm_i915_private *i915,
-				u32 hw_domain_mask)
+static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
 {
-	struct intel_uncore *uncore = &i915->uncore;
+	struct intel_uncore *uncore = gt->uncore;
 	int err;
 
 	/*
@@ -322,7 +320,7 @@ static int gen6_hw_domain_reset(struct drm_i915_private *i915,
 	return err;
 }
 
-static int gen6_reset_engines(struct drm_i915_private *i915,
+static int gen6_reset_engines(struct intel_gt *gt,
 			      intel_engine_mask_t engine_mask,
 			      unsigned int retry)
 {
@@ -342,13 +340,13 @@ static int gen6_reset_engines(struct drm_i915_private *i915,
 		intel_engine_mask_t tmp;
 
 		hw_mask = 0;
-		for_each_engine_masked(engine, i915, engine_mask, tmp) {
+		for_each_engine_masked(engine, gt->i915, engine_mask, tmp) {
 			GEM_BUG_ON(engine->id >= ARRAY_SIZE(hw_engine_mask));
 			hw_mask |= hw_engine_mask[engine->id];
 		}
 	}
 
-	return gen6_hw_domain_reset(i915, hw_mask);
+	return gen6_hw_domain_reset(gt, hw_mask);
 }
 
 static u32 gen11_lock_sfc(struct intel_engine_cs *engine)
@@ -446,7 +444,7 @@ static void gen11_unlock_sfc(struct intel_engine_cs *engine)
 	rmw_clear_fw(uncore, sfc_forced_lock, sfc_forced_lock_bit);
 }
 
-static int gen11_reset_engines(struct drm_i915_private *i915,
+static int gen11_reset_engines(struct intel_gt *gt,
 			       intel_engine_mask_t engine_mask,
 			       unsigned int retry)
 {
@@ -469,17 +467,17 @@ static int gen11_reset_engines(struct drm_i915_private *i915,
 		hw_mask = GEN11_GRDOM_FULL;
 	} else {
 		hw_mask = 0;
-		for_each_engine_masked(engine, i915, engine_mask, tmp) {
+		for_each_engine_masked(engine, gt->i915, engine_mask, tmp) {
 			GEM_BUG_ON(engine->id >= ARRAY_SIZE(hw_engine_mask));
 			hw_mask |= hw_engine_mask[engine->id];
 			hw_mask |= gen11_lock_sfc(engine);
 		}
 	}
 
-	ret = gen6_hw_domain_reset(i915, hw_mask);
+	ret = gen6_hw_domain_reset(gt, hw_mask);
 
 	if (engine_mask != ALL_ENGINES)
-		for_each_engine_masked(engine, i915, engine_mask, tmp)
+		for_each_engine_masked(engine, gt->i915, engine_mask, tmp)
 			gen11_unlock_sfc(engine);
 
 	return ret;
@@ -529,7 +527,7 @@ static void gen8_engine_reset_cancel(struct intel_engine_cs *engine)
 			      _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
 }
 
-static int gen8_reset_engines(struct drm_i915_private *i915,
+static int gen8_reset_engines(struct intel_gt *gt,
 			      intel_engine_mask_t engine_mask,
 			      unsigned int retry)
 {
@@ -538,7 +536,7 @@ static int gen8_reset_engines(struct drm_i915_private *i915,
 	intel_engine_mask_t tmp;
 	int ret;
 
-	for_each_engine_masked(engine, i915, engine_mask, tmp) {
+	for_each_engine_masked(engine, gt->i915, engine_mask, tmp) {
 		ret = gen8_engine_reset_prepare(engine);
 		if (ret && !reset_non_ready)
 			goto skip_reset;
@@ -554,23 +552,23 @@ static int gen8_reset_engines(struct drm_i915_private *i915,
 		 * We rather take context corruption instead of
 		 * failed reset with a wedged driver/gpu. And
 		 * active bb execution case should be covered by
-		 * i915_stop_engines we have before the reset.
+		 * stop_engines() we have before the reset.
 		 */
 	}
 
-	if (INTEL_GEN(i915) >= 11)
-		ret = gen11_reset_engines(i915, engine_mask, retry);
+	if (INTEL_GEN(gt->i915) >= 11)
+		ret = gen11_reset_engines(gt, engine_mask, retry);
 	else
-		ret = gen6_reset_engines(i915, engine_mask, retry);
+		ret = gen6_reset_engines(gt, engine_mask, retry);
 
 skip_reset:
-	for_each_engine_masked(engine, i915, engine_mask, tmp)
+	for_each_engine_masked(engine, gt->i915, engine_mask, tmp)
 		gen8_engine_reset_cancel(engine);
 
 	return ret;
 }
 
-typedef int (*reset_func)(struct drm_i915_private *,
+typedef int (*reset_func)(struct intel_gt *,
 			  intel_engine_mask_t engine_mask,
 			  unsigned int retry);
 
@@ -592,15 +590,14 @@ static reset_func intel_get_gpu_reset(struct drm_i915_private *i915)
 		return NULL;
 }
 
-int intel_gpu_reset(struct drm_i915_private *i915,
-		    intel_engine_mask_t engine_mask)
+int intel_gpu_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 {
 	const int retries = engine_mask == ALL_ENGINES ? RESET_MAX_RETRIES : 1;
 	reset_func reset;
 	int ret = -ETIMEDOUT;
 	int retry;
 
-	reset = intel_get_gpu_reset(i915);
+	reset = intel_get_gpu_reset(gt->i915);
 	if (!reset)
 		return -ENODEV;
 
@@ -608,7 +605,7 @@ int intel_gpu_reset(struct drm_i915_private *i915,
 	 * If the power well sleeps during the reset, the reset
 	 * request may be dropped and never completes (causing -EIO).
 	 */
-	intel_uncore_forcewake_get(&i915->uncore, FORCEWAKE_ALL);
+	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
 	for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) {
 		/*
 		 * We stop engines, otherwise we might get failed reset and a
@@ -625,14 +622,14 @@ int intel_gpu_reset(struct drm_i915_private *i915,
 		 * FIXME: Wa for more modern gens needs to be validated
 		 */
 		if (retry)
-			i915_stop_engines(i915, engine_mask);
+			stop_engines(gt, engine_mask);
 
 		GEM_TRACE("engine_mask=%x\n", engine_mask);
 		preempt_disable();
-		ret = reset(i915, engine_mask, retry);
+		ret = reset(gt, engine_mask, retry);
 		preempt_enable();
 	}
-	intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL);
+	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
 
 	return ret;
 }
@@ -650,17 +647,17 @@ bool intel_has_reset_engine(struct drm_i915_private *i915)
 	return INTEL_INFO(i915)->has_reset_engine && i915_modparams.reset >= 2;
 }
 
-int intel_reset_guc(struct drm_i915_private *i915)
+int intel_reset_guc(struct intel_gt *gt)
 {
 	u32 guc_domain =
-		INTEL_GEN(i915) >= 11 ? GEN11_GRDOM_GUC : GEN9_GRDOM_GUC;
+		INTEL_GEN(gt->i915) >= 11 ? GEN11_GRDOM_GUC : GEN9_GRDOM_GUC;
 	int ret;
 
-	GEM_BUG_ON(!HAS_GUC(i915));
+	GEM_BUG_ON(!HAS_GUC(gt->i915));
 
-	intel_uncore_forcewake_get(&i915->uncore, FORCEWAKE_ALL);
-	ret = gen6_hw_domain_reset(i915, guc_domain);
-	intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL);
+	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
+	ret = gen6_hw_domain_reset(gt, guc_domain);
+	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
 
 	return ret;
 }
@@ -682,56 +679,55 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
 	engine->reset.prepare(engine);
 }
 
-static void revoke_mmaps(struct drm_i915_private *i915)
+static void revoke_mmaps(struct intel_gt *gt)
 {
 	int i;
 
-	for (i = 0; i < i915->ggtt.num_fences; i++) {
+	for (i = 0; i < gt->ggtt->num_fences; i++) {
 		struct drm_vma_offset_node *node;
 		struct i915_vma *vma;
 		u64 vma_offset;
 
-		vma = READ_ONCE(i915->ggtt.fence_regs[i].vma);
+		vma = READ_ONCE(gt->ggtt->fence_regs[i].vma);
 		if (!vma)
 			continue;
 
 		if (!i915_vma_has_userfault(vma))
 			continue;
 
-		GEM_BUG_ON(vma->fence != &i915->ggtt.fence_regs[i]);
+		GEM_BUG_ON(vma->fence != &gt->ggtt->fence_regs[i]);
 		node = &vma->obj->base.vma_node;
 		vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
-		unmap_mapping_range(i915->drm.anon_inode->i_mapping,
+		unmap_mapping_range(gt->i915->drm.anon_inode->i_mapping,
 				    drm_vma_node_offset_addr(node) + vma_offset,
 				    vma->size,
 				    1);
 	}
 }
 
-static intel_engine_mask_t reset_prepare(struct drm_i915_private *i915)
+static intel_engine_mask_t reset_prepare(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t awake = 0;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		if (intel_engine_pm_get_if_awake(engine))
 			awake |= engine->mask;
 		reset_prepare_engine(engine);
 	}
 
-	intel_uc_reset_prepare(i915);
+	intel_uc_reset_prepare(gt->i915);
 
 	return awake;
 }
 
-static void gt_revoke(struct drm_i915_private *i915)
+static void gt_revoke(struct intel_gt *gt)
 {
-	revoke_mmaps(i915);
+	revoke_mmaps(gt);
 }
 
-static int gt_reset(struct drm_i915_private *i915,
-		    intel_engine_mask_t stalled_mask)
+static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -741,14 +737,14 @@ static int gt_reset(struct drm_i915_private *i915,
 	 * Everything depends on having the GTT running, so we need to start
 	 * there.
 	 */
-	err = i915_ggtt_enable_hw(i915);
+	err = i915_ggtt_enable_hw(gt->i915);
 	if (err)
 		return err;
 
-	for_each_engine(engine, i915, id)
-		intel_engine_reset(engine, stalled_mask & engine->mask);
+	for_each_engine(engine, gt->i915, id)
+		__intel_engine_reset(engine, stalled_mask & engine->mask);
 
-	i915_gem_restore_fences(i915);
+	i915_gem_restore_fences(gt->i915);
 
 	return err;
 }
@@ -761,13 +757,12 @@ static void reset_finish_engine(struct intel_engine_cs *engine)
 	intel_engine_signal_breadcrumbs(engine);
 }
 
-static void reset_finish(struct drm_i915_private *i915,
-			 intel_engine_mask_t awake)
+static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		reset_finish_engine(engine);
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
@@ -791,20 +786,19 @@ static void nop_submit_request(struct i915_request *request)
 	intel_engine_queue_breadcrumbs(engine);
 }
 
-static void __i915_gem_set_wedged(struct drm_i915_private *i915)
+static void __intel_gt_set_wedged(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t awake;
 	enum intel_engine_id id;
 
-	if (test_bit(I915_WEDGED, &error->flags))
+	if (test_bit(I915_WEDGED, &gt->reset.flags))
 		return;
 
-	if (GEM_SHOW_DEBUG() && !intel_engines_are_idle(i915)) {
+	if (GEM_SHOW_DEBUG() && !intel_engines_are_idle(gt)) {
 		struct drm_printer p = drm_debug_printer(__func__);
 
-		for_each_engine(engine, i915, id)
+		for_each_engine(engine, gt->i915, id)
 			intel_engine_dump(engine, &p, "%s\n", engine->name);
 	}
 
@@ -815,17 +809,17 @@ static void __i915_gem_set_wedged(struct drm_i915_private *i915)
 	 * rolling the global seqno forward (since this would complete requests
 	 * for which we haven't set the fence error to EIO yet).
 	 */
-	awake = reset_prepare(i915);
+	awake = reset_prepare(gt);
 
 	/* Even if the GPU reset fails, it should still stop the engines */
-	if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
-		intel_gpu_reset(i915, ALL_ENGINES);
+	if (!INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		intel_gpu_reset(gt, ALL_ENGINES);
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		engine->submit_request = nop_submit_request;
 		engine->schedule = NULL;
 	}
-	i915->caps.scheduler = 0;
+	gt->i915->caps.scheduler = 0;
 
 	/*
 	 * Make sure no request can slip through without getting completed by
@@ -833,38 +827,36 @@ static void __i915_gem_set_wedged(struct drm_i915_private *i915)
 	 * in nop_submit_request.
 	 */
 	synchronize_rcu_expedited();
-	set_bit(I915_WEDGED, &error->flags);
+	set_bit(I915_WEDGED, &gt->reset.flags);
 
 	/* Mark all executing requests as skipped */
-	for_each_engine(engine, i915, id)
+	for_each_engine(engine, gt->i915, id)
 		engine->cancel_requests(engine);
 
-	reset_finish(i915, awake);
+	reset_finish(gt, awake);
 
 	GEM_TRACE("end\n");
 }
 
-void i915_gem_set_wedged(struct drm_i915_private *i915)
+void intel_gt_set_wedged(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	intel_wakeref_t wakeref;
 
-	mutex_lock(&error->wedge_mutex);
-	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
-		__i915_gem_set_wedged(i915);
-	mutex_unlock(&error->wedge_mutex);
+	mutex_lock(&gt->reset.mutex);
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		__intel_gt_set_wedged(gt);
+	mutex_unlock(&gt->reset.mutex);
 }
 
-static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
+static bool __intel_gt_unset_wedged(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
-	struct intel_gt_timelines *timelines = &i915->gt.timelines;
+	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl;
 
-	if (!test_bit(I915_WEDGED, &error->flags))
+	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		return true;
 
-	if (!i915->gt.scratch) /* Never full initialised, recovery impossible */
+	if (!gt->scratch) /* Never full initialised, recovery impossible */
 		return false;
 
 	GEM_TRACE("start\n");
@@ -905,7 +897,7 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 	}
 	spin_unlock(&timelines->lock);
 
-	intel_gt_sanitize(&i915->gt, false);
+	intel_gt_sanitize(gt, false);
 
 	/*
 	 * Undo nop_submit_request. We prevent all new i915 requests from
@@ -916,53 +908,51 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 	 * the nop_submit_request on reset, we can do this from normal
 	 * context and do not require stop_machine().
 	 */
-	intel_engines_reset_default_submission(i915);
+	intel_engines_reset_default_submission(gt);
 
 	GEM_TRACE("end\n");
 
 	smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
-	clear_bit(I915_WEDGED, &i915->gpu_error.flags);
+	clear_bit(I915_WEDGED, &gt->reset.flags);
 
 	return true;
 }
 
-bool i915_gem_unset_wedged(struct drm_i915_private *i915)
+bool intel_gt_unset_wedged(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	bool result;
 
-	mutex_lock(&error->wedge_mutex);
-	result = __i915_gem_unset_wedged(i915);
-	mutex_unlock(&error->wedge_mutex);
+	mutex_lock(&gt->reset.mutex);
+	result = __intel_gt_unset_wedged(gt);
+	mutex_unlock(&gt->reset.mutex);
 
 	return result;
 }
 
-static int do_reset(struct drm_i915_private *i915,
-		    intel_engine_mask_t stalled_mask)
+static int do_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 {
 	int err, i;
 
-	gt_revoke(i915);
+	gt_revoke(gt);
 
-	err = intel_gpu_reset(i915, ALL_ENGINES);
+	err = intel_gpu_reset(gt, ALL_ENGINES);
 	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
 		msleep(10 * (i + 1));
-		err = intel_gpu_reset(i915, ALL_ENGINES);
+		err = intel_gpu_reset(gt, ALL_ENGINES);
 	}
 	if (err)
 		return err;
 
-	return gt_reset(i915, stalled_mask);
+	return gt_reset(gt, stalled_mask);
 }
 
-static int resume(struct drm_i915_private *i915)
+static int resume(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 	int ret;
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		ret = engine->resume(engine);
 		if (ret)
 			return ret;
@@ -972,8 +962,8 @@ static int resume(struct drm_i915_private *i915)
 }
 
 /**
- * i915_reset - reset chip after a hang
- * @i915: #drm_i915_private to reset
+ * intel_gt_reset - reset chip after a hang
+ * @gt: #intel_gt to reset
  * @stalled_mask: mask of the stalled engines with the guilty requests
  * @reason: user error message for why we are resetting
  *
@@ -988,50 +978,50 @@ static int resume(struct drm_i915_private *i915)
  *   - re-init interrupt state
  *   - re-init display
  */
-void i915_reset(struct drm_i915_private *i915,
-		intel_engine_mask_t stalled_mask,
-		const char *reason)
+void intel_gt_reset(struct intel_gt *gt,
+		    intel_engine_mask_t stalled_mask,
+		    const char *reason)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	intel_engine_mask_t awake;
 	int ret;
 
-	GEM_TRACE("flags=%lx\n", error->flags);
+	GEM_TRACE("flags=%lx\n", gt->reset.flags);
 
 	might_sleep();
-	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
-	mutex_lock(&error->wedge_mutex);
+	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
+	mutex_lock(&gt->reset.mutex);
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
-	if (!__i915_gem_unset_wedged(i915))
+	if (!__intel_gt_unset_wedged(gt))
 		goto unlock;
 
 	if (reason)
-		dev_notice(i915->drm.dev, "Resetting chip for %s\n", reason);
-	error->reset_count++;
+		dev_notice(gt->i915->drm.dev,
+			   "Resetting chip for %s\n", reason);
+	atomic_inc(&gt->i915->gpu_error.reset_count);
 
-	awake = reset_prepare(i915);
+	awake = reset_prepare(gt);
 
-	if (!intel_has_gpu_reset(i915)) {
+	if (!intel_has_gpu_reset(gt->i915)) {
 		if (i915_modparams.reset)
-			dev_err(i915->drm.dev, "GPU reset not supported\n");
+			dev_err(gt->i915->drm.dev, "GPU reset not supported\n");
 		else
 			DRM_DEBUG_DRIVER("GPU reset disabled\n");
 		goto error;
 	}
 
-	if (INTEL_INFO(i915)->gpu_reset_clobbers_display)
-		intel_runtime_pm_disable_interrupts(i915);
+	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		intel_runtime_pm_disable_interrupts(gt->i915);
 
-	if (do_reset(i915, stalled_mask)) {
-		dev_err(i915->drm.dev, "Failed to reset chip\n");
+	if (do_reset(gt, stalled_mask)) {
+		dev_err(gt->i915->drm.dev, "Failed to reset chip\n");
 		goto taint;
 	}
 
-	if (INTEL_INFO(i915)->gpu_reset_clobbers_display)
-		intel_runtime_pm_enable_interrupts(i915);
+	if (INTEL_INFO(gt->i915)->gpu_reset_clobbers_display)
+		intel_runtime_pm_enable_interrupts(gt->i915);
 
-	intel_overlay_reset(i915);
+	intel_overlay_reset(gt->i915);
 
 	/*
 	 * Next we need to restore the context, but we don't use those
@@ -1041,23 +1031,23 @@ void i915_reset(struct drm_i915_private *i915,
 	 * was running at the time of the reset (i.e. we weren't VT
 	 * switched away).
 	 */
-	ret = i915_gem_init_hw(i915);
+	ret = i915_gem_init_hw(gt->i915);
 	if (ret) {
 		DRM_ERROR("Failed to initialise HW following reset (%d)\n",
 			  ret);
 		goto taint;
 	}
 
-	ret = resume(i915);
+	ret = resume(gt);
 	if (ret)
 		goto taint;
 
-	i915_queue_hangcheck(i915);
+	intel_gt_queue_hangcheck(gt);
 
 finish:
-	reset_finish(i915, awake);
+	reset_finish(gt, awake);
 unlock:
-	mutex_unlock(&error->wedge_mutex);
+	mutex_unlock(&gt->reset.mutex);
 	return;
 
 taint:
@@ -1075,18 +1065,17 @@ void i915_reset(struct drm_i915_private *i915,
 	 */
 	add_taint_for_CI(TAINT_WARN);
 error:
-	__i915_gem_set_wedged(i915);
+	__intel_gt_set_wedged(gt);
 	goto finish;
 }
 
-static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
-					struct intel_engine_cs *engine)
+static inline int intel_gt_reset_engine(struct intel_engine_cs *engine)
 {
-	return intel_gpu_reset(i915, engine->mask);
+	return intel_gpu_reset(engine->gt, engine->mask);
 }
 
 /**
- * i915_reset_engine - reset GPU engine to recover from a hang
+ * intel_engine_reset - reset GPU engine to recover from a hang
  * @engine: engine to reset
  * @msg: reason for GPU reset; or NULL for no dev_notice()
  *
@@ -1098,13 +1087,13 @@ static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
  *  - reset engine (which will force the engine to idle)
  *  - re-init/configure engine
  */
-int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
+int intel_engine_reset(struct intel_engine_cs *engine, const char *msg)
 {
-	struct i915_gpu_error *error = &engine->i915->gpu_error;
+	struct intel_gt *gt = engine->gt;
 	int ret;
 
-	GEM_TRACE("%s flags=%lx\n", engine->name, error->flags);
-	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
+	GEM_TRACE("%s flags=%lx\n", engine->name, gt->reset.flags);
+	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
@@ -1114,10 +1103,10 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	if (msg)
 		dev_notice(engine->i915->drm.dev,
 			   "Resetting %s for %s\n", engine->name, msg);
-	error->reset_engine_count[engine->id]++;
+	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
 	if (!engine->i915->guc.execbuf_client)
-		ret = intel_gt_reset_engine(engine->i915, engine);
+		ret = intel_gt_reset_engine(engine);
 	else
 		ret = intel_guc_reset_engine(&engine->i915->guc, engine);
 	if (ret) {
@@ -1133,7 +1122,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	 * active request and can drop it, adjust head to skip the offending
 	 * request to resume executing remaining requests in the queue.
 	 */
-	intel_engine_reset(engine, true);
+	__intel_engine_reset(engine, true);
 
 	/*
 	 * The engine and its registers (and workarounds in case of render)
@@ -1149,16 +1138,15 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg)
 	return ret;
 }
 
-static void i915_reset_device(struct drm_i915_private *i915,
-			      u32 engine_mask,
-			      const char *reason)
+static void intel_gt_reset_global(struct intel_gt *gt,
+				  u32 engine_mask,
+				  const char *reason)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
-	struct kobject *kobj = &i915->drm.primary->kdev->kobj;
+	struct kobject *kobj = &gt->i915->drm.primary->kdev->kobj;
 	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
 	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
 	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
-	struct i915_wedge_me w;
+	struct intel_wedge_me w;
 
 	kobject_uevent_env(kobj, KOBJ_CHANGE, error_event);
 
@@ -1166,24 +1154,24 @@ static void i915_reset_device(struct drm_i915_private *i915,
 	kobject_uevent_env(kobj, KOBJ_CHANGE, reset_event);
 
 	/* Use a watchdog to ensure that our reset completes */
-	i915_wedge_on_timeout(&w, i915, 5 * HZ) {
-		intel_prepare_reset(i915);
+	intel_wedge_on_timeout(&w, gt, 5 * HZ) {
+		intel_prepare_reset(gt->i915);
 
 		/* Flush everyone using a resource about to be clobbered */
-		synchronize_srcu_expedited(&error->reset_backoff_srcu);
+		synchronize_srcu_expedited(&gt->reset.backoff_srcu);
 
-		i915_reset(i915, engine_mask, reason);
+		intel_gt_reset(gt, engine_mask, reason);
 
-		intel_finish_reset(i915);
+		intel_finish_reset(gt->i915);
 	}
 
-	if (!test_bit(I915_WEDGED, &error->flags))
+	if (!test_bit(I915_WEDGED, &gt->reset.flags))
 		kobject_uevent_env(kobj, KOBJ_CHANGE, reset_done_event);
 }
 
 /**
- * i915_handle_error - handle a gpu error
- * @i915: i915 device private
+ * intel_gt_handle_error - handle a gpu error
+ * @gt: the intel_gt
  * @engine_mask: mask representing engines that are hung
  * @flags: control flags
  * @fmt: Error message format string
@@ -1194,12 +1182,11 @@ static void i915_reset_device(struct drm_i915_private *i915,
  * so userspace knows something bad happened (should trigger collection
  * of a ring dump etc.).
  */
-void i915_handle_error(struct drm_i915_private *i915,
-		       intel_engine_mask_t engine_mask,
-		       unsigned long flags,
-		       const char *fmt, ...)
+void intel_gt_handle_error(struct intel_gt *gt,
+			   intel_engine_mask_t engine_mask,
+			   unsigned long flags,
+			   const char *fmt, ...)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
 	intel_engine_mask_t tmp;
@@ -1223,33 +1210,31 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * isn't the case at least when we get here by doing a
 	 * simulated reset via debugfs, so get an RPM reference.
 	 */
-	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
+	wakeref = intel_runtime_pm_get(&gt->i915->runtime_pm);
 
-	engine_mask &= INTEL_INFO(i915)->engine_mask;
+	engine_mask &= INTEL_INFO(gt->i915)->engine_mask;
 
 	if (flags & I915_ERROR_CAPTURE) {
-		i915_capture_error_state(i915, engine_mask, msg);
-		intel_gt_clear_error_registers(&i915->gt, engine_mask);
+		i915_capture_error_state(gt->i915, engine_mask, msg);
+		intel_gt_clear_error_registers(gt, engine_mask);
 	}
 
 	/*
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) && !__i915_wedged(error)) {
-		for_each_engine_masked(engine, i915, engine_mask, tmp) {
+	if (intel_has_reset_engine(gt->i915) && !intel_gt_is_wedged(gt)) {
+		for_each_engine_masked(engine, gt->i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-					     &error->flags))
+					     &gt->reset.flags))
 				continue;
 
-			if (i915_reset_engine(engine, msg) == 0)
+			if (intel_engine_reset(engine, msg) == 0)
 				engine_mask &= ~engine->mask;
 
-			clear_bit(I915_RESET_ENGINE + engine->id,
-				  &error->flags);
-			wake_up_bit(&error->flags,
-				    I915_RESET_ENGINE + engine->id);
+			clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
+					      &gt->reset.flags);
 		}
 	}
 
@@ -1257,9 +1242,9 @@ void i915_handle_error(struct drm_i915_private *i915,
 		goto out;
 
 	/* Full reset needs the mutex, stop any other user trying to do so. */
-	if (test_and_set_bit(I915_RESET_BACKOFF, &error->flags)) {
-		wait_event(error->reset_queue,
-			   !test_bit(I915_RESET_BACKOFF, &error->flags));
+	if (test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags)) {
+		wait_event(gt->reset.queue,
+			   !test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
 		goto out; /* piggy-back on the other reset */
 	}
 
@@ -1267,113 +1252,119 @@ void i915_handle_error(struct drm_i915_private *i915,
 	synchronize_rcu_expedited();
 
 	/* Prevent any other reset-engine attempt. */
-	for_each_engine(engine, i915, tmp) {
+	for_each_engine(engine, gt->i915, tmp) {
 		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-					&error->flags))
-			wait_on_bit(&error->flags,
+					&gt->reset.flags))
+			wait_on_bit(&gt->reset.flags,
 				    I915_RESET_ENGINE + engine->id,
 				    TASK_UNINTERRUPTIBLE);
 	}
 
-	i915_reset_device(i915, engine_mask, msg);
+	intel_gt_reset_global(gt, engine_mask, msg);
 
-	for_each_engine(engine, i915, tmp) {
-		clear_bit(I915_RESET_ENGINE + engine->id,
-			  &error->flags);
-	}
-
-	clear_bit(I915_RESET_BACKOFF, &error->flags);
-	wake_up_all(&error->reset_queue);
+	for_each_engine(engine, gt->i915, tmp)
+		clear_bit_unlock(I915_RESET_ENGINE + engine->id,
+				 &gt->reset.flags);
+	clear_bit_unlock(I915_RESET_BACKOFF, &gt->reset.flags);
+	smp_mb__after_atomic();
+	wake_up_all(&gt->reset.queue);
 
 out:
-	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
+	intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
 }
 
-int i915_reset_trylock(struct drm_i915_private *i915)
+int intel_gt_reset_trylock(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
 	int srcu;
 
-	might_lock(&error->reset_backoff_srcu);
+	might_lock(&gt->reset.backoff_srcu);
 	might_sleep();
 
 	rcu_read_lock();
-	while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
+	while (test_bit(I915_RESET_BACKOFF, &gt->reset.flags)) {
 		rcu_read_unlock();
 
-		if (wait_event_interruptible(error->reset_queue,
+		if (wait_event_interruptible(gt->reset.queue,
 					     !test_bit(I915_RESET_BACKOFF,
-						       &error->flags)))
+						       &gt->reset.flags)))
 			return -EINTR;
 
 		rcu_read_lock();
 	}
-	srcu = srcu_read_lock(&error->reset_backoff_srcu);
+	srcu = srcu_read_lock(&gt->reset.backoff_srcu);
 	rcu_read_unlock();
 
 	return srcu;
 }
 
-void i915_reset_unlock(struct drm_i915_private *i915, int tag)
-__releases(&i915->gpu_error.reset_backoff_srcu)
+void intel_gt_reset_unlock(struct intel_gt *gt, int tag)
+__releases(&gt->reset.backoff_srcu)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
-
-	srcu_read_unlock(&error->reset_backoff_srcu, tag);
+	srcu_read_unlock(&gt->reset.backoff_srcu, tag);
 }
 
-int i915_terminally_wedged(struct drm_i915_private *i915)
+int intel_gt_terminally_wedged(struct intel_gt *gt)
 {
-	struct i915_gpu_error *error = &i915->gpu_error;
-
 	might_sleep();
 
-	if (!__i915_wedged(error))
+	if (!intel_gt_is_wedged(gt))
 		return 0;
 
 	/* Reset still in progress? Maybe we will recover? */
-	if (!test_bit(I915_RESET_BACKOFF, &error->flags))
+	if (!test_bit(I915_RESET_BACKOFF, &gt->reset.flags))
 		return -EIO;
 
 	/* XXX intel_reset_finish() still takes struct_mutex!!! */
-	if (mutex_is_locked(&i915->drm.struct_mutex))
+	if (mutex_is_locked(&gt->i915->drm.struct_mutex))
 		return -EAGAIN;
 
-	if (wait_event_interruptible(error->reset_queue,
+	if (wait_event_interruptible(gt->reset.queue,
 				     !test_bit(I915_RESET_BACKOFF,
-					       &error->flags)))
+					       &gt->reset.flags)))
 		return -EINTR;
 
-	return __i915_wedged(error) ? -EIO : 0;
+	return intel_gt_is_wedged(gt) ? -EIO : 0;
+}
+
+void intel_gt_init_reset(struct intel_gt *gt)
+{
+	init_waitqueue_head(&gt->reset.queue);
+	mutex_init(&gt->reset.mutex);
+	init_srcu_struct(&gt->reset.backoff_srcu);
+}
+
+void intel_gt_fini_reset(struct intel_gt *gt)
+{
+	cleanup_srcu_struct(&gt->reset.backoff_srcu);
 }
 
-static void i915_wedge_me(struct work_struct *work)
+static void intel_wedge_me(struct work_struct *work)
 {
-	struct i915_wedge_me *w = container_of(work, typeof(*w), work.work);
+	struct intel_wedge_me *w = container_of(work, typeof(*w), work.work);
 
-	dev_err(w->i915->drm.dev,
+	dev_err(w->gt->i915->drm.dev,
 		"%s timed out, cancelling all in-flight rendering.\n",
 		w->name);
-	i915_gem_set_wedged(w->i915);
+	intel_gt_set_wedged(w->gt);
 }
 
-void __i915_init_wedge(struct i915_wedge_me *w,
-		       struct drm_i915_private *i915,
-		       long timeout,
-		       const char *name)
+void __intel_init_wedge(struct intel_wedge_me *w,
+			struct intel_gt *gt,
+			long timeout,
+			const char *name)
 {
-	w->i915 = i915;
+	w->gt = gt;
 	w->name = name;
 
-	INIT_DELAYED_WORK_ONSTACK(&w->work, i915_wedge_me);
+	INIT_DELAYED_WORK_ONSTACK(&w->work, intel_wedge_me);
 	schedule_delayed_work(&w->work, timeout);
 }
 
-void __i915_fini_wedge(struct i915_wedge_me *w)
+void __intel_fini_wedge(struct intel_wedge_me *w)
 {
 	cancel_delayed_work_sync(&w->work);
 	destroy_delayed_work_on_stack(&w->work);
-	w->i915 = NULL;
+	w->gt = NULL;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
index 03fba0ab3868..62f6cb520f96 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset.h
@@ -11,56 +11,67 @@
 #include <linux/types.h>
 #include <linux/srcu.h>
 
-#include "gt/intel_engine_types.h"
+#include "intel_engine_types.h"
+#include "intel_reset_types.h"
 
 struct drm_i915_private;
 struct i915_request;
 struct intel_engine_cs;
+struct intel_gt;
 struct intel_guc;
 
+void intel_gt_init_reset(struct intel_gt *gt);
+void intel_gt_fini_reset(struct intel_gt *gt);
+
 __printf(4, 5)
-void i915_handle_error(struct drm_i915_private *i915,
-		       intel_engine_mask_t engine_mask,
-		       unsigned long flags,
-		       const char *fmt, ...);
+void intel_gt_handle_error(struct intel_gt *gt,
+			   intel_engine_mask_t engine_mask,
+			   unsigned long flags,
+			   const char *fmt, ...);
 #define I915_ERROR_CAPTURE BIT(0)
 
-void i915_reset(struct drm_i915_private *i915,
-		intel_engine_mask_t stalled_mask,
-		const char *reason);
-int i915_reset_engine(struct intel_engine_cs *engine,
-		      const char *reason);
-
-void i915_reset_request(struct i915_request *rq, bool guilty);
+void intel_gt_reset(struct intel_gt *gt,
+		    intel_engine_mask_t stalled_mask,
+		    const char *reason);
+int intel_engine_reset(struct intel_engine_cs *engine,
+		       const char *reason);
 
-int __must_check i915_reset_trylock(struct drm_i915_private *i915);
-void i915_reset_unlock(struct drm_i915_private *i915, int tag);
+void __i915_request_reset(struct i915_request *rq, bool guilty);
 
-int i915_terminally_wedged(struct drm_i915_private *i915);
+int __must_check intel_gt_reset_trylock(struct intel_gt *gt);
+void intel_gt_reset_unlock(struct intel_gt *gt, int tag);
 
-bool intel_has_gpu_reset(struct drm_i915_private *i915);
-bool intel_has_reset_engine(struct drm_i915_private *i915);
+void intel_gt_set_wedged(struct intel_gt *gt);
+bool intel_gt_unset_wedged(struct intel_gt *gt);
+int intel_gt_terminally_wedged(struct intel_gt *gt);
 
-int intel_gpu_reset(struct drm_i915_private *i915,
-		    intel_engine_mask_t engine_mask);
+int intel_gpu_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask);
 
-int intel_reset_guc(struct drm_i915_private *i915);
+int intel_reset_guc(struct intel_gt *gt);
 
-struct i915_wedge_me {
+struct intel_wedge_me {
 	struct delayed_work work;
-	struct drm_i915_private *i915;
+	struct intel_gt *gt;
 	const char *name;
 };
 
-void __i915_init_wedge(struct i915_wedge_me *w,
-		       struct drm_i915_private *i915,
-		       long timeout,
-		       const char *name);
-void __i915_fini_wedge(struct i915_wedge_me *w);
+void __intel_init_wedge(struct intel_wedge_me *w,
+			struct intel_gt *gt,
+			long timeout,
+			const char *name);
+void __intel_fini_wedge(struct intel_wedge_me *w);
 
-#define i915_wedge_on_timeout(W, DEV, TIMEOUT)				\
-	for (__i915_init_wedge((W), (DEV), (TIMEOUT), __func__);	\
-	     (W)->i915;							\
-	     __i915_fini_wedge((W)))
+#define intel_wedge_on_timeout(W, GT, TIMEOUT)				\
+	for (__intel_init_wedge((W), (GT), (TIMEOUT), __func__);	\
+	     (W)->gt;							\
+	     __intel_fini_wedge((W)))
+
+static inline bool __intel_reset_failed(const struct intel_reset *reset)
+{
+	return unlikely(test_bit(I915_WEDGED, &reset->flags));
+}
+
+bool intel_has_gpu_reset(struct drm_i915_private *i915);
+bool intel_has_reset_engine(struct drm_i915_private *i915);
 
 #endif /* I915_RESET_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_reset_types.h b/drivers/gpu/drm/i915/gt/intel_reset_types.h
new file mode 100644
index 000000000000..31968356e0c0
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_reset_types.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_RESET_TYPES_H_
+#define __INTEL_RESET_TYPES_H_
+
+#include <linux/mutex.h>
+#include <linux/wait.h>
+#include <linux/srcu.h>
+
+struct intel_reset {
+	/**
+	 * flags: Control various stages of the GPU reset
+	 *
+	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
+	 * serialise with any other users attempting to do the same, and
+	 * any global resources that may be clobber by the reset (such as
+	 * FENCE registers).
+	 *
+	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
+	 * acquire the struct_mutex to reset an engine, we need an explicit
+	 * flag to prevent two concurrent reset attempts in the same engine.
+	 * As the number of engines continues to grow, allocate the flags from
+	 * the most significant bits.
+	 *
+	 * #I915_WEDGED - If reset fails and we can no longer use the GPU,
+	 * we set the #I915_WEDGED bit. Prior to command submission, e.g.
+	 * i915_request_alloc(), this bit is checked and the sequence
+	 * aborted (with -EIO reported to userspace) if set.
+	 */
+	unsigned long flags;
+#define I915_RESET_BACKOFF	0
+#define I915_RESET_MODESET	1
+#define I915_RESET_ENGINE	2
+#define I915_WEDGED		(BITS_PER_LONG - 1)
+
+	struct mutex mutex; /* serialises wedging/unwedging */
+
+	/**
+	 * Waitqueue to signal when the reset has completed. Used by clients
+	 * that wait for dev_priv->mm.wedged to settle.
+	 */
+	wait_queue_head_t queue;
+
+	struct srcu_struct backoff_srcu;
+};
+
+#endif /* _INTEL_RESET_TYPES_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
index c46ce676d28f..7cf42b477227 100644
--- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c
@@ -788,7 +788,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 		 * If the request was innocent, we try to replay the request
 		 * with the restored context.
 		 */
-		i915_reset_request(rq, stalled);
+		__i915_request_reset(rq, stalled);
 
 		GEM_BUG_ON(rq->ring != engine->buffer);
 		head = rq->head;
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 2d9cc3cd1f27..8caad19683a1 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -32,7 +32,6 @@
 #include "selftests/i915_random.h"
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
-#include "selftests/igt_wedge_me.h"
 #include "selftests/igt_atomic.h"
 
 #include "selftests/mock_drm.h"
@@ -316,7 +315,7 @@ static int igt_hang_sanitycheck(void *arg)
 		goto unlock;
 
 	for_each_engine(engine, i915, id) {
-		struct igt_wedge_me w;
+		struct intel_wedge_me w;
 		long timeout;
 
 		if (!intel_engine_can_store_dword(engine))
@@ -338,10 +337,10 @@ static int igt_hang_sanitycheck(void *arg)
 		i915_request_add(rq);
 
 		timeout = 0;
-		igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)
+		intel_wedge_on_timeout(&w, &i915->gt, HZ / 10 /* 100ms */)
 			timeout = i915_request_wait(rq, 0,
 						    MAX_SCHEDULE_TIMEOUT);
-		if (i915_reset_failed(i915))
+		if (intel_gt_is_wedged(&i915->gt))
 			timeout = -EIO;
 
 		i915_request_put(rq);
@@ -413,12 +412,12 @@ static int igt_reset_nop(void *arg)
 			}
 		}
 
-		igt_global_reset_lock(i915);
-		i915_reset(i915, ALL_ENGINES, NULL);
-		igt_global_reset_unlock(i915);
+		igt_global_reset_lock(&i915->gt);
+		intel_gt_reset(&i915->gt, ALL_ENGINES, NULL);
+		igt_global_reset_unlock(&i915->gt);
 
 		mutex_unlock(&i915->drm.struct_mutex);
-		if (i915_reset_failed(i915)) {
+		if (intel_gt_is_wedged(&i915->gt)) {
 			err = -EIO;
 			break;
 		}
@@ -442,7 +441,7 @@ static int igt_reset_nop(void *arg)
 
 out:
 	mock_file_free(i915, file);
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		err = -EIO;
 	return err;
 }
@@ -484,7 +483,7 @@ static int igt_reset_nop_engine(void *arg)
 							     engine);
 		count = 0;
 
-		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		set_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		do {
 			int i;
 
@@ -507,7 +506,7 @@ static int igt_reset_nop_engine(void *arg)
 
 				i915_request_add(rq);
 			}
-			err = i915_reset_engine(engine, NULL);
+			err = intel_engine_reset(engine, NULL);
 			mutex_unlock(&i915->drm.struct_mutex);
 			if (err) {
 				pr_err("i915_reset_engine failed\n");
@@ -528,7 +527,7 @@ static int igt_reset_nop_engine(void *arg)
 				break;
 			}
 		} while (time_before(jiffies, end_time));
-		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		clear_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		pr_info("%s(%s): %d resets\n", __func__, engine->name, count);
 
 		if (err)
@@ -545,7 +544,7 @@ static int igt_reset_nop_engine(void *arg)
 
 out:
 	mock_file_free(i915, file);
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		err = -EIO;
 	return err;
 }
@@ -589,7 +588,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 							     engine);
 
 		intel_engine_pm_get(engine);
-		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		set_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		do {
 			if (active) {
 				struct i915_request *rq;
@@ -622,7 +621,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 				i915_request_put(rq);
 			}
 
-			err = i915_reset_engine(engine, NULL);
+			err = intel_engine_reset(engine, NULL);
 			if (err) {
 				pr_err("i915_reset_engine failed\n");
 				break;
@@ -642,7 +641,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 				break;
 			}
 		} while (time_before(jiffies, end_time));
-		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		clear_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		intel_engine_pm_put(engine);
 
 		if (err)
@@ -653,7 +652,7 @@ static int __igt_reset_engine(struct drm_i915_private *i915, bool active)
 			break;
 	}
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		err = -EIO;
 
 	if (active) {
@@ -701,7 +700,7 @@ static int active_request_put(struct i915_request *rq)
 			  rq->fence.seqno);
 		GEM_TRACE_DUMP();
 
-		i915_gem_set_wedged(rq->i915);
+		intel_gt_set_wedged(rq->engine->gt);
 		err = -EIO;
 	}
 
@@ -851,7 +850,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 		}
 
 		intel_engine_pm_get(engine);
-		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		set_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		do {
 			struct i915_request *rq = NULL;
 
@@ -882,7 +881,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 				}
 			}
 
-			err = i915_reset_engine(engine, NULL);
+			err = intel_engine_reset(engine, NULL);
 			if (err) {
 				pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
 				       engine->name, test_name, err);
@@ -904,7 +903,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 					i915_request_put(rq);
 
 					GEM_TRACE_DUMP();
-					i915_gem_set_wedged(i915);
+					intel_gt_set_wedged(&i915->gt);
 					err = -EIO;
 					break;
 				}
@@ -926,7 +925,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 				break;
 			}
 		} while (time_before(jiffies, end_time));
-		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		clear_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 		intel_engine_pm_put(engine);
 		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
 			engine->name, test_name, count);
@@ -956,7 +955,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			}
 			put_task_struct(threads[tmp].task);
 
-			if (other != engine &&
+			if (other->uabi_class != engine->uabi_class &&
 			    threads[tmp].resets !=
 			    i915_reset_engine_count(&i915->gpu_error, other)) {
 				pr_err("Innocent engine %s was reset (count=%ld)\n",
@@ -986,7 +985,7 @@ static int __igt_reset_engines(struct drm_i915_private *i915,
 			break;
 	}
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		err = -EIO;
 
 	if (flags & TEST_ACTIVE) {
@@ -1041,7 +1040,7 @@ static u32 fake_hangcheck(struct drm_i915_private *i915,
 {
 	u32 count = i915_reset_count(&i915->gpu_error);
 
-	i915_reset(i915, mask, NULL);
+	intel_gt_reset(&i915->gt, mask, NULL);
 
 	return count;
 }
@@ -1060,7 +1059,7 @@ static int igt_reset_wait(void *arg)
 
 	/* Check that we detect a stuck waiter and issue a reset */
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	err = hang_init(&h, i915);
@@ -1083,7 +1082,7 @@ static int igt_reset_wait(void *arg)
 		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 
 		err = -EIO;
 		goto out_rq;
@@ -1111,9 +1110,9 @@ static int igt_reset_wait(void *arg)
 	hang_fini(&h);
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return -EIO;
 
 	return err;
@@ -1261,7 +1260,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 		goto out_reset;
 	}
 
@@ -1283,20 +1282,20 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 		pr_err("igt/evict_vma kthread did not wait\n");
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 		goto out_reset;
 	}
 
 out_reset:
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 	fake_hangcheck(rq->i915, rq->engine->mask);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
 	if (tsk) {
-		struct igt_wedge_me w;
+		struct intel_wedge_me w;
 
 		/* The reset, even indirectly, should take less than 10ms. */
-		igt_wedge_on_timeout(&w, i915, HZ / 10 /* 100ms timeout*/)
+		intel_wedge_on_timeout(&w, &i915->gt, HZ / 10 /* 100ms */)
 			err = kthread_stop(tsk);
 
 		put_task_struct(tsk);
@@ -1312,7 +1311,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return -EIO;
 
 	return err;
@@ -1390,7 +1389,7 @@ static int igt_reset_queue(void *arg)
 
 	/* Check that we replay pending requests following a hang */
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	err = hang_init(&h, i915);
@@ -1446,7 +1445,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(prev);
 
 				GEM_TRACE_DUMP();
-				i915_gem_set_wedged(i915);
+				intel_gt_set_wedged(&i915->gt);
 				goto fini;
 			}
 
@@ -1462,7 +1461,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 
-				i915_gem_set_wedged(i915);
+				intel_gt_set_wedged(&i915->gt);
 
 				err = -EIO;
 				goto fini;
@@ -1516,9 +1515,9 @@ static int igt_reset_queue(void *arg)
 	hang_fini(&h);
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return -EIO;
 
 	return err;
@@ -1563,7 +1562,7 @@ static int igt_handle_error(void *arg)
 		       __func__, rq->fence.seqno, hws_seqno(&h, rq));
 		intel_engine_dump(rq->engine, &p, "%s\n", rq->engine->name);
 
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 
 		err = -EIO;
 		goto err_request;
@@ -1574,7 +1573,7 @@ static int igt_handle_error(void *arg)
 	/* Temporarily disable error capture */
 	error = xchg(&i915->gpu_error.first_error, (void *)-1);
 
-	i915_handle_error(i915, engine->mask, 0, NULL);
+	intel_gt_handle_error(&i915->gt, engine->mask, 0, NULL);
 
 	xchg(&i915->gpu_error.first_error, error);
 
@@ -1608,7 +1607,7 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
 	tasklet_disable_nosync(t);
 	p->critical_section_begin();
 
-	err = i915_reset_engine(engine, NULL);
+	err = intel_engine_reset(engine, NULL);
 
 	p->critical_section_end();
 	tasklet_enable(t);
@@ -1651,16 +1650,16 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 		pr_err("%s(%s): Failed to start request %llx, at %x\n",
 		       __func__, engine->name,
 		       rq->fence.seqno, hws_seqno(&h, rq));
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(&i915->gt);
 		err = -EIO;
 	}
 
 	if (err == 0) {
-		struct igt_wedge_me w;
+		struct intel_wedge_me w;
 
-		igt_wedge_on_timeout(&w, i915, HZ / 20 /* 50ms timeout*/)
+		intel_wedge_on_timeout(&w, &i915->gt, HZ / 20 /* 50ms timeout*/)
 			i915_request_wait(rq, 0, MAX_SCHEDULE_TIMEOUT);
-		if (i915_reset_failed(i915))
+		if (intel_gt_is_wedged(&i915->gt))
 			err = -EIO;
 	}
 
@@ -1684,11 +1683,11 @@ static int igt_reset_engines_atomic(void *arg)
 	if (USES_GUC_SUBMISSION(i915))
 		return 0;
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 	mutex_lock(&i915->drm.struct_mutex);
 
 	/* Flush any requests before we get started and check basics */
-	if (!igt_force_reset(i915))
+	if (!igt_force_reset(&i915->gt))
 		goto unlock;
 
 	for (p = igt_atomic_phases; p->name; p++) {
@@ -1704,11 +1703,11 @@ static int igt_reset_engines_atomic(void *arg)
 
 out:
 	/* As we poke around the guts, do a full reset before continuing. */
-	igt_force_reset(i915);
+	igt_force_reset(&i915->gt);
 
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
 	return err;
 }
@@ -1737,12 +1736,12 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
-	drain_delayed_work(&i915->gpu_error.hangcheck_work); /* flush param */
+	drain_delayed_work(&i915->gt.hangcheck.work); /* flush param */
 
 	err = i915_live_subtests(tests, i915);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 11f490502ca6..720aa13dd2f0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -55,7 +55,7 @@ static int live_sanitycheck(void *arg)
 		if (!igt_wait_for_spinner(&spin, rq)) {
 			GEM_TRACE("spinner failed to start\n");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx;
 		}
@@ -211,7 +211,7 @@ slice_semaphore_queue(struct intel_engine_cs *outer,
 		pr_err("Failed to slice along semaphore chain of length (%d, %d)!\n",
 		       count, n);
 		GEM_TRACE_DUMP();
-		i915_gem_set_wedged(outer->i915);
+		intel_gt_set_wedged(outer->gt);
 		err = -EIO;
 	}
 
@@ -445,7 +445,7 @@ static int live_busywait_preempt(void *arg)
 			intel_engine_dump(engine, &p, "%s\n", engine->name);
 			GEM_TRACE_DUMP();
 
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_vma;
 		}
@@ -534,7 +534,7 @@ static int live_preempt(void *arg)
 		if (!igt_wait_for_spinner(&spin_lo, rq)) {
 			GEM_TRACE("lo spinner failed to start\n");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx_lo;
 		}
@@ -551,7 +551,7 @@ static int live_preempt(void *arg)
 		if (!igt_wait_for_spinner(&spin_hi, rq)) {
 			GEM_TRACE("hi spinner failed to start\n");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx_lo;
 		}
@@ -688,7 +688,7 @@ static int live_late_preempt(void *arg)
 err_wedged:
 	igt_spinner_end(&spin_hi);
 	igt_spinner_end(&spin_lo);
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 	err = -EIO;
 	goto err_ctx_lo;
 }
@@ -826,7 +826,7 @@ static int live_suppress_self_preempt(void *arg)
 err_wedged:
 	igt_spinner_end(&b.spin);
 	igt_spinner_end(&a.spin);
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 	err = -EIO;
 	goto err_client_b;
 }
@@ -993,7 +993,7 @@ static int live_suppress_wait_preempt(void *arg)
 err_wedged:
 	for (i = 0; i < ARRAY_SIZE(client); i++)
 		igt_spinner_end(&client[i].spin);
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 	err = -EIO;
 	goto err_client_3;
 }
@@ -1139,7 +1139,7 @@ static int live_chain_preempt(void *arg)
 err_wedged:
 	igt_spinner_end(&hi.spin);
 	igt_spinner_end(&lo.spin);
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 	err = -EIO;
 	goto err_client_lo;
 }
@@ -1198,7 +1198,7 @@ static int live_preempt_hang(void *arg)
 		if (!igt_wait_for_spinner(&spin_lo, rq)) {
 			GEM_TRACE("lo spinner failed to start\n");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx_lo;
 		}
@@ -1220,21 +1220,21 @@ static int live_preempt_hang(void *arg)
 						 HZ / 10)) {
 			pr_err("Preemption did not occur within timeout!");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx_lo;
 		}
 
-		set_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
-		i915_reset_engine(engine, NULL);
-		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+		set_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
+		intel_engine_reset(engine, NULL);
+		clear_bit(I915_RESET_ENGINE + id, &i915->gt.reset.flags);
 
 		engine->execlists.preempt_hang.inject_hang = false;
 
 		if (!igt_wait_for_spinner(&spin_hi, rq)) {
 			GEM_TRACE("hi spinner failed to start\n");
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto err_ctx_lo;
 		}
@@ -1614,7 +1614,7 @@ static int nop_virtual_engine(struct drm_i915_private *i915,
 					  request[nc]->fence.context,
 					  request[nc]->fence.seqno);
 				GEM_TRACE_DUMP();
-				i915_gem_set_wedged(i915);
+				intel_gt_set_wedged(&i915->gt);
 				break;
 			}
 		}
@@ -1761,7 +1761,7 @@ static int mask_virtual_engine(struct drm_i915_private *i915,
 				  request[n]->fence.context,
 				  request[n]->fence.seqno);
 			GEM_TRACE_DUMP();
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			err = -EIO;
 			goto out;
 		}
@@ -2037,7 +2037,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 	if (!HAS_EXECLISTS(i915))
 		return 0;
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_live_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index 672e32e1ef95..e91c1cc5479b 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -15,20 +15,20 @@ static int igt_global_reset(void *arg)
 
 	/* Check that we can issue a global GPU reset */
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	reset_count = i915_reset_count(&i915->gpu_error);
 
-	i915_reset(i915, ALL_ENGINES, NULL);
+	intel_gt_reset(&i915->gt, ALL_ENGINES, NULL);
 
 	if (i915_reset_count(&i915->gpu_error) == reset_count) {
 		pr_err("No GPU reset recorded!\n");
 		err = -EINVAL;
 	}
 
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
-	if (i915_reset_failed(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		err = -EIO;
 
 	return err;
@@ -41,18 +41,18 @@ static int igt_wedged_reset(void *arg)
 
 	/* Check that we can recover a wedged device with a GPU reset */
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 
-	GEM_BUG_ON(!i915_reset_failed(i915));
-	i915_reset(i915, ALL_ENGINES, NULL);
+	GEM_BUG_ON(!intel_gt_is_wedged(&i915->gt));
+	intel_gt_reset(&i915->gt, ALL_ENGINES, NULL);
 
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
-	return i915_reset_failed(i915) ? -EIO : 0;
+	return intel_gt_is_wedged(&i915->gt) ? -EIO : 0;
 }
 
 static int igt_atomic_reset(void *arg)
@@ -64,10 +64,10 @@ static int igt_atomic_reset(void *arg)
 	/* Check that the resets are usable from atomic context */
 
 	intel_gt_pm_get(&i915->gt);
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	/* Flush any requests before we get started and check basics */
-	if (!igt_force_reset(i915))
+	if (!igt_force_reset(&i915->gt))
 		goto unlock;
 
 	for (p = igt_atomic_phases; p->name; p++) {
@@ -75,13 +75,13 @@ static int igt_atomic_reset(void *arg)
 
 		GEM_TRACE("intel_gpu_reset under %s\n", p->name);
 
-		awake = reset_prepare(i915);
+		awake = reset_prepare(&i915->gt);
 		p->critical_section_begin();
 
-		err = intel_gpu_reset(i915, ALL_ENGINES);
+		err = intel_gpu_reset(&i915->gt, ALL_ENGINES);
 
 		p->critical_section_end();
-		reset_finish(i915, awake);
+		reset_finish(&i915->gt, awake);
 
 		if (err) {
 			pr_err("intel_gpu_reset failed under %s\n", p->name);
@@ -90,10 +90,10 @@ static int igt_atomic_reset(void *arg)
 	}
 
 	/* As we poke around the guts, do a full reset before continuing. */
-	igt_force_reset(i915);
+	igt_force_reset(&i915->gt);
 
 unlock:
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 	intel_gt_pm_put(&i915->gt);
 
 	return err;
@@ -116,10 +116,10 @@ static int igt_atomic_engine_reset(void *arg)
 		return 0;
 
 	intel_gt_pm_get(&i915->gt);
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	/* Flush any requests before we get started and check basics */
-	if (!igt_force_reset(i915))
+	if (!igt_force_reset(&i915->gt))
 		goto out_unlock;
 
 	for_each_engine(engine, i915, id) {
@@ -127,15 +127,15 @@ static int igt_atomic_engine_reset(void *arg)
 		intel_engine_pm_get(engine);
 
 		for (p = igt_atomic_phases; p->name; p++) {
-			GEM_TRACE("i915_reset_engine(%s) under %s\n",
+			GEM_TRACE("intel_engine_reset(%s) under %s\n",
 				  engine->name, p->name);
 
 			p->critical_section_begin();
-			err = i915_reset_engine(engine, NULL);
+			err = intel_engine_reset(engine, NULL);
 			p->critical_section_end();
 
 			if (err) {
-				pr_err("i915_reset_engine(%s) failed under %s\n",
+				pr_err("intel_engine_reset(%s) failed under %s\n",
 				       engine->name, p->name);
 				break;
 			}
@@ -148,10 +148,10 @@ static int igt_atomic_engine_reset(void *arg)
 	}
 
 	/* As we poke around the guts, do a full reset before continuing. */
-	igt_force_reset(i915);
+	igt_force_reset(&i915->gt);
 
 out_unlock:
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 	intel_gt_pm_put(&i915->gt);
 
 	return err;
@@ -171,7 +171,7 @@ int intel_reset_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return -EIO; /* we're long past hope of a successful reset */
 
 	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index 9f3100135590..d54113697745 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -7,6 +7,7 @@
 #include <linux/prime_numbers.h>
 
 #include "gem/i915_gem_pm.h"
+#include "intel_gt.h"
 
 #include "../selftests/i915_random.h"
 #include "../i915_selftest.h"
@@ -834,7 +835,7 @@ int intel_timeline_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_hwsp_wrap),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_live_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index b933d831eeb1..b2231094fb2d 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -12,7 +12,6 @@
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
-#include "selftests/igt_wedge_me.h"
 #include "selftests/mock_drm.h"
 
 #include "gem/selftests/igt_gem_utils.h"
@@ -185,7 +184,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 			   struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *results;
-	struct igt_wedge_me wedge;
+	struct intel_wedge_me wedge;
 	u32 *vaddr;
 	int err;
 	int i;
@@ -196,10 +195,10 @@ static int check_whitelist(struct i915_gem_context *ctx,
 
 	err = 0;
 	i915_gem_object_lock(results);
-	igt_wedge_on_timeout(&wedge, ctx->i915, HZ / 5) /* a safety net! */
+	intel_wedge_on_timeout(&wedge, &ctx->i915->gt, HZ / 5) /* safety net! */
 		err = i915_gem_object_set_to_cpu_domain(results, false);
 	i915_gem_object_unlock(results);
-	if (i915_terminally_wedged(ctx->i915))
+	if (intel_gt_is_wedged(&ctx->i915->gt))
 		err = -EIO;
 	if (err)
 		goto out_put;
@@ -232,13 +231,13 @@ static int check_whitelist(struct i915_gem_context *ctx,
 
 static int do_device_reset(struct intel_engine_cs *engine)
 {
-	i915_reset(engine->i915, engine->mask, "live_workarounds");
+	intel_gt_reset(engine->gt, engine->mask, "live_workarounds");
 	return 0;
 }
 
 static int do_engine_reset(struct intel_engine_cs *engine)
 {
-	return i915_reset_engine(engine, "live_workarounds");
+	return intel_engine_reset(engine, "live_workarounds");
 }
 
 static int
@@ -571,7 +570,7 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx,
 		if (i915_request_wait(rq, 0, HZ / 5) < 0) {
 			pr_err("%s: Futzing %x timedout; cancelling test\n",
 			       engine->name, reg);
-			i915_gem_set_wedged(ctx->i915);
+			intel_gt_set_wedged(&ctx->i915->gt);
 			err = -EIO;
 			goto out_batch;
 		}
@@ -708,7 +707,7 @@ static int live_reset_whitelist(void *arg)
 	if (!engine || engine->whitelist.count == 0)
 		return 0;
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 
 	if (intel_has_reset_engine(i915)) {
 		err = check_whitelist_across_reset(engine,
@@ -727,7 +726,7 @@ static int live_reset_whitelist(void *arg)
 	}
 
 out:
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 	return err;
 }
 
@@ -1095,7 +1094,7 @@ live_gpu_reset_workarounds(void *arg)
 
 	pr_info("Verifying after GPU reset...\n");
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 
 	reference_lists_init(i915, &lists);
@@ -1104,7 +1103,7 @@ live_gpu_reset_workarounds(void *arg)
 	if (!ok)
 		goto out;
 
-	i915_reset(i915, ALL_ENGINES, "live_workarounds");
+	intel_gt_reset(&i915->gt, ALL_ENGINES, "live_workarounds");
 
 	ok = verify_wa_lists(ctx, &lists, "after reset");
 
@@ -1112,7 +1111,7 @@ live_gpu_reset_workarounds(void *arg)
 	kernel_context_close(ctx);
 	reference_lists_fini(i915, &lists);
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 
 	return ok ? 0 : -ESRCH;
 }
@@ -1137,7 +1136,7 @@ live_engine_reset_workarounds(void *arg)
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
-	igt_global_reset_lock(i915);
+	igt_global_reset_lock(&i915->gt);
 	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 
 	reference_lists_init(i915, &lists);
@@ -1153,7 +1152,7 @@ live_engine_reset_workarounds(void *arg)
 			goto err;
 		}
 
-		i915_reset_engine(engine, "live_workarounds");
+		intel_engine_reset(engine, "live_workarounds");
 
 		ok = verify_wa_lists(ctx, &lists, "after idle reset");
 		if (!ok) {
@@ -1181,7 +1180,7 @@ live_engine_reset_workarounds(void *arg)
 			goto err;
 		}
 
-		i915_reset_engine(engine, "live_workarounds");
+		intel_engine_reset(engine, "live_workarounds");
 
 		igt_spinner_end(&spin);
 		igt_spinner_fini(&spin);
@@ -1196,7 +1195,7 @@ live_engine_reset_workarounds(void *arg)
 err:
 	reference_lists_fini(i915, &lists);
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
-	igt_global_reset_unlock(i915);
+	igt_global_reset_unlock(&i915->gt);
 	kernel_context_close(ctx);
 
 	igt_flush_test(i915, I915_WAIT_LOCKED);
@@ -1215,7 +1214,7 @@ int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 	};
 	int err;
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	mutex_lock(&i915->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ce1b6568515e..cf5155646927 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1007,15 +1007,16 @@ static void i915_instdone_info(struct drm_i915_private *dev_priv,
 
 static int i915_hangcheck_info(struct seq_file *m, void *unused)
 {
-	struct drm_i915_private *dev_priv = node_to_i915(m->private);
+	struct drm_i915_private *i915 = node_to_i915(m->private);
+	struct intel_gt *gt = &i915->gt;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
 	enum intel_engine_id id;
 
-	seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
-	if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
+	seq_printf(m, "Reset flags: %lx\n", gt->reset.flags);
+	if (test_bit(I915_WEDGED, &gt->reset.flags))
 		seq_puts(m, "\tWedged\n");
-	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
+	if (test_bit(I915_RESET_BACKOFF, &gt->reset.flags))
 		seq_puts(m, "\tDevice (global) reset in progress\n");
 
 	if (!i915_modparams.enable_hangcheck) {
@@ -1023,19 +1024,19 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 		return 0;
 	}
 
-	if (timer_pending(&dev_priv->gpu_error.hangcheck_work.timer))
+	if (timer_pending(&gt->hangcheck.work.timer))
 		seq_printf(m, "Hangcheck active, timer fires in %dms\n",
-			   jiffies_to_msecs(dev_priv->gpu_error.hangcheck_work.timer.expires -
+			   jiffies_to_msecs(gt->hangcheck.work.timer.expires -
 					    jiffies));
-	else if (delayed_work_pending(&dev_priv->gpu_error.hangcheck_work))
+	else if (delayed_work_pending(&gt->hangcheck.work))
 		seq_puts(m, "Hangcheck active, work pending\n");
 	else
 		seq_puts(m, "Hangcheck inactive\n");
 
-	seq_printf(m, "GT active? %s\n", yesno(dev_priv->gt.awake));
+	seq_printf(m, "GT active? %s\n", yesno(gt->awake));
 
-	with_intel_runtime_pm(&dev_priv->runtime_pm, wakeref) {
-		for_each_engine(engine, dev_priv, id) {
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
+		for_each_engine(engine, i915, id) {
 			struct intel_instdone instdone;
 
 			seq_printf(m, "%s: %d ms ago\n",
@@ -1050,10 +1051,10 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 			intel_engine_get_instdone(engine, &instdone);
 
 			seq_puts(m, "\tinstdone read =\n");
-			i915_instdone_info(dev_priv, m, &instdone);
+			i915_instdone_info(i915, m, &instdone);
 
 			seq_puts(m, "\tinstdone accu =\n");
-			i915_instdone_info(dev_priv, m,
+			i915_instdone_info(i915, m,
 					   &engine->hangcheck.instdone);
 		}
 	}
@@ -1061,23 +1062,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	return 0;
 }
 
-static int i915_reset_info(struct seq_file *m, void *unused)
-{
-	struct drm_i915_private *dev_priv = node_to_i915(m->private);
-	struct i915_gpu_error *error = &dev_priv->gpu_error;
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	seq_printf(m, "full gpu reset = %u\n", i915_reset_count(error));
-
-	for_each_engine(engine, dev_priv, id) {
-		seq_printf(m, "%s = %u\n", engine->name,
-			   i915_reset_engine_count(error, engine));
-	}
-
-	return 0;
-}
-
 static int ironlake_drpc_info(struct seq_file *m)
 {
 	struct drm_i915_private *i915 = node_to_i915(m->private);
@@ -3554,7 +3538,8 @@ static const struct file_operations i915_cur_wm_latency_fops = {
 static int
 i915_wedged_get(void *data, u64 *val)
 {
-	int ret = i915_terminally_wedged(data);
+	struct drm_i915_private *i915 = data;
+	int ret = intel_gt_terminally_wedged(&i915->gt);
 
 	switch (ret) {
 	case -EIO:
@@ -3574,11 +3559,11 @@ i915_wedged_set(void *data, u64 val)
 	struct drm_i915_private *i915 = data;
 
 	/* Flush any previous reset before applying for a new one */
-	wait_event(i915->gpu_error.reset_queue,
-		   !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));
+	wait_event(i915->gt.reset.queue,
+		   !test_bit(I915_RESET_BACKOFF, &i915->gt.reset.flags));
 
-	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
-			  "Manually set wedged engine mask = %llx", val);
+	intel_gt_handle_error(&i915->gt, val, I915_ERROR_CAPTURE,
+			      "Manually set wedged engine mask = %llx", val);
 	return 0;
 }
 
@@ -3621,8 +3606,9 @@ i915_drop_caches_set(void *data, u64 val)
 		  val, val & DROP_ALL);
 
 	if (val & DROP_RESET_ACTIVE &&
-	    wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
-		i915_gem_set_wedged(i915);
+	    wait_for(intel_engines_are_idle(&i915->gt),
+		     I915_IDLE_ENGINES_TIMEOUT))
+		intel_gt_set_wedged(&i915->gt);
 
 	/* No need to check and wait for gpu resets, only libdrm auto-restarts
 	 * on ioctls on -EAGAIN. */
@@ -3657,8 +3643,8 @@ i915_drop_caches_set(void *data, u64 val)
 		mutex_unlock(&i915->drm.struct_mutex);
 	}
 
-	if (val & DROP_RESET_ACTIVE && i915_terminally_wedged(i915))
-		i915_handle_error(i915, ALL_ENGINES, 0, NULL);
+	if (val & DROP_RESET_ACTIVE && intel_gt_terminally_wedged(&i915->gt))
+		intel_gt_handle_error(&i915->gt, ALL_ENGINES, 0, NULL);
 
 	fs_reclaim_acquire(GFP_KERNEL);
 	if (val & DROP_BOUND)
@@ -4312,7 +4298,6 @@ static const struct drm_info_list i915_debugfs_list[] = {
 	{"i915_huc_load_status", i915_huc_load_status_info, 0},
 	{"i915_frequency_info", i915_frequency_info, 0},
 	{"i915_hangcheck_info", i915_hangcheck_info, 0},
-	{"i915_reset_info", i915_reset_info, 0},
 	{"i915_drpc_info", i915_drpc_info, 0},
 	{"i915_emon_status", i915_emon_status, 0},
 	{"i915_ring_freq_table", i915_ring_freq_table, 0},
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 794c6814a6d0..4b9860986a93 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -941,7 +941,6 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv)
 	if (ret < 0)
 		goto err_uc;
 	intel_irq_init(dev_priv);
-	intel_hangcheck_init(dev_priv);
 	intel_init_display_hooks(dev_priv);
 	intel_init_clock_gating_hooks(dev_priv);
 	intel_init_audio_hooks(dev_priv);
@@ -1960,7 +1959,7 @@ void i915_driver_unload(struct drm_device *dev)
 	 * all in-flight requests so that we can quickly unbind the active
 	 * resources.
 	 */
-	i915_gem_set_wedged(dev_priv);
+	intel_gt_set_wedged(&dev_priv->gt);
 
 	/* Flush any external code that still may be under the RCU lock */
 	synchronize_rcu();
@@ -1981,7 +1980,7 @@ void i915_driver_unload(struct drm_device *dev)
 	intel_csr_ucode_fini(dev_priv);
 
 	/* Free error state after interrupts are fully disabled. */
-	cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
+	cancel_delayed_work_sync(&dev_priv->gt.hangcheck.work);
 	i915_reset_error_state(dev_priv);
 
 	i915_gem_fini_hw(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3e1b565cb1b6..79d8a793a228 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2402,28 +2402,10 @@ extern int i915_driver_load(struct pci_dev *pdev,
 extern void i915_driver_unload(struct drm_device *dev);
 
 extern void intel_engine_init_hangcheck(struct intel_engine_cs *engine);
-extern void intel_hangcheck_init(struct drm_i915_private *dev_priv);
 int vlv_force_gfx_clock(struct drm_i915_private *dev_priv, bool on);
 
 u32 intel_calculate_mcr_s_ss_select(struct drm_i915_private *dev_priv);
 
-static inline void i915_queue_hangcheck(struct drm_i915_private *dev_priv)
-{
-	unsigned long delay;
-
-	if (unlikely(!i915_modparams.enable_hangcheck))
-		return;
-
-	/* Don't continually defer the hangcheck so that it is always run at
-	 * least once after work has been scheduled on any ring. Otherwise,
-	 * we will ignore a hung ring if a second ring is kept busy.
-	 */
-
-	delay = round_jiffies_up_relative(DRM_I915_HANGCHECK_JIFFIES);
-	queue_delayed_work(system_long_wq,
-			   &dev_priv->gpu_error.hangcheck_work, delay);
-}
-
 static inline bool intel_gvt_active(struct drm_i915_private *dev_priv)
 {
 	return dev_priv->gvt;
@@ -2512,30 +2494,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 
 int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 
-static inline bool __i915_wedged(struct i915_gpu_error *error)
-{
-	return unlikely(test_bit(I915_WEDGED, &error->flags));
-}
-
-static inline bool i915_reset_failed(struct drm_i915_private *i915)
-{
-	return __i915_wedged(&i915->gpu_error);
-}
-
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
-	return READ_ONCE(error->reset_count);
+	return atomic_read(&error->reset_count);
 }
 
 static inline u32 i915_reset_engine_count(struct i915_gpu_error *error,
 					  struct intel_engine_cs *engine)
 {
-	return READ_ONCE(error->reset_engine_count[engine->id]);
+	return atomic_read(&error->reset_engine_count[engine->uabi_class]);
 }
 
-void i915_gem_set_wedged(struct drm_i915_private *dev_priv);
-bool i915_gem_unset_wedged(struct drm_i915_private *dev_priv);
-
 void i915_gem_init_mmio(struct drm_i915_private *i915);
 int __must_check i915_gem_init(struct drm_i915_private *dev_priv);
 int __must_check i915_gem_init_hw(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b6f3baa74da4..cec3cea7d86f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -894,13 +894,13 @@ void i915_gem_runtime_suspend(struct drm_i915_private *i915)
 	}
 }
 
-static int wait_for_engines(struct drm_i915_private *i915)
+static int wait_for_engines(struct intel_gt *gt)
 {
-	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
-		dev_err(i915->drm.dev,
+	if (wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT)) {
+		dev_err(gt->i915->drm.dev,
 			"Failed to idle engines, declaring wedged!\n");
 		GEM_TRACE_DUMP();
-		i915_gem_set_wedged(i915);
+		intel_gt_set_wedged(gt);
 		return -EIO;
 	}
 
@@ -971,7 +971,7 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 
 		lockdep_assert_held(&i915->drm.struct_mutex);
 
-		err = wait_for_engines(i915);
+		err = wait_for_engines(&i915->gt);
 		if (err)
 			return err;
 
@@ -1149,8 +1149,8 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * back to defaults, recovering from whatever wedged state we left it
 	 * in and so worth trying to use the device once more.
 	 */
-	if (i915_terminally_wedged(i915))
-		i915_gem_unset_wedged(i915);
+	if (intel_gt_is_wedged(&i915->gt))
+		intel_gt_unset_wedged(&i915->gt);
 
 	/*
 	 * If we inherit context state from the BIOS or earlier occupants
@@ -1202,7 +1202,7 @@ int i915_gem_init_hw(struct drm_i915_private *i915)
 	int ret;
 
 	BUG_ON(!i915->kernel_context);
-	ret = i915_terminally_wedged(i915);
+	ret = intel_gt_terminally_wedged(gt);
 	if (ret)
 		return ret;
 
@@ -1384,7 +1384,7 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	 * and ready to be torn-down. The quickest way we can accomplish
 	 * this is by declaring ourselves wedged.
 	 */
-	i915_gem_set_wedged(i915);
+	intel_gt_set_wedged(&i915->gt);
 	goto out_ctx;
 }
 
@@ -1539,7 +1539,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_gt:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	i915_gem_set_wedged(dev_priv);
+	intel_gt_set_wedged(&dev_priv->gt);
 	i915_gem_suspend(dev_priv);
 	i915_gem_suspend_late(dev_priv);
 
@@ -1581,10 +1581,10 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
-		if (!i915_reset_failed(dev_priv)) {
+		if (!intel_gt_is_wedged(&dev_priv->gt)) {
 			i915_load_error(dev_priv,
 					"Failed to initialize GPU, declaring it wedged!\n");
-			i915_gem_set_wedged(dev_priv);
+			intel_gt_set_wedged(&dev_priv->gt);
 		}
 
 		/* Minimal basic recovery for KMS */
@@ -1666,11 +1666,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	i915_gem_init__mm(dev_priv);
 	i915_gem_init__pm(dev_priv);
 
-	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
-	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
-	mutex_init(&dev_priv->gpu_error.wedge_mutex);
-	init_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
-
 	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
 
 	spin_lock_init(&dev_priv->fb_tracking.lock);
@@ -1689,7 +1684,7 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	WARN_ON(dev_priv->mm.shrink_count);
 
-	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
+	intel_gt_cleanup_early(&dev_priv->gt);
 
 	i915_gemfs_fini(dev_priv);
 }
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 2ecd0c6a1c94..7dfbfda48733 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -7,6 +7,7 @@
 #ifndef _I915_GPU_ERROR_H_
 #define _I915_GPU_ERROR_H_
 
+#include <linux/atomic.h>
 #include <linux/kref.h>
 #include <linux/ktime.h>
 #include <linux/sched.h>
@@ -180,12 +181,6 @@ struct i915_gpu_state {
 };
 
 struct i915_gpu_error {
-	/* For hangcheck timer */
-#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
-#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
-
-	struct delayed_work hangcheck_work;
-
 	/* For reset and error_state handling. */
 	spinlock_t lock;
 	/* Protected by the above dev->gpu_error.lock. */
@@ -193,52 +188,11 @@ struct i915_gpu_error {
 
 	atomic_t pending_fb_pin;
 
-	/**
-	 * flags: Control various stages of the GPU reset
-	 *
-	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
-	 * serialise with any other users attempting to do the same, and
-	 * any global resources that may be clobber by the reset (such as
-	 * FENCE registers).
-	 *
-	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
-	 * acquire the struct_mutex to reset an engine, we need an explicit
-	 * flag to prevent two concurrent reset attempts in the same engine.
-	 * As the number of engines continues to grow, allocate the flags from
-	 * the most significant bits.
-	 *
-	 * #I915_WEDGED - If reset fails and we can no longer use the GPU,
-	 * we set the #I915_WEDGED bit. Prior to command submission, e.g.
-	 * i915_request_alloc(), this bit is checked and the sequence
-	 * aborted (with -EIO reported to userspace) if set.
-	 */
-	unsigned long flags;
-#define I915_RESET_BACKOFF	0
-#define I915_RESET_MODESET	1
-#define I915_RESET_ENGINE	2
-#define I915_WEDGED		(BITS_PER_LONG - 1)
-
 	/** Number of times the device has been reset (global) */
-	u32 reset_count;
+	atomic_t reset_count;
 
 	/** Number of times an engine has been reset */
-	u32 reset_engine_count[I915_NUM_ENGINES];
-
-	struct mutex wedge_mutex; /* serialises wedging/unwedging */
-
-	/**
-	 * Waitqueue to signal when a hang is detected. Used to for waiters
-	 * to release the struct_mutex for the reset to procede.
-	 */
-	wait_queue_head_t wait_queue;
-
-	/**
-	 * Waitqueue to signal when the reset has completed. Used by clients
-	 * that wait for dev_priv->mm.wedged to settle.
-	 */
-	wait_queue_head_t reset_queue;
-
-	struct srcu_struct reset_backoff_srcu;
+	atomic_t reset_engine_count[I915_NUM_ENGINES];
 };
 
 struct drm_i915_error_state_buf {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2aabed8ed594..d98f3d24ef3c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1369,8 +1369,7 @@ long i915_request_wait(struct i915_request *rq,
 	 * serialise wait/reset with an explicit lock, we do want
 	 * lockdep to detect potential dependency cycles.
 	 */
-	mutex_acquire(&rq->i915->gpu_error.wedge_mutex.dep_map,
-		      0, 0, _THIS_IP_);
+	mutex_acquire(&rq->engine->gt->reset.mutex.dep_map, 0, 0, _THIS_IP_);
 
 	/*
 	 * Optimistic spin before touching IRQs.
@@ -1448,7 +1447,7 @@ long i915_request_wait(struct i915_request *rq,
 	dma_fence_remove_callback(&rq->fence, &wait.cb);
 
 out:
-	mutex_release(&rq->i915->gpu_error.wedge_mutex.dep_map, 0, _THIS_IP_);
+	mutex_release(&rq->engine->gt->reset.mutex.dep_map, 0, _THIS_IP_);
 	trace_i915_request_wait_end(rq);
 	return timeout;
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 12c22359fdac..ae29f73d2a0f 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -876,7 +876,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled)
 	if (!i915_request_started(rq))
 		stalled = false;
 
-	i915_reset_request(rq, stalled);
+	__i915_request_reset(rq, stalled);
 	intel_lr_context_reset(engine, rq->hw_context, rq->head, stalled);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
index fdf00f1ebb57..b0e83e91ea6f 100644
--- a/drivers/gpu/drm/i915/intel_uc.c
+++ b/drivers/gpu/drm/i915/intel_uc.c
@@ -38,7 +38,7 @@ static int __intel_uc_reset_hw(struct drm_i915_private *dev_priv)
 	int ret;
 	u32 guc_status;
 
-	ret = intel_reset_guc(dev_priv);
+	ret = intel_reset_guc(&dev_priv->gt);
 	if (ret) {
 		DRM_ERROR("Failed to reset GuC, ret = %d\n", ret);
 		return ret;
diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c
index 84fce379c0de..e5cd5d47e380 100644
--- a/drivers/gpu/drm/i915/selftests/i915_active.c
+++ b/drivers/gpu/drm/i915/selftests/i915_active.c
@@ -7,6 +7,7 @@
 #include <linux/kref.h>
 
 #include "gem/i915_gem_pm.h"
+#include "gt/intel_gt.h"
 
 #include "i915_selftest.h"
 
@@ -221,7 +222,7 @@ int i915_active_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_active_retire),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index b8ffae481730..bb6dd54a6ff3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -8,6 +8,7 @@
 
 #include "gem/selftests/igt_gem_utils.h"
 #include "gem/selftests/mock_context.h"
+#include "gt/intel_gt.h"
 
 #include "i915_selftest.h"
 
@@ -206,7 +207,7 @@ int i915_gem_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_gem_hibernate),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_live_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index a3cb0aade6f1..b6449d0a8c17 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -25,6 +25,7 @@
 #include "gem/i915_gem_pm.h"
 #include "gem/selftests/igt_gem_utils.h"
 #include "gem/selftests/mock_context.h"
+#include "gt/intel_gt.h"
 
 #include "i915_selftest.h"
 
@@ -557,7 +558,7 @@ int i915_gem_evict_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(igt_evict_contexts),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 1bbfc43d4a9e..86c299663934 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -380,7 +380,7 @@ static int __igt_breadcrumbs_smoketest(void *arg)
 			       t->engine->name);
 			GEM_TRACE_DUMP();
 
-			i915_gem_set_wedged(t->engine->i915);
+			intel_gt_set_wedged(t->engine->gt);
 			GEM_BUG_ON(!i915_request_completed(rq));
 			i915_sw_fence_wait(wait);
 			err = -EIO;
@@ -1234,7 +1234,7 @@ int i915_request_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_breadcrumbs_smoketest),
 	};
 
-	if (i915_terminally_wedged(i915))
+	if (intel_gt_is_wedged(&i915->gt))
 		return 0;
 
 	return i915_subtests(tests, i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_selftest.c b/drivers/gpu/drm/i915/selftests/i915_selftest.c
index f46ccf817ad5..5b87ff73b49a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_selftest.c
+++ b/drivers/gpu/drm/i915/selftests/i915_selftest.c
@@ -256,7 +256,7 @@ int __i915_live_setup(void *data)
 {
 	struct drm_i915_private *i915 = data;
 
-	return i915_terminally_wedged(i915);
+	return intel_gt_terminally_wedged(&i915->gt);
 }
 
 int __i915_live_teardown(int err, void *data)
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index 5bfd1b2626a2..d3b5eb402d33 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -5,6 +5,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 
 #include "i915_drv.h"
 #include "i915_selftest.h"
@@ -13,7 +14,7 @@
 
 int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 {
-	int ret = i915_terminally_wedged(i915) ? -EIO : 0;
+	int ret = intel_gt_is_wedged(&i915->gt) ? -EIO : 0;
 	int repeat = !!(flags & I915_WAIT_LOCKED);
 
 	cond_resched();
@@ -27,7 +28,7 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 				  __builtin_return_address(0));
 			GEM_TRACE_DUMP();
 
-			i915_gem_set_wedged(i915);
+			intel_gt_set_wedged(&i915->gt);
 			repeat = 0;
 			ret = -EIO;
 		}
diff --git a/drivers/gpu/drm/i915/selftests/igt_reset.c b/drivers/gpu/drm/i915/selftests/igt_reset.c
index 587df6fd4ffe..7ec8f8b049c6 100644
--- a/drivers/gpu/drm/i915/selftests/igt_reset.c
+++ b/drivers/gpu/drm/i915/selftests/igt_reset.c
@@ -7,47 +7,45 @@
 #include "igt_reset.h"
 
 #include "gt/intel_engine.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_drv.h"
 
-void igt_global_reset_lock(struct drm_i915_private *i915)
+void igt_global_reset_lock(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	pr_debug("%s: current gpu_error=%08lx\n",
-		 __func__, i915->gpu_error.flags);
+	pr_debug("%s: current gpu_error=%08lx\n", __func__, gt->reset.flags);
 
-	while (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags))
-		wait_event(i915->gpu_error.reset_queue,
-			   !test_bit(I915_RESET_BACKOFF,
-				     &i915->gpu_error.flags));
+	while (test_and_set_bit(I915_RESET_BACKOFF, &gt->reset.flags))
+		wait_event(gt->reset.queue,
+			   !test_bit(I915_RESET_BACKOFF, &gt->reset.flags));
 
-	for_each_engine(engine, i915, id) {
+	for_each_engine(engine, gt->i915, id) {
 		while (test_and_set_bit(I915_RESET_ENGINE + id,
-					&i915->gpu_error.flags))
-			wait_on_bit(&i915->gpu_error.flags,
-				    I915_RESET_ENGINE + id,
+					&gt->reset.flags))
+			wait_on_bit(&gt->reset.flags, I915_RESET_ENGINE + id,
 				    TASK_UNINTERRUPTIBLE);
 	}
 }
 
-void igt_global_reset_unlock(struct drm_i915_private *i915)
+void igt_global_reset_unlock(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	for_each_engine(engine, i915, id)
-		clear_bit(I915_RESET_ENGINE + id, &i915->gpu_error.flags);
+	for_each_engine(engine, gt->i915, id)
+		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 
-	clear_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags);
-	wake_up_all(&i915->gpu_error.reset_queue);
+	clear_bit(I915_RESET_BACKOFF, &gt->reset.flags);
+	wake_up_all(&gt->reset.queue);
 }
 
-bool igt_force_reset(struct drm_i915_private *i915)
+bool igt_force_reset(struct intel_gt *gt)
 {
-	i915_gem_set_wedged(i915);
-	i915_reset(i915, 0, NULL);
+	intel_gt_set_wedged(gt);
+	intel_gt_reset(gt, 0, NULL);
 
-	return !i915_reset_failed(i915);
+	return !intel_gt_is_wedged(gt);
 }
diff --git a/drivers/gpu/drm/i915/selftests/igt_reset.h b/drivers/gpu/drm/i915/selftests/igt_reset.h
index 363bd853e50f..851873b67ab3 100644
--- a/drivers/gpu/drm/i915/selftests/igt_reset.h
+++ b/drivers/gpu/drm/i915/selftests/igt_reset.h
@@ -7,10 +7,12 @@
 #ifndef __I915_SELFTESTS_IGT_RESET_H__
 #define __I915_SELFTESTS_IGT_RESET_H__
 
-#include "../i915_drv.h"
+#include <linux/types.h>
 
-void igt_global_reset_lock(struct drm_i915_private *i915);
-void igt_global_reset_unlock(struct drm_i915_private *i915);
-bool igt_force_reset(struct drm_i915_private *i915);
+struct intel_gt;
+
+void igt_global_reset_lock(struct intel_gt *gt);
+void igt_global_reset_unlock(struct intel_gt *gt);
+bool igt_force_reset(struct intel_gt *gt);
 
 #endif
diff --git a/drivers/gpu/drm/i915/selftests/igt_wedge_me.h b/drivers/gpu/drm/i915/selftests/igt_wedge_me.h
deleted file mode 100644
index 08e5ff11bbd9..000000000000
--- a/drivers/gpu/drm/i915/selftests/igt_wedge_me.h
+++ /dev/null
@@ -1,58 +0,0 @@
-/*
- * SPDX-License-Identifier: MIT
- *
- * Copyright © 2018 Intel Corporation
- */
-
-#ifndef IGT_WEDGE_ME_H
-#define IGT_WEDGE_ME_H
-
-#include <linux/workqueue.h>
-
-#include "../i915_gem.h"
-
-struct drm_i915_private;
-
-struct igt_wedge_me {
-	struct delayed_work work;
-	struct drm_i915_private *i915;
-	const char *name;
-};
-
-static void __igt_wedge_me(struct work_struct *work)
-{
-	struct igt_wedge_me *w = container_of(work, typeof(*w), work.work);
-
-	pr_err("%s timed out, cancelling test.\n", w->name);
-
-	GEM_TRACE("%s timed out.\n", w->name);
-	GEM_TRACE_DUMP();
-
-	i915_gem_set_wedged(w->i915);
-}
-
-static void __igt_init_wedge(struct igt_wedge_me *w,
-			     struct drm_i915_private *i915,
-			     long timeout,
-			     const char *name)
-{
-	w->i915 = i915;
-	w->name = name;
-
-	INIT_DELAYED_WORK_ONSTACK(&w->work, __igt_wedge_me);
-	schedule_delayed_work(&w->work, timeout);
-}
-
-static void __igt_fini_wedge(struct igt_wedge_me *w)
-{
-	cancel_delayed_work_sync(&w->work);
-	destroy_delayed_work_on_stack(&w->work);
-	w->i915 = NULL;
-}
-
-#define igt_wedge_on_timeout(W, DEV, TIMEOUT)				\
-	for (__igt_init_wedge((W), (DEV), (TIMEOUT), __func__);		\
-	     (W)->i915;							\
-	     __igt_fini_wedge((W)))
-
-#endif /* IGT_WEDGE_ME_H */
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 2741805b56c2..fd4cc4809eb8 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -183,11 +183,6 @@ struct drm_i915_private *mock_gem_device(void)
 	intel_gt_init_early(&i915->gt, i915);
 	atomic_inc(&i915->gt.wakeref.count); /* disable; no hw support */
 
-	init_waitqueue_head(&i915->gpu_error.wait_queue);
-	init_waitqueue_head(&i915->gpu_error.reset_queue);
-	init_srcu_struct(&i915->gpu_error.reset_backoff_srcu);
-	mutex_init(&i915->gpu_error.wedge_mutex);
-
 	i915->wq = alloc_ordered_workqueue("mock", 0);
 	if (!i915->wq)
 		goto err_drv;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Order assert forcewake test
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (6 preceding siblings ...)
  2019-07-05  7:46 ` [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets Chris Wilson
@ 2019-07-05  9:53 ` Patchwork
  2019-07-05  9:57 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2019-07-05  9:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Order assert forcewake test
URL   : https://patchwork.freedesktop.org/series/63256/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
1601f21b3929 drm/i915: Order assert forcewake test
477c5b107129 drm/i915: Teach execbuffer to take the engine wakeref not GT
6a8be28690b9 drm/i915/gt: Track timeline activeness in enter/exit
396a6d039cdc drm/i915/gt: Convert timeline tracking to spinlock
68c0019139cc drm/i915/gt: Guard timeline pinning with its own mutex
1c1420202474 drm/i915: Protect request retirement with timeline->mutex
9b040a14b98f drm/i915: Replace struct_mutex for batch pool serialisation
-:305: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#305: 
new file mode 100644

-:310: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#310: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool.c:1:
+/*

-:311: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#311: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool.c:2:
+ * SPDX-License-Identifier: MIT

-:482: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#482: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool.h:1:
+/*

-:483: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#483: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool.h:2:
+ * SPDX-License-Identifier: MIT

-:522: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#522: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool_types.h:1:
+/*

-:523: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#523: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool_types.h:2:
+ * SPDX-License-Identifier: MIT

-:539: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#539: FILE: drivers/gpu/drm/i915/gt/intel_engine_pool_types.h:18:
+	spinlock_t lock;

total: 0 errors, 7 warnings, 1 checks, 595 lines checked
92549228548f drm/i915/gt: Use intel_gt as the primary object for handling resets
-:1822: WARNING:MEMORY_BARRIER: memory barrier without comment
#1822: FILE: drivers/gpu/drm/i915/gt/intel_reset.c:1269:
+	smp_mb__after_atomic();

-:2047: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'W' - possible side-effects?
#2047: FILE: drivers/gpu/drm/i915/gt/intel_reset.h:64:
+#define intel_wedge_on_timeout(W, GT, TIMEOUT)				\
+	for (__intel_init_wedge((W), (GT), (TIMEOUT), __func__);	\
+	     (W)->gt;							\
+	     __intel_fini_wedge((W)))

-:2062: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#2062: 
new file mode 100644

total: 0 errors, 2 warnings, 1 checks, 3276 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [1/8] drm/i915: Order assert forcewake test
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (7 preceding siblings ...)
  2019-07-05  9:53 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Order assert forcewake test Patchwork
@ 2019-07-05  9:57 ` Patchwork
  2019-07-05 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2019-07-05  9:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Order assert forcewake test
URL   : https://patchwork.freedesktop.org/series/63256/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Order assert forcewake test
Okay!

Commit: drm/i915: Teach execbuffer to take the engine wakeref not GT
Okay!

Commit: drm/i915/gt: Track timeline activeness in enter/exit
Okay!

Commit: drm/i915/gt: Convert timeline tracking to spinlock
Okay!

Commit: drm/i915/gt: Guard timeline pinning with its own mutex
Okay!

Commit: drm/i915: Protect request retirement with timeline->mutex
Okay!

Commit: drm/i915: Replace struct_mutex for batch pool serialisation
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915/gt: Use intel_gt as the primary object for handling resets
-O:drivers/gpu/drm/i915/gt/intel_reset.c:1292:5: warning: context imbalance in 'i915_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/intel_reset.c:1276:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
-./drivers/gpu/drm/i915/gt/selftest_timeline.c:91:38: warning: expression using sizeof(void)
-./drivers/gpu/drm/i915/gt/selftest_timeline.c:91:38: warning: expression using sizeof(void)
-./drivers/gpu/drm/i915/gt/selftest_timeline.c:94:44: warning: expression using sizeof(void)
-./drivers/gpu/drm/i915/gt/selftest_timeline.c:94:44: warning: expression using sizeof(void)
+./drivers/gpu/drm/i915/gt/selftest_timeline.c:92:38: warning: expression using sizeof(void)
+./drivers/gpu/drm/i915/gt/selftest_timeline.c:92:38: warning: expression using sizeof(void)
+./drivers/gpu/drm/i915/gt/selftest_timeline.c:95:44: warning: expression using sizeof(void)
+./drivers/gpu/drm/i915/gt/selftest_timeline.c:95:44: warning: expression using sizeof(void)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [1/8] drm/i915: Order assert forcewake test
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (8 preceding siblings ...)
  2019-07-05  9:57 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-07-05 10:16 ` Patchwork
  2019-07-05 10:36 ` [PATCH 1/8] " Chris Wilson
  2019-07-06 12:54 ` ✓ Fi.CI.IGT: success for series starting with [1/8] " Patchwork
  11 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2019-07-05 10:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Order assert forcewake test
URL   : https://patchwork.freedesktop.org/series/63256/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6423 -> Patchwork_13542
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/

Known issues
------------

  Here are the changes found in Patchwork_13542 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_create@basic-files:
    - fi-icl-dsi:         [PASS][1] -> [INCOMPLETE][2] ([fdo#107713] / [fdo#109100])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-icl-dsi/igt@gem_ctx_create@basic-files.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-icl-dsi/igt@gem_ctx_create@basic-files.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-ilk-650:         [PASS][3] -> [DMESG-WARN][4] ([fdo#106387])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-ilk-650/igt@gem_exec_suspend@basic-s3.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-ilk-650/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_mmap_gtt@basic-copy:
    - fi-icl-u3:          [PASS][5] -> [DMESG-WARN][6] ([fdo#107724]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-icl-u3/igt@gem_mmap_gtt@basic-copy.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-icl-u3/igt@gem_mmap_gtt@basic-copy.html

  * igt@i915_pm_rpm@module-reload:
    - fi-skl-6600u:       [PASS][7] -> [INCOMPLETE][8] ([fdo#107807])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-skl-6600u/igt@i915_pm_rpm@module-reload.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-skl-6600u/igt@i915_pm_rpm@module-reload.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [PASS][9] -> [FAIL][10] ([fdo#109485])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-icl-u2:          [PASS][11] -> [FAIL][12] ([fdo#103167])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-icl-u2/igt@kms_frontbuffer_tracking@basic.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-icl-u2/igt@kms_frontbuffer_tracking@basic.html

  
#### Possible fixes ####

  * {igt@gem_ctx_switch@rcs0}:
    - fi-icl-guc:         [INCOMPLETE][13] ([fdo#107713]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-icl-guc/igt@gem_ctx_switch@rcs0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-icl-guc/igt@gem_ctx_switch@rcs0.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       [INCOMPLETE][15] ([fdo#107718]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-blb-e6850/igt@gem_exec_suspend@basic-s3.html

  * igt@i915_selftest@live_contexts:
    - fi-skl-iommu:       [INCOMPLETE][17] ([fdo#111050]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-skl-iommu/igt@i915_selftest@live_contexts.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-skl-iommu/igt@i915_selftest@live_contexts.html

  * igt@i915_selftest@live_hangcheck:
    - fi-kbl-7500u:       [DMESG-WARN][19] ([fdo#111074]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-kbl-7500u/igt@i915_selftest@live_hangcheck.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-kbl-7500u/igt@i915_selftest@live_hangcheck.html

  * igt@kms_busy@basic-flip-c:
    - fi-skl-6770hq:      [SKIP][21] ([fdo#109271] / [fdo#109278]) -> [PASS][22] +2 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-skl-6770hq/igt@kms_busy@basic-flip-c.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7567u:       [FAIL][23] ([fdo#109485]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-kbl-7567u/igt@kms_chamelium@hdmi-hpd-fast.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-kbl-7567u/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_flip@basic-flip-vs-dpms:
    - fi-skl-6770hq:      [SKIP][25] ([fdo#109271]) -> [PASS][26] +23 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/fi-skl-6770hq/igt@kms_flip@basic-flip-vs-dpms.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#106387]: https://bugs.freedesktop.org/show_bug.cgi?id=106387
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107807]: https://bugs.freedesktop.org/show_bug.cgi?id=107807
  [fdo#109100]: https://bugs.freedesktop.org/show_bug.cgi?id=109100
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485
  [fdo#111050]: https://bugs.freedesktop.org/show_bug.cgi?id=111050
  [fdo#111074]: https://bugs.freedesktop.org/show_bug.cgi?id=111074


Participating hosts (54 -> 46)
------------------------------

  Missing    (8): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-icl-y fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_6423 -> Patchwork_13542

  CI_DRM_6423: 25885394cd77f6efa2503c90f4a2da9a4bbf6185 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5086: f2985a03e223679c24ae085f247a430e55229e64 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_13542: 92549228548fecc31d48e8721f387e833968ee37 @ git://anongit.freedesktop.org/gfx-ci/linux


== Kernel 32bit build ==

Warning: Kernel 32bit buildtest failed:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/build_32bit.log

  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  CHK     include/generated/compile.h
Kernel: arch/x86/boot/bzImage is ready  (#1)
  Building modules, stage 2.
  MODPOST 112 modules
ERROR: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[1]: *** [__modpost] Error 1
Makefile:1287: recipe for target 'modules' failed
make: *** [modules] Error 2


== Linux commits ==

92549228548f drm/i915/gt: Use intel_gt as the primary object for handling resets
9b040a14b98f drm/i915: Replace struct_mutex for batch pool serialisation
1c1420202474 drm/i915: Protect request retirement with timeline->mutex
68c0019139cc drm/i915/gt: Guard timeline pinning with its own mutex
396a6d039cdc drm/i915/gt: Convert timeline tracking to spinlock
6a8be28690b9 drm/i915/gt: Track timeline activeness in enter/exit
477c5b107129 drm/i915: Teach execbuffer to take the engine wakeref not GT
1601f21b3929 drm/i915: Order assert forcewake test

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] drm/i915: Order assert forcewake test
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (9 preceding siblings ...)
  2019-07-05 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-07-05 10:36 ` Chris Wilson
  2019-07-06 12:54 ` ✓ Fi.CI.IGT: success for series starting with [1/8] " Patchwork
  11 siblings, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-05 10:36 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2019-07-05 08:45:57)
> Read the current value before computing the expected to ensure that if
> the timer does complete early (against our will), it should not cause a
> false positive.
> 
> v2: The local irq disable did not prevent the timer from running.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

irc r-b,
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* ✓ Fi.CI.IGT: success for series starting with [1/8] drm/i915: Order assert forcewake test
  2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
                   ` (10 preceding siblings ...)
  2019-07-05 10:36 ` [PATCH 1/8] " Chris Wilson
@ 2019-07-06 12:54 ` Patchwork
  11 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2019-07-06 12:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Order assert forcewake test
URL   : https://patchwork.freedesktop.org/series/63256/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_6423_full -> Patchwork_13542_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_13542_full:

### Piglit changes ###

#### Possible regressions ####

  * spec@glsl-1.30@execution@tex-miplevel-selection texturelod 2dshadow (NEW):
    - {pig-snb-2600}:     NOTRUN -> [FAIL][1]
   [1]: None

  
New tests
---------

  New tests have been introduced between CI_DRM_6423_full and Patchwork_13542_full:

### New Piglit tests (1) ###

  * spec@glsl-1.30@execution@tex-miplevel-selection texturelod 2dshadow:
    - Statuses : 1 fail(s)
    - Exec time: [8.54] s

  

Known issues
------------

  Here are the changes found in Patchwork_13542_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_eio@reset-stress:
    - shard-snb:          [PASS][2] -> [FAIL][3] ([fdo#109661])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-snb1/igt@gem_eio@reset-stress.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-snb7/igt@gem_eio@reset-stress.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [PASS][4] -> [SKIP][5] ([fdo#110854])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb4/igt@gem_exec_balancer@smoke.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb8/igt@gem_exec_balancer@smoke.html

  * igt@gem_tiled_swapping@non-threaded:
    - shard-apl:          [PASS][6] -> [DMESG-WARN][7] ([fdo#108686])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-apl3/igt@gem_tiled_swapping@non-threaded.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-apl4/igt@gem_tiled_swapping@non-threaded.html

  * igt@i915_pm_rpm@modeset-stress-extra-wait:
    - shard-iclb:         [PASS][8] -> [INCOMPLETE][9] ([fdo#107713] / [fdo#108840])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb2/igt@i915_pm_rpm@modeset-stress-extra-wait.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb7/igt@i915_pm_rpm@modeset-stress-extra-wait.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-skl:          [PASS][10] -> [INCOMPLETE][11] ([fdo#104108])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-skl5/igt@i915_suspend@fence-restore-tiled2untiled.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-skl7/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_cursor_crc@pipe-c-cursor-64x64-sliding:
    - shard-apl:          [PASS][12] -> [INCOMPLETE][13] ([fdo#103927])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-apl3/igt@kms_cursor_crc@pipe-c-cursor-64x64-sliding.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-apl8/igt@kms_cursor_crc@pipe-c-cursor-64x64-sliding.html

  * igt@kms_cursor_crc@pipe-c-cursor-suspend:
    - shard-apl:          [PASS][14] -> [DMESG-WARN][15] ([fdo#108566]) +4 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-apl2/igt@kms_cursor_crc@pipe-c-cursor-suspend.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-apl6/igt@kms_cursor_crc@pipe-c-cursor-suspend.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw:
    - shard-iclb:         [PASS][16] -> [FAIL][17] ([fdo#103167]) +3 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb4/igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb8/igt@kms_frontbuffer_tracking@fbcpsr-1p-pri-indfb-multidraw.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-iclb:         [PASS][18] -> [INCOMPLETE][19] ([fdo#107713])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb2/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb7/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  * igt@tools_test@tools_test:
    - shard-skl:          [PASS][20] -> [SKIP][21] ([fdo#109271])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-skl9/igt@tools_test@tools_test.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-skl7/igt@tools_test@tools_test.html

  
#### Possible fixes ####

  * igt@kms_cursor_crc@pipe-c-cursor-64x21-sliding:
    - shard-hsw:          [SKIP][22] ([fdo#109271]) -> [PASS][23]
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-hsw4/igt@kms_cursor_crc@pipe-c-cursor-64x21-sliding.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-hsw4/igt@kms_cursor_crc@pipe-c-cursor-64x21-sliding.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-glk:          [FAIL][24] ([fdo#105363]) -> [PASS][25] +1 similar issue
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-glk1/igt@kms_flip@flip-vs-expired-vblank-interruptible.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-glk6/igt@kms_flip@flip-vs-expired-vblank-interruptible.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
    - shard-iclb:         [FAIL][26] ([fdo#103167]) -> [PASS][27] +2 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb5/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-apl:          [DMESG-WARN][28] ([fdo#108566]) -> [PASS][29] +4 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-apl8/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-apl6/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [FAIL][30] ([fdo#108145] / [fdo#110403]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-skl7/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-skl9/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_lowres@pipe-a-tiling-x:
    - shard-iclb:         [FAIL][32] ([fdo#103166]) -> [PASS][33] +1 similar issue
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb4/igt@kms_plane_lowres@pipe-a-tiling-x.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb2/igt@kms_plane_lowres@pipe-a-tiling-x.html

  * igt@kms_psr@psr2_cursor_render:
    - shard-iclb:         [SKIP][34] ([fdo#109441]) -> [PASS][35] +5 similar issues
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-iclb5/igt@kms_psr@psr2_cursor_render.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-iclb2/igt@kms_psr@psr2_cursor_render.html

  * igt@kms_setmode@basic:
    - shard-apl:          [FAIL][36] ([fdo#99912]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-apl5/igt@kms_setmode@basic.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-apl4/igt@kms_setmode@basic.html
    - shard-kbl:          [FAIL][38] ([fdo#99912]) -> [PASS][39]
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-kbl3/igt@kms_setmode@basic.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-kbl2/igt@kms_setmode@basic.html

  * igt@perf@blocking:
    - shard-skl:          [FAIL][40] ([fdo#110728]) -> [PASS][41]
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-skl8/igt@perf@blocking.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-skl8/igt@perf@blocking.html

  * igt@perf_pmu@idle-vcs0:
    - shard-hsw:          [TIMEOUT][42] -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6423/shard-hsw4/igt@perf_pmu@idle-vcs0.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/shard-hsw4/igt@perf_pmu@idle-vcs0.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108686]: https://bugs.freedesktop.org/show_bug.cgi?id=108686
  [fdo#108840]: https://bugs.freedesktop.org/show_bug.cgi?id=108840
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109661]: https://bugs.freedesktop.org/show_bug.cgi?id=109661
  [fdo#110403]: https://bugs.freedesktop.org/show_bug.cgi?id=110403
  [fdo#110728]: https://bugs.freedesktop.org/show_bug.cgi?id=110728
  [fdo#110854]: https://bugs.freedesktop.org/show_bug.cgi?id=110854
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (10 -> 11)
------------------------------

  Additional (1): pig-snb-2600 


Build changes
-------------

  * Linux: CI_DRM_6423 -> Patchwork_13542

  CI_DRM_6423: 25885394cd77f6efa2503c90f4a2da9a4bbf6185 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5086: f2985a03e223679c24ae085f247a430e55229e64 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_13542: 92549228548fecc31d48e8721f387e833968ee37 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13542/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets
  2019-07-05  7:46 ` [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets Chris Wilson
@ 2019-07-08 22:48   ` Daniele Ceraolo Spurio
  2019-07-09  8:11     ` Chris Wilson
  2019-07-09  8:15     ` Chris Wilson
  0 siblings, 2 replies; 16+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-07-08 22:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 7/5/19 12:46 AM, Chris Wilson wrote:
> Having taken the first step in encapsulating the functionality by moving
> the related files under gt/, the next step is to start encapsulating by
> passing around the relevant structs rather than the global
> drm_i915_private. In this step, we pass intel_gt to intel_reset.c
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/display/intel_display.c  |  21 +-
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   8 +-
>   drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  25 +-
>   drivers/gpu/drm/i915/gem/i915_gem_throttle.c  |   2 +-
>   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  20 +-
>   .../i915/gem/selftests/i915_gem_client_blt.c  |   4 +-
>   .../i915/gem/selftests/i915_gem_coherency.c   |   6 +-
>   .../drm/i915/gem/selftests/i915_gem_context.c |  17 +-
>   .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
>   .../i915/gem/selftests/i915_gem_object_blt.c  |   4 +-
>   drivers/gpu/drm/i915/gt/intel_engine.h        |   8 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  16 +-
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   3 +-
>   drivers/gpu/drm/i915/gt/intel_gt.c            |   7 +
>   drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  19 +-
>   drivers/gpu/drm/i915/gt/intel_gt_types.h      |  12 +
>   drivers/gpu/drm/i915/gt/intel_hangcheck.c     |  67 +--
>   drivers/gpu/drm/i915/gt/intel_lrc.c           |   2 +-
>   drivers/gpu/drm/i915/gt/intel_reset.c         | 435 +++++++++---------
>   drivers/gpu/drm/i915/gt/intel_reset.h         |  73 +--
>   drivers/gpu/drm/i915/gt/intel_reset_types.h   |  50 ++
>   drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |   2 +-
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 107 +++--
>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  36 +-
>   drivers/gpu/drm/i915/gt/selftest_reset.c      |  50 +-
>   drivers/gpu/drm/i915/gt/selftest_timeline.c   |   3 +-
>   .../gpu/drm/i915/gt/selftest_workarounds.c    |  33 +-
>   drivers/gpu/drm/i915/i915_debugfs.c           |  63 +--
>   drivers/gpu/drm/i915/i915_drv.c               |   5 +-
>   drivers/gpu/drm/i915/i915_drv.h               |  35 +-
>   drivers/gpu/drm/i915/i915_gem.c               |  31 +-
>   drivers/gpu/drm/i915/i915_gpu_error.h         |  52 +--
>   drivers/gpu/drm/i915/i915_request.c           |   5 +-
>   drivers/gpu/drm/i915/intel_guc_submission.c   |   2 +-
>   drivers/gpu/drm/i915/intel_uc.c               |   2 +-
>   drivers/gpu/drm/i915/selftests/i915_active.c  |   3 +-
>   drivers/gpu/drm/i915/selftests/i915_gem.c     |   3 +-
>   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   3 +-
>   drivers/gpu/drm/i915/selftests/i915_request.c |   4 +-
>   .../gpu/drm/i915/selftests/i915_selftest.c    |   2 +-
>   .../gpu/drm/i915/selftests/igt_flush_test.c   |   5 +-
>   drivers/gpu/drm/i915/selftests/igt_reset.c    |  38 +-
>   drivers/gpu/drm/i915/selftests/igt_reset.h    |  10 +-
>   drivers/gpu/drm/i915/selftests/igt_wedge_me.h |  58 ---
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |   5 -
>   48 files changed, 665 insertions(+), 709 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/gt/intel_reset_types.h
>   delete mode 100644 drivers/gpu/drm/i915/selftests/igt_wedge_me.h
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> index 919f5ac844c8..4dd6856bf722 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -4249,12 +4249,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
>   		return;
>   
>   	/* We have a modeset vs reset deadlock, defensively unbreak it. */
> -	set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
> -	wake_up_all(&dev_priv->gpu_error.wait_queue);
> +	set_bit(I915_RESET_MODESET, &dev_priv->gt.reset.flags);
> +	wake_up_bit(&dev_priv->gt.reset.flags, I915_RESET_MODESET);

The doc for wake_up_bit says that we need a memory barrier before 
calling it. Do we have it implicitly somewhere or is it missing?

>   
>   	if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
>   		DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
> -		i915_gem_set_wedged(dev_priv);
> +		intel_gt_set_wedged(&dev_priv->gt);
>   	}
>   
>   	/*

<snip>

> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> index bf085b0cb7c6..8e2eeaec06cb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c

> @@ -123,18 +124,18 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
>   			 * Forcibly cancel outstanding work and leave
>   			 * the gpu quiet.
>   			 */
> -			i915_gem_set_wedged(i915);
> +			intel_gt_set_wedged(gt);
>   			result = false;
>   		}
> -	} while (i915_retire_requests(i915) && result);
> +	} while (i915_retire_requests(gt->i915) && result);
>   
> -	GEM_BUG_ON(i915->gt.awake);
> +	GEM_BUG_ON(gt->awake);
>   	return result;
>   }
>   
>   bool i915_gem_load_power_context(struct drm_i915_private *i915)

This should probably move to intel_gt as well since it is the gt power 
context that we're loading.

>   {
> -	return switch_to_kernel_context_sync(i915);
> +	return switch_to_kernel_context_sync(&i915->gt);
>   }
>   
>   void i915_gem_suspend(struct drm_i915_private *i915)

<snip>

> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 36ba80e6a0b7..9047e28e75ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -5,7 +5,9 @@
>    */
>   
>   #include "i915_drv.h"
> +#include "i915_params.h"
>   #include "intel_engine_pm.h"
> +#include "intel_gt.h"
>   #include "intel_gt_pm.h"
>   #include "intel_pm.h"
>   #include "intel_wakeref.h"
> @@ -17,6 +19,7 @@ static void pm_notify(struct drm_i915_private *i915, int state)
>   
>   static int intel_gt_unpark(struct intel_wakeref *wf)
>   {
> +	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
>   	struct drm_i915_private *i915 =
>   		container_of(wf, typeof(*i915), gt.wakeref);

gt->i915 ?

>   

<snip>

> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index f43ea830b1e8..70d723e801bf 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -14,12 +14,21 @@
>   #include <linux/types.h>
>   
>   #include "i915_vma.h"
> +#include "intel_reset_types.h"
>   #include "intel_wakeref.h"
>   
>   struct drm_i915_private;
>   struct i915_ggtt;
>   struct intel_uncore;
>   
> +struct intel_hangcheck {
> +	/* For hangcheck timer */
> +#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
> +#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
> +
> +	struct delayed_work work;
> +};

should intel_hangcheck be part of intel_reset_types?

> +
>   struct intel_gt {
>   	struct drm_i915_private *i915;
>   	struct intel_uncore *uncore;
> @@ -39,6 +48,9 @@ struct intel_gt {
>   	struct list_head closed_vma;
>   	spinlock_t closed_lock; /* guards the list of closed_vma */
>   
> +	struct intel_hangcheck hangcheck;
> +	struct intel_reset reset;
> +
>   	/**
>   	 * Is the GPU currently considered idle, or busy executing
>   	 * userspace requests? Whilst idle, we allow runtime power

<snip>

> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72002c0f9698..67c06231b004 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c

> @@ -1267,113 +1252,119 @@ void i915_handle_error(struct drm_i915_private *i915,
>   	synchronize_rcu_expedited();
>   
>   	/* Prevent any other reset-engine attempt. */
> -	for_each_engine(engine, i915, tmp) {
> +	for_each_engine(engine, gt->i915, tmp) {
>   		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> -					&error->flags))
> -			wait_on_bit(&error->flags,
> +					&gt->reset.flags))
> +			wait_on_bit(&gt->reset.flags,
>   				    I915_RESET_ENGINE + engine->id,
>   				    TASK_UNINTERRUPTIBLE);
>   	}
>   
> -	i915_reset_device(i915, engine_mask, msg);
> +	intel_gt_reset_global(gt, engine_mask, msg);
>   
> -	for_each_engine(engine, i915, tmp) {
> -		clear_bit(I915_RESET_ENGINE + engine->id,
> -			  &error->flags);
> -	}
> -
> -	clear_bit(I915_RESET_BACKOFF, &error->flags);
> -	wake_up_all(&error->reset_queue);
> +	for_each_engine(engine, gt->i915, tmp)
> +		clear_bit_unlock(I915_RESET_ENGINE + engine->id,
> +				 &gt->reset.flags);
> +	clear_bit_unlock(I915_RESET_BACKOFF, &gt->reset.flags);
> +	smp_mb__after_atomic();

Why do we now need this barrier here?

> +	wake_up_all(&gt->reset.queue);
>   
>   out:
> -	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
> +	intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
>   }
>   

<snip>

> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
> index 03fba0ab3868..62f6cb520f96 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.h
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.h
> @@ -11,56 +11,67 @@
>   #include <linux/types.h>
>   #include <linux/srcu.h>
>   
> -#include "gt/intel_engine_types.h"
> +#include "intel_engine_types.h"
> +#include "intel_reset_types.h"
>   
>   struct drm_i915_private;
>   struct i915_request;
>   struct intel_engine_cs;
> +struct intel_gt;
>   struct intel_guc;
>   
> +void intel_gt_init_reset(struct intel_gt *gt);
> +void intel_gt_fini_reset(struct intel_gt *gt);
> +
>   __printf(4, 5)
> -void i915_handle_error(struct drm_i915_private *i915,
> -		       intel_engine_mask_t engine_mask,
> -		       unsigned long flags,
> -		       const char *fmt, ...);
> +void intel_gt_handle_error(struct intel_gt *gt,
> +			   intel_engine_mask_t engine_mask,
> +			   unsigned long flags,
> +			   const char *fmt, ...);
>   #define I915_ERROR_CAPTURE BIT(0)
>   
> -void i915_reset(struct drm_i915_private *i915,
> -		intel_engine_mask_t stalled_mask,
> -		const char *reason);
> -int i915_reset_engine(struct intel_engine_cs *engine,
> -		      const char *reason);
> -
> -void i915_reset_request(struct i915_request *rq, bool guilty);
> +void intel_gt_reset(struct intel_gt *gt,
> +		    intel_engine_mask_t stalled_mask,
> +		    const char *reason);
> +int intel_engine_reset(struct intel_engine_cs *engine,
> +		       const char *reason);
>   
> -int __must_check i915_reset_trylock(struct drm_i915_private *i915);
> -void i915_reset_unlock(struct drm_i915_private *i915, int tag);
> +void __i915_request_reset(struct i915_request *rq, bool guilty);
>   
> -int i915_terminally_wedged(struct drm_i915_private *i915);
> +int __must_check intel_gt_reset_trylock(struct intel_gt *gt);
> +void intel_gt_reset_unlock(struct intel_gt *gt, int tag);
>   
> -bool intel_has_gpu_reset(struct drm_i915_private *i915);
> -bool intel_has_reset_engine(struct drm_i915_private *i915);
> +void intel_gt_set_wedged(struct intel_gt *gt);
> +bool intel_gt_unset_wedged(struct intel_gt *gt);
> +int intel_gt_terminally_wedged(struct intel_gt *gt);
>   
> -int intel_gpu_reset(struct drm_i915_private *i915,
> -		    intel_engine_mask_t engine_mask);
> +int intel_gpu_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask);

time to rename to intel_gt_reset?

>   
> -int intel_reset_guc(struct drm_i915_private *i915);
> +int intel_reset_guc(struct intel_gt *gt);
>   
> -struct i915_wedge_me {
> +struct intel_wedge_me {
>   	struct delayed_work work;
> -	struct drm_i915_private *i915;
> +	struct intel_gt *gt;
>   	const char *name;
>   };
>   
> -void __i915_init_wedge(struct i915_wedge_me *w,
> -		       struct drm_i915_private *i915,
> -		       long timeout,
> -		       const char *name);
> -void __i915_fini_wedge(struct i915_wedge_me *w);
> +void __intel_init_wedge(struct intel_wedge_me *w,
> +			struct intel_gt *gt,
> +			long timeout,
> +			const char *name);
> +void __intel_fini_wedge(struct intel_wedge_me *w);
>   
> -#define i915_wedge_on_timeout(W, DEV, TIMEOUT)				\
> -	for (__i915_init_wedge((W), (DEV), (TIMEOUT), __func__);	\
> -	     (W)->i915;							\
> -	     __i915_fini_wedge((W)))
> +#define intel_wedge_on_timeout(W, GT, TIMEOUT)				\
> +	for (__intel_init_wedge((W), (GT), (TIMEOUT), __func__);	\
> +	     (W)->gt;							\
> +	     __intel_fini_wedge((W)))
> +
> +static inline bool __intel_reset_failed(const struct intel_reset *reset)
> +{
> +	return unlikely(test_bit(I915_WEDGED, &reset->flags));
> +}
> +
> +bool intel_has_gpu_reset(struct drm_i915_private *i915);
> +bool intel_has_reset_engine(struct drm_i915_private *i915);
>   
>   #endif /* I915_RESET_H */

<snip>

> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 2d9cc3cd1f27..8caad19683a1 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c

> @@ -442,7 +441,7 @@ static int igt_reset_nop(void *arg)
>   
>   out:
>   	mock_file_free(i915, file);
> -	if (i915_reset_failed(i915))
> +	if (intel_gt_is_wedged(&i915->gt))

&i915->gt is used 5 times in igt_reset_nop(), might be worth having a 
local variable.

>   		err = -EIO;
>   	return err;
>   }

<snip>

> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ce1b6568515e..cf5155646927 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c

> @@ -1061,23 +1062,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	return 0;
>   }
>   
> -static int i915_reset_info(struct seq_file *m, void *unused)

The fact that this is going away sould be mentioned in the commit message.

> -{
> -	struct drm_i915_private *dev_priv = node_to_i915(m->private);
> -	struct i915_gpu_error *error = &dev_priv->gpu_error;
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> -
> -	seq_printf(m, "full gpu reset = %u\n", i915_reset_count(error));
> -
> -	for_each_engine(engine, dev_priv, id) {
> -		seq_printf(m, "%s = %u\n", engine->name,
> -			   i915_reset_engine_count(error, engine));
> -	}
> -
> -	return 0;
> -}
> -
>   static int ironlake_drpc_info(struct seq_file *m)
>   {
>   	struct drm_i915_private *i915 = node_to_i915(m->private);

<snip>

> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b6f3baa74da4..cec3cea7d86f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -894,13 +894,13 @@ void i915_gem_runtime_suspend(struct drm_i915_private *i915)
>   	}
>   }
>   
> -static int wait_for_engines(struct drm_i915_private *i915)
> +static int wait_for_engines(struct intel_gt *gt)
>   {
> -	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
> -		dev_err(i915->drm.dev,
> +	if (wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT)) {
> +		dev_err(gt->i915->drm.dev,
>   			"Failed to idle engines, declaring wedged!\n");
>   		GEM_TRACE_DUMP();
> -		i915_gem_set_wedged(i915);
> +		intel_gt_set_wedged(gt);
>   		return -EIO;
>   	}
>   
> @@ -971,7 +971,7 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,

Is it time for this to become intel_gt_wait_for_idle? (as a follow up)

>   
>   		lockdep_assert_held(&i915->drm.struct_mutex);
>   
> -		err = wait_for_engines(i915);
> +		err = wait_for_engines(&i915->gt);
>   		if (err)
>   			return err;
>   
> @@ -1149,8 +1149,8 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
>   	 * back to defaults, recovering from whatever wedged state we left it
>   	 * in and so worth trying to use the device once more.
>   	 */
> -	if (i915_terminally_wedged(i915))
> -		i915_gem_unset_wedged(i915);
> +	if (intel_gt_is_wedged(&i915->gt))
> +		intel_gt_unset_wedged(&i915->gt);

I was wondering if we should move this inside the intel_gt_sanitize just 
below, but then I realized that intel_gt_unset_wedged calls 
intel_gt_sanitize as well so that's a no (for now). However, it is IMO 
still worth to at least flip the i915_gem_sanitize function to work on 
intel_gt since al the calls inside, rpm aside, can use the gt.

>   
>   	/*
>   	 * If we inherit context state from the BIOS or earlier occupants

<snip>

> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 2ecd0c6a1c94..7dfbfda48733 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -7,6 +7,7 @@
>   #ifndef _I915_GPU_ERROR_H_
>   #define _I915_GPU_ERROR_H_
>   
> +#include <linux/atomic.h>
>   #include <linux/kref.h>
>   #include <linux/ktime.h>
>   #include <linux/sched.h>
> @@ -180,12 +181,6 @@ struct i915_gpu_state {
>   };
>   
>   struct i915_gpu_error {
> -	/* For hangcheck timer */
> -#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
> -#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
> -
> -	struct delayed_work hangcheck_work;
> -
>   	/* For reset and error_state handling. */
>   	spinlock_t lock;
>   	/* Protected by the above dev->gpu_error.lock. */
> @@ -193,52 +188,11 @@ struct i915_gpu_error {
>   
>   	atomic_t pending_fb_pin;
>   
> -	/**
> -	 * flags: Control various stages of the GPU reset
> -	 *
> -	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
> -	 * serialise with any other users attempting to do the same, and
> -	 * any global resources that may be clobber by the reset (such as
> -	 * FENCE registers).
> -	 *
> -	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
> -	 * acquire the struct_mutex to reset an engine, we need an explicit
> -	 * flag to prevent two concurrent reset attempts in the same engine.
> -	 * As the number of engines continues to grow, allocate the flags from
> -	 * the most significant bits.
> -	 *
> -	 * #I915_WEDGED - If reset fails and we can no longer use the GPU,
> -	 * we set the #I915_WEDGED bit. Prior to command submission, e.g.
> -	 * i915_request_alloc(), this bit is checked and the sequence
> -	 * aborted (with -EIO reported to userspace) if set.
> -	 */
> -	unsigned long flags;
> -#define I915_RESET_BACKOFF	0
> -#define I915_RESET_MODESET	1
> -#define I915_RESET_ENGINE	2
> -#define I915_WEDGED		(BITS_PER_LONG - 1)
> -
>   	/** Number of times the device has been reset (global) */
> -	u32 reset_count;
> +	atomic_t reset_count;
>   
>   	/** Number of times an engine has been reset */
> -	u32 reset_engine_count[I915_NUM_ENGINES];
> -
> -	struct mutex wedge_mutex; /* serialises wedging/unwedging */
> -
> -	/**
> -	 * Waitqueue to signal when a hang is detected. Used to for waiters
> -	 * to release the struct_mutex for the reset to procede.
> -	 */
> -	wait_queue_head_t wait_queue;
> -
> -	/**
> -	 * Waitqueue to signal when the reset has completed. Used by clients
> -	 * that wait for dev_priv->mm.wedged to settle.
> -	 */
> -	wait_queue_head_t reset_queue;
> -
> -	struct srcu_struct reset_backoff_srcu;
> +	atomic_t reset_engine_count[I915_NUM_ENGINES];

I'd argue that both reset counts should go inside the intel_reset 
structure under intel_gt since we're counting resets of the GT and its 
components.

>   };
>   
>   struct drm_i915_error_state_buf {

<snip>

> diff --git a/drivers/gpu/drm/i915/selftests/igt_wedge_me.h b/drivers/gpu/drm/i915/selftests/igt_wedge_me.h
> deleted file mode 100644
> index 08e5ff11bbd9..000000000000
> --- a/drivers/gpu/drm/i915/selftests/igt_wedge_me.h
> +++ /dev/null
> @@ -1,58 +0,0 @@
> -/*
> - * SPDX-License-Identifier: MIT
> - *
> - * Copyright © 2018 Intel Corporation
> - */
> -
> -#ifndef IGT_WEDGE_ME_H
> -#define IGT_WEDGE_ME_H
> -
> -#include <linux/workqueue.h>
> -
> -#include "../i915_gem.h"
> -
> -struct drm_i915_private;
> -
> -struct igt_wedge_me {
> -	struct delayed_work work;
> -	struct drm_i915_private *i915;
> -	const char *name;
> -};
> -
> -static void __igt_wedge_me(struct work_struct *work)
> -{
> -	struct igt_wedge_me *w = container_of(work, typeof(*w), work.work);
> -
> -	pr_err("%s timed out, cancelling test.\n", w->name);
> -
> -	GEM_TRACE("%s timed out.\n", w->name);
> -	GEM_TRACE_DUMP();

These dumps are not in intel_wedge_me, which replaced this in selftests. 
Should we add them there or are they not meaningful enough for us to care?

Daniele

> -
> -	i915_gem_set_wedged(w->i915);
> -}
> -
> -static void __igt_init_wedge(struct igt_wedge_me *w,
> -			     struct drm_i915_private *i915,
> -			     long timeout,
> -			     const char *name)
> -{
> -	w->i915 = i915;
> -	w->name = name;
> -
> -	INIT_DELAYED_WORK_ONSTACK(&w->work, __igt_wedge_me);
> -	schedule_delayed_work(&w->work, timeout);
> -}
> -
> -static void __igt_fini_wedge(struct igt_wedge_me *w)
> -{
> -	cancel_delayed_work_sync(&w->work);
> -	destroy_delayed_work_on_stack(&w->work);
> -	w->i915 = NULL;
> -}
> -
> -#define igt_wedge_on_timeout(W, DEV, TIMEOUT)				\
> -	for (__igt_init_wedge((W), (DEV), (TIMEOUT), __func__);		\
> -	     (W)->i915;							\
> -	     __igt_fini_wedge((W)))
> -
> -#endif /* IGT_WEDGE_ME_H */
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 2741805b56c2..fd4cc4809eb8 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -183,11 +183,6 @@ struct drm_i915_private *mock_gem_device(void)
>   	intel_gt_init_early(&i915->gt, i915);
>   	atomic_inc(&i915->gt.wakeref.count); /* disable; no hw support */
>   
> -	init_waitqueue_head(&i915->gpu_error.wait_queue);
> -	init_waitqueue_head(&i915->gpu_error.reset_queue);
> -	init_srcu_struct(&i915->gpu_error.reset_backoff_srcu);
> -	mutex_init(&i915->gpu_error.wedge_mutex);
> -
>   	i915->wq = alloc_ordered_workqueue("mock", 0);
>   	if (!i915->wq)
>   		goto err_drv;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets
  2019-07-08 22:48   ` Daniele Ceraolo Spurio
@ 2019-07-09  8:11     ` Chris Wilson
  2019-07-09  8:15     ` Chris Wilson
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-09  8:11 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx

Quoting Daniele Ceraolo Spurio (2019-07-08 23:48:51)
> 
> 
> On 7/5/19 12:46 AM, Chris Wilson wrote:
> > Having taken the first step in encapsulating the functionality by moving
> > the related files under gt/, the next step is to start encapsulating by
> > passing around the relevant structs rather than the global
> > drm_i915_private. In this step, we pass intel_gt to intel_reset.c
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   drivers/gpu/drm/i915/display/intel_display.c  |  21 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
> >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |   8 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_pm.c        |  25 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_throttle.c  |   2 +-
> >   .../gpu/drm/i915/gem/selftests/huge_pages.c   |  20 +-
> >   .../i915/gem/selftests/i915_gem_client_blt.c  |   4 +-
> >   .../i915/gem/selftests/i915_gem_coherency.c   |   6 +-
> >   .../drm/i915/gem/selftests/i915_gem_context.c |  17 +-
> >   .../drm/i915/gem/selftests/i915_gem_mman.c    |   2 +-
> >   .../i915/gem/selftests/i915_gem_object_blt.c  |   4 +-
> >   drivers/gpu/drm/i915/gt/intel_engine.h        |   8 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  16 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   3 +-
> >   drivers/gpu/drm/i915/gt/intel_gt.c            |   7 +
> >   drivers/gpu/drm/i915/gt/intel_gt.h            |  12 +
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |  19 +-
> >   drivers/gpu/drm/i915/gt/intel_gt_types.h      |  12 +
> >   drivers/gpu/drm/i915/gt/intel_hangcheck.c     |  67 +--
> >   drivers/gpu/drm/i915/gt/intel_lrc.c           |   2 +-
> >   drivers/gpu/drm/i915/gt/intel_reset.c         | 435 +++++++++---------
> >   drivers/gpu/drm/i915/gt/intel_reset.h         |  73 +--
> >   drivers/gpu/drm/i915/gt/intel_reset_types.h   |  50 ++
> >   drivers/gpu/drm/i915/gt/intel_ringbuffer.c    |   2 +-
> >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 107 +++--
> >   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  36 +-
> >   drivers/gpu/drm/i915/gt/selftest_reset.c      |  50 +-
> >   drivers/gpu/drm/i915/gt/selftest_timeline.c   |   3 +-
> >   .../gpu/drm/i915/gt/selftest_workarounds.c    |  33 +-
> >   drivers/gpu/drm/i915/i915_debugfs.c           |  63 +--
> >   drivers/gpu/drm/i915/i915_drv.c               |   5 +-
> >   drivers/gpu/drm/i915/i915_drv.h               |  35 +-
> >   drivers/gpu/drm/i915/i915_gem.c               |  31 +-
> >   drivers/gpu/drm/i915/i915_gpu_error.h         |  52 +--
> >   drivers/gpu/drm/i915/i915_request.c           |   5 +-
> >   drivers/gpu/drm/i915/intel_guc_submission.c   |   2 +-
> >   drivers/gpu/drm/i915/intel_uc.c               |   2 +-
> >   drivers/gpu/drm/i915/selftests/i915_active.c  |   3 +-
> >   drivers/gpu/drm/i915/selftests/i915_gem.c     |   3 +-
> >   .../gpu/drm/i915/selftests/i915_gem_evict.c   |   3 +-
> >   drivers/gpu/drm/i915/selftests/i915_request.c |   4 +-
> >   .../gpu/drm/i915/selftests/i915_selftest.c    |   2 +-
> >   .../gpu/drm/i915/selftests/igt_flush_test.c   |   5 +-
> >   drivers/gpu/drm/i915/selftests/igt_reset.c    |  38 +-
> >   drivers/gpu/drm/i915/selftests/igt_reset.h    |  10 +-
> >   drivers/gpu/drm/i915/selftests/igt_wedge_me.h |  58 ---
> >   .../gpu/drm/i915/selftests/mock_gem_device.c  |   5 -
> >   48 files changed, 665 insertions(+), 709 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/gt/intel_reset_types.h
> >   delete mode 100644 drivers/gpu/drm/i915/selftests/igt_wedge_me.h
> > 
> > diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
> > index 919f5ac844c8..4dd6856bf722 100644
> > --- a/drivers/gpu/drm/i915/display/intel_display.c
> > +++ b/drivers/gpu/drm/i915/display/intel_display.c
> > @@ -4249,12 +4249,12 @@ void intel_prepare_reset(struct drm_i915_private *dev_priv)
> >               return;
> >   
> >       /* We have a modeset vs reset deadlock, defensively unbreak it. */
> > -     set_bit(I915_RESET_MODESET, &dev_priv->gpu_error.flags);
> > -     wake_up_all(&dev_priv->gpu_error.wait_queue);
> > +     set_bit(I915_RESET_MODESET, &dev_priv->gt.reset.flags);
> > +     wake_up_bit(&dev_priv->gt.reset.flags, I915_RESET_MODESET);
> 
> The doc for wake_up_bit says that we need a memory barrier before 
> calling it. Do we have it implicitly somewhere or is it missing?

set_bit is a memory barrier on x86, but we may need a
smp_mb__after_atomic (no-op on x86) to be strictly correct.

> >       if (atomic_read(&dev_priv->gpu_error.pending_fb_pin)) {
> >               DRM_DEBUG_KMS("Modeset potentially stuck, unbreaking through wedging\n");
> > -             i915_gem_set_wedged(dev_priv);
> > +             intel_gt_set_wedged(&dev_priv->gt);
> >       }
> >   
> >       /*
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > index bf085b0cb7c6..8e2eeaec06cb 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
> 
> > @@ -123,18 +124,18 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
> >                        * Forcibly cancel outstanding work and leave
> >                        * the gpu quiet.
> >                        */
> > -                     i915_gem_set_wedged(i915);
> > +                     intel_gt_set_wedged(gt);
> >                       result = false;
> >               }
> > -     } while (i915_retire_requests(i915) && result);
> > +     } while (i915_retire_requests(gt->i915) && result);
> >   
> > -     GEM_BUG_ON(i915->gt.awake);
> > +     GEM_BUG_ON(gt->awake);
> >       return result;
> >   }
> >   
> >   bool i915_gem_load_power_context(struct drm_i915_private *i915)
> 
> This should probably move to intel_gt as well since it is the gt power 
> context that we're loading.

Working on something like that, but for the moment this is tied very
much into the heart of GEM initialisation. So it's operating at the GEM
device level currently.

> >   {
> > -     return switch_to_kernel_context_sync(i915);
> > +     return switch_to_kernel_context_sync(&i915->gt);
> >   }
> >   
> >   void i915_gem_suspend(struct drm_i915_private *i915)
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index 36ba80e6a0b7..9047e28e75ae 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -5,7 +5,9 @@
> >    */
> >   
> >   #include "i915_drv.h"
> > +#include "i915_params.h"
> >   #include "intel_engine_pm.h"
> > +#include "intel_gt.h"
> >   #include "intel_gt_pm.h"
> >   #include "intel_pm.h"
> >   #include "intel_wakeref.h"
> > @@ -17,6 +19,7 @@ static void pm_notify(struct drm_i915_private *i915, int state)
> >   
> >   static int intel_gt_unpark(struct intel_wakeref *wf)
> >   {
> > +     struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
> >       struct drm_i915_private *i915 =
> >               container_of(wf, typeof(*i915), gt.wakeref);
> 
> gt->i915 ?

Sensible. My goal here is to remove the i915 entirely.

> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > index f43ea830b1e8..70d723e801bf 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > @@ -14,12 +14,21 @@
> >   #include <linux/types.h>
> >   
> >   #include "i915_vma.h"
> > +#include "intel_reset_types.h"
> >   #include "intel_wakeref.h"
> >   
> >   struct drm_i915_private;
> >   struct i915_ggtt;
> >   struct intel_uncore;
> >   
> > +struct intel_hangcheck {
> > +     /* For hangcheck timer */
> > +#define DRM_I915_HANGCHECK_PERIOD 1500 /* in ms */
> > +#define DRM_I915_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_I915_HANGCHECK_PERIOD)
> > +
> > +     struct delayed_work work;
> > +};
> 
> should intel_hangcheck be part of intel_reset_types?

It's not a part of reset per-se, it is only entity that runs alongside.
I just lazied out of making intel_hangcheck*.h as I am still revising my
planned heartbeats.

> > +
> >   struct intel_gt {
> >       struct drm_i915_private *i915;
> >       struct intel_uncore *uncore;
> > @@ -39,6 +48,9 @@ struct intel_gt {
> >       struct list_head closed_vma;
> >       spinlock_t closed_lock; /* guards the list of closed_vma */
> >   
> > +     struct intel_hangcheck hangcheck;
> > +     struct intel_reset reset;
> > +
> >       /**
> >        * Is the GPU currently considered idle, or busy executing
> >        * userspace requests? Whilst idle, we allow runtime power
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 72002c0f9698..67c06231b004 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> 
> > @@ -1267,113 +1252,119 @@ void i915_handle_error(struct drm_i915_private *i915,
> >       synchronize_rcu_expedited();
> >   
> >       /* Prevent any other reset-engine attempt. */
> > -     for_each_engine(engine, i915, tmp) {
> > +     for_each_engine(engine, gt->i915, tmp) {
> >               while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> > -                                     &error->flags))
> > -                     wait_on_bit(&error->flags,
> > +                                     &gt->reset.flags))
> > +                     wait_on_bit(&gt->reset.flags,
> >                                   I915_RESET_ENGINE + engine->id,
> >                                   TASK_UNINTERRUPTIBLE);
> >       }
> >   
> > -     i915_reset_device(i915, engine_mask, msg);
> > +     intel_gt_reset_global(gt, engine_mask, msg);
> >   
> > -     for_each_engine(engine, i915, tmp) {
> > -             clear_bit(I915_RESET_ENGINE + engine->id,
> > -                       &error->flags);
> > -     }
> > -
> > -     clear_bit(I915_RESET_BACKOFF, &error->flags);
> > -     wake_up_all(&error->reset_queue);
> > +     for_each_engine(engine, gt->i915, tmp)
> > +             clear_bit_unlock(I915_RESET_ENGINE + engine->id,
> > +                              &gt->reset.flags);
> > +     clear_bit_unlock(I915_RESET_BACKOFF, &gt->reset.flags);
> > +     smp_mb__after_atomic();
> 
> Why do we now need this barrier here?

We always needed it for the same reason as you gave above. PeterZ
pointed out the lapse as well for the clear_bit/wake_up_bit combo.

> 
> > +     wake_up_all(&gt->reset.queue);
> >   
> >   out:
> > -     intel_runtime_pm_put(&i915->runtime_pm, wakeref);
> > +     intel_runtime_pm_put(&gt->i915->runtime_pm, wakeref);
> >   }
> >   
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
> > index 03fba0ab3868..62f6cb520f96 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.h
> > @@ -11,56 +11,67 @@
> >   #include <linux/types.h>
> >   #include <linux/srcu.h>
> >   
> > -#include "gt/intel_engine_types.h"
> > +#include "intel_engine_types.h"
> > +#include "intel_reset_types.h"
> >   
> >   struct drm_i915_private;
> >   struct i915_request;
> >   struct intel_engine_cs;
> > +struct intel_gt;
> >   struct intel_guc;
> >   
> > +void intel_gt_init_reset(struct intel_gt *gt);
> > +void intel_gt_fini_reset(struct intel_gt *gt);
> > +
> >   __printf(4, 5)
> > -void i915_handle_error(struct drm_i915_private *i915,
> > -                    intel_engine_mask_t engine_mask,
> > -                    unsigned long flags,
> > -                    const char *fmt, ...);
> > +void intel_gt_handle_error(struct intel_gt *gt,
> > +                        intel_engine_mask_t engine_mask,
> > +                        unsigned long flags,
> > +                        const char *fmt, ...);
> >   #define I915_ERROR_CAPTURE BIT(0)
> >   
> > -void i915_reset(struct drm_i915_private *i915,
> > -             intel_engine_mask_t stalled_mask,
> > -             const char *reason);
> > -int i915_reset_engine(struct intel_engine_cs *engine,
> > -                   const char *reason);
> > -
> > -void i915_reset_request(struct i915_request *rq, bool guilty);
> > +void intel_gt_reset(struct intel_gt *gt,
> > +                 intel_engine_mask_t stalled_mask,
> > +                 const char *reason);
> > +int intel_engine_reset(struct intel_engine_cs *engine,
> > +                    const char *reason);
> >   
> > -int __must_check i915_reset_trylock(struct drm_i915_private *i915);
> > -void i915_reset_unlock(struct drm_i915_private *i915, int tag);
> > +void __i915_request_reset(struct i915_request *rq, bool guilty);
> >   
> > -int i915_terminally_wedged(struct drm_i915_private *i915);
> > +int __must_check intel_gt_reset_trylock(struct intel_gt *gt);
> > +void intel_gt_reset_unlock(struct intel_gt *gt, int tag);
> >   
> > -bool intel_has_gpu_reset(struct drm_i915_private *i915);
> > -bool intel_has_reset_engine(struct drm_i915_private *i915);
> > +void intel_gt_set_wedged(struct intel_gt *gt);
> > +bool intel_gt_unset_wedged(struct intel_gt *gt);
> > +int intel_gt_terminally_wedged(struct intel_gt *gt);
> >   
> > -int intel_gpu_reset(struct drm_i915_private *i915,
> > -                 intel_engine_mask_t engine_mask);
> > +int intel_gpu_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask);
> 
> time to rename to intel_gt_reset?

Already taken.

> 
> >   
> > -int intel_reset_guc(struct drm_i915_private *i915);
> > +int intel_reset_guc(struct intel_gt *gt);
> >   
> > -struct i915_wedge_me {
> > +struct intel_wedge_me {
> >       struct delayed_work work;
> > -     struct drm_i915_private *i915;
> > +     struct intel_gt *gt;
> >       const char *name;
> >   };
> >   
> > -void __i915_init_wedge(struct i915_wedge_me *w,
> > -                    struct drm_i915_private *i915,
> > -                    long timeout,
> > -                    const char *name);
> > -void __i915_fini_wedge(struct i915_wedge_me *w);
> > +void __intel_init_wedge(struct intel_wedge_me *w,
> > +                     struct intel_gt *gt,
> > +                     long timeout,
> > +                     const char *name);
> > +void __intel_fini_wedge(struct intel_wedge_me *w);
> >   
> > -#define i915_wedge_on_timeout(W, DEV, TIMEOUT)                               \
> > -     for (__i915_init_wedge((W), (DEV), (TIMEOUT), __func__);        \
> > -          (W)->i915;                                                 \
> > -          __i915_fini_wedge((W)))
> > +#define intel_wedge_on_timeout(W, GT, TIMEOUT)                               \
> > +     for (__intel_init_wedge((W), (GT), (TIMEOUT), __func__);        \
> > +          (W)->gt;                                                   \
> > +          __intel_fini_wedge((W)))
> > +
> > +static inline bool __intel_reset_failed(const struct intel_reset *reset)
> > +{
> > +     return unlikely(test_bit(I915_WEDGED, &reset->flags));
> > +}
> > +
> > +bool intel_has_gpu_reset(struct drm_i915_private *i915);
> > +bool intel_has_reset_engine(struct drm_i915_private *i915);
> >   
> >   #endif /* I915_RESET_H */
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > index 2d9cc3cd1f27..8caad19683a1 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> 
> > @@ -442,7 +441,7 @@ static int igt_reset_nop(void *arg)
> >   
> >   out:
> >       mock_file_free(i915, file);
> > -     if (i915_reset_failed(i915))
> > +     if (intel_gt_is_wedged(&i915->gt))
> 
> &i915->gt is used 5 times in igt_reset_nop(), might be worth having a 
> local variable.
> 
> >               err = -EIO;
> >       return err;
> >   }
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index ce1b6568515e..cf5155646927 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> 
> > @@ -1061,23 +1062,6 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
> >       return 0;
> >   }
> >   
> > -static int i915_reset_info(struct seq_file *m, void *unused)
> 
> The fact that this is going away sould be mentioned in the commit message.

Why? Who cares? Those who do care should know the information is not
going away and is available elsewhere.

> > -{
> > -     struct drm_i915_private *dev_priv = node_to_i915(m->private);
> > -     struct i915_gpu_error *error = &dev_priv->gpu_error;
> > -     struct intel_engine_cs *engine;
> > -     enum intel_engine_id id;
> > -
> > -     seq_printf(m, "full gpu reset = %u\n", i915_reset_count(error));
> > -
> > -     for_each_engine(engine, dev_priv, id) {
> > -             seq_printf(m, "%s = %u\n", engine->name,
> > -                        i915_reset_engine_count(error, engine));
> > -     }
> > -
> > -     return 0;
> > -}
> > -
> >   static int ironlake_drpc_info(struct seq_file *m)
> >   {
> >       struct drm_i915_private *i915 = node_to_i915(m->private);
> 
> <snip>
> 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index b6f3baa74da4..cec3cea7d86f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -894,13 +894,13 @@ void i915_gem_runtime_suspend(struct drm_i915_private *i915)
> >       }
> >   }
> >   
> > -static int wait_for_engines(struct drm_i915_private *i915)
> > +static int wait_for_engines(struct intel_gt *gt)
> >   {
> > -     if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
> > -             dev_err(i915->drm.dev,
> > +     if (wait_for(intel_engines_are_idle(gt), I915_IDLE_ENGINES_TIMEOUT)) {
> > +             dev_err(gt->i915->drm.dev,
> >                       "Failed to idle engines, declaring wedged!\n");
> >               GEM_TRACE_DUMP();
> > -             i915_gem_set_wedged(i915);
> > +             intel_gt_set_wedged(gt);
> >               return -EIO;
> >       }
> >   
> > @@ -971,7 +971,7 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
> 
> Is it time for this to become intel_gt_wait_for_idle? (as a follow up)

No, it is not time for that. See the struct_mutex and that the only
caller is from i915_gem_pm.c?

The next step here to remove struct_mutex is to move this to iterate
over gt->timelines at which point the worker can be moved over from gem
to intel_gt_pm.c (intel_gt_pm_workers.c?)

> >               lockdep_assert_held(&i915->drm.struct_mutex);
> >   
> > -             err = wait_for_engines(i915);
> > +             err = wait_for_engines(&i915->gt);
> >               if (err)
> >                       return err;
> >   
> > @@ -1149,8 +1149,8 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
> >        * back to defaults, recovering from whatever wedged state we left it
> >        * in and so worth trying to use the device once more.
> >        */
> > -     if (i915_terminally_wedged(i915))
> > -             i915_gem_unset_wedged(i915);
> > +     if (intel_gt_is_wedged(&i915->gt))
> > +             intel_gt_unset_wedged(&i915->gt);
> 
> I was wondering if we should move this inside the intel_gt_sanitize just 
> below, but then I realized that intel_gt_unset_wedged calls 
> intel_gt_sanitize as well so that's a no (for now). However, it is IMO 
> still worth to at least flip the i915_gem_sanitize function to work on 
> intel_gt since al the calls inside, rpm aside, can use the gt.

It is still i915_gem so should take its device, which is currently
drm_i915_private. We can and should move more into intel_gt_sanitize.

> >       /** Number of times the device has been reset (global) */
> > -     u32 reset_count;
> > +     atomic_t reset_count;
> >   
> >       /** Number of times an engine has been reset */
> > -     u32 reset_engine_count[I915_NUM_ENGINES];
> > -
> > -     struct mutex wedge_mutex; /* serialises wedging/unwedging */
> > -
> > -     /**
> > -      * Waitqueue to signal when a hang is detected. Used to for waiters
> > -      * to release the struct_mutex for the reset to procede.
> > -      */
> > -     wait_queue_head_t wait_queue;
> > -
> > -     /**
> > -      * Waitqueue to signal when the reset has completed. Used by clients
> > -      * that wait for dev_priv->mm.wedged to settle.
> > -      */
> > -     wait_queue_head_t reset_queue;
> > -
> > -     struct srcu_struct reset_backoff_srcu;
> > +     atomic_t reset_engine_count[I915_NUM_ENGINES];
> 
> I'd argue that both reset counts should go inside the intel_reset 
> structure under intel_gt since we're counting resets of the GT and its 
> components.

No, they are global counters as the GEM interface they serve is global.
I'd buy an argument to move them into i915->gem, except that we poke
them from inside gt/ so it seems prudent to use a tertiary shared
struct.

> > -static void __igt_wedge_me(struct work_struct *work)
> > -{
> > -     struct igt_wedge_me *w = container_of(work, typeof(*w), work.work);
> > -
> > -     pr_err("%s timed out, cancelling test.\n", w->name);
> > -
> > -     GEM_TRACE("%s timed out.\n", w->name);
> > -     GEM_TRACE_DUMP();
> 
> These dumps are not in intel_wedge_me, which replaced this in selftests. 
> Should we add them there or are they not meaningful enough for us to care?

They had been useful, but set_wedged includes a dump -- however, only if the
engines are not idle. Equally, for other users we do not want to always
dump. I am happy to play this by ear and add a flag later if we hit a
quiet selftest wedge-me and we are baffled. (Although I imagine that the
first step there will more localised debug.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets
  2019-07-08 22:48   ` Daniele Ceraolo Spurio
  2019-07-09  8:11     ` Chris Wilson
@ 2019-07-09  8:15     ` Chris Wilson
  1 sibling, 0 replies; 16+ messages in thread
From: Chris Wilson @ 2019-07-09  8:15 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx

Quoting Daniele Ceraolo Spurio (2019-07-08 23:48:51)
> On 7/5/19 12:46 AM, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > index 2d9cc3cd1f27..8caad19683a1 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> 
> > @@ -442,7 +441,7 @@ static int igt_reset_nop(void *arg)
> >   
> >   out:
> >       mock_file_free(i915, file);
> > -     if (i915_reset_failed(i915))
> > +     if (intel_gt_is_wedged(&i915->gt))
> 
> &i915->gt is used 5 times in igt_reset_nop(), might be worth having a 
> local variable.

Oh, this is before intel_gt_live_selftests() :)

I changed it so that we passed intel_gt as the arg to these selftests.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-07-09  8:15 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-05  7:45 [PATCH 1/8] drm/i915: Order assert forcewake test Chris Wilson
2019-07-05  7:45 ` [PATCH 2/8] drm/i915: Teach execbuffer to take the engine wakeref not GT Chris Wilson
2019-07-05  7:45 ` [PATCH 3/8] drm/i915/gt: Track timeline activeness in enter/exit Chris Wilson
2019-07-05  7:46 ` [PATCH 4/8] drm/i915/gt: Convert timeline tracking to spinlock Chris Wilson
2019-07-05  7:46 ` [PATCH 5/8] drm/i915/gt: Guard timeline pinning with its own mutex Chris Wilson
2019-07-05  7:46 ` [PATCH 6/8] drm/i915: Protect request retirement with timeline->mutex Chris Wilson
2019-07-05  7:46 ` [PATCH 7/8] drm/i915: Replace struct_mutex for batch pool serialisation Chris Wilson
2019-07-05  7:46 ` [PATCH 8/8] drm/i915/gt: Use intel_gt as the primary object for handling resets Chris Wilson
2019-07-08 22:48   ` Daniele Ceraolo Spurio
2019-07-09  8:11     ` Chris Wilson
2019-07-09  8:15     ` Chris Wilson
2019-07-05  9:53 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Order assert forcewake test Patchwork
2019-07-05  9:57 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-07-05 10:16 ` ✓ Fi.CI.BAT: success " Patchwork
2019-07-05 10:36 ` [PATCH 1/8] " Chris Wilson
2019-07-06 12:54 ` ✓ Fi.CI.IGT: success for series starting with [1/8] " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.