* [PATCH 01/71] drm/i915/execlists: Drop preemption arbitration points along the ring
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

Limit the arbitration (where preemption may occur) to inside the batch,
and prevent it from happening on the pipecontrols/flushes we use to
write the breadcrumb seqno. Once the user batch is complete, we have
nothing left to do but serialise and emit the breadcrumb; switching
contexts at this point is futile, so don't.
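
The resulting arbitration window, sketched as the per-request command
stream (names as in the diff below; illustrative only, not a drop-in
hunk):

    /*
     *  MI_ARB_ON_OFF | MI_ARB_ENABLE   <- tail of the previous breadcrumb:
     *                                     the window opens
     *  MI_BATCH_BUFFER_START ...       <- user batch: preemption may occur
     *                                     at arbitration points inside it
     *  MI_ARB_ON_OFF | MI_ARB_DISABLE  <- batch complete: the window closes
     *  MI_NOOP
     *  flush + seqno write             <- breadcrumb can no longer be
     *  MI_USER_INTERRUPT                  interrupted by a context switch
     *  MI_ARB_ON_OFF | MI_ARB_ENABLE   <- reopen for the next request
     */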

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e04798e98db2..70b722c36e65 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
-	cs = intel_ring_begin(rq, 4);
+	cs = intel_ring_begin(rq, 6);
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
@@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 		(flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
 	*cs++ = lower_32_bits(offset);
 	*cs++ = upper_32_bits(offset);
+
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+	*cs++ = MI_NOOP;
 	intel_ring_advance(rq, cs);
 
 	return 0;
@@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
 	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
 				  intel_hws_seqno_address(request->engine));
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
 
@@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 	cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
 				      intel_hws_seqno_address(request->engine));
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 	request->tail = intel_ring_offset(request, cs);
 	assert_ring_tail_valid(request->ring, request->tail);
 
-- 
2.17.0


* [PATCH 02/71] drm/i915/execlists: Emit trace_i915_request_out for preemption
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

Move the tracepoint into the common execlists_context_schedule_out() and
call it from the preemption-completion path as well. A small bit of
refactoring keeps the tracing consistent; otherwise we end up with
requests mysteriously disappearing and some being emitted to HW multiple
times.
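
A sketch of the consolidated helper (mirroring the hunks below), showing
that every schedule-out, whether for completion or preemption, now emits
the tracepoint:

    static inline void
    execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
    {
            intel_engine_context_out(rq->engine);
            execlists_context_status_change(rq, status);
            trace_i915_request_out(rq); /* now paired with every request_in */
    }

    /* callers state the reason explicitly: */
    execlists_context_schedule_out(rq,
                                   i915_request_completed(rq) ?
                                   INTEL_CONTEXT_SCHEDULE_OUT :
                                   INTEL_CONTEXT_SCHEDULE_PREEMPTED);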

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 70b722c36e65..9f3cce022b2d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -398,10 +398,11 @@ execlists_context_schedule_in(struct i915_request *rq)
 }
 
 static inline void
-execlists_context_schedule_out(struct i915_request *rq)
+execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
 {
 	intel_engine_context_out(rq->engine);
-	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+	execlists_context_status_change(rq, status);
+	trace_i915_request_out(rq);
 }
 
 static void
@@ -772,12 +773,10 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 			  intel_engine_get_seqno(rq->engine));
 
 		GEM_BUG_ON(!execlists->active);
-		intel_engine_context_out(rq->engine);
-
-		execlists_context_status_change(rq,
-						i915_request_completed(rq) ?
-						INTEL_CONTEXT_SCHEDULE_OUT :
-						INTEL_CONTEXT_SCHEDULE_PREEMPTED);
+		execlists_context_schedule_out(rq,
+					       i915_request_completed(rq) ?
+					       INTEL_CONTEXT_SCHEDULE_OUT :
+					       INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 
 		i915_request_put(rq);
 
@@ -1105,8 +1104,8 @@ static void execlists_submission_tasklet(unsigned long data)
 				 */
 				GEM_BUG_ON(!i915_request_completed(rq));
 
-				execlists_context_schedule_out(rq);
-				trace_i915_request_out(rq);
+				execlists_context_schedule_out(rq,
+							       INTEL_CONTEXT_SCHEDULE_OUT);
 				i915_request_put(rq);
 
 				GEM_TRACE("%s completed ctx=%d\n",
-- 
2.17.0


* [PATCH 03/71] drm/i915: Lazily unbind vma on close
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

When userspace is passing around swapbuffers using DRI, we frequently
have to open and close the same object in the foreign address space.
This shows itself as the same object being rebound at roughly 30fps
(with a second object also being rebound at 30fps), which involves us
having to rewrite the page tables and maintain the drm_mm range manager
every time.

However, since the object still exists and it is only the local handle
that disappears, if we are lazy and do not unbind the VMA immediately
when the local user closes the object but defer it until the GPU is
idle, then we can reuse the same VMA binding. We still have to be
careful to mark the handle and lookup tables as closed to maintain the
uABI, just allowing the underlying VMA to be resurrected if the user is
able to access the same object from the same context again.

If the object itself is destroyed (with no userspace handle left
referring to it), the VMA will be reaped immediately as usual.

In the future, this will be even more useful as instantiating a new VMA
for use on the GPU will become heavier. A nuisance indeed, so nip it in
the bud.
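
A sketch of the resulting VMA lifecycle (using the entry points
introduced in the diff below; illustrative only):

    /*
     *  GEM_CLOSE on handle -> i915_vma_close()   mark closed, queue on
     *                                            gt.closed_vma, keep binding
     *  execbuf reuse       -> i915_vma_reopen()  clear the closed flag and
     *                                            resurrect the binding for free
     *  GPU parks (idle)    -> i915_vma_parked()  unbind and destroy every
     *                                            still-closed VMA
     *  object freed        -> i915_vma_destroy() immediate teardown, as before
     */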

v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
v3: Leave a hint as to why we deferred the unbind on close.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_gem.c               |  4 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  3 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 14 ++--
 drivers/gpu/drm/i915/i915_vma.c               | 73 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_vma.h               |  6 ++
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  1 +
 8 files changed, 79 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ff84eef52a..04e27806e581 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2062,6 +2062,7 @@ struct drm_i915_private {
 		struct list_head timelines;
 
 		struct list_head active_rings;
+		struct list_head closed_vma;
 		u32 active_requests;
 		u32 request_serial;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 484354f25f98..5ece6ae4bdff 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -165,6 +165,7 @@ static u32 __i915_gem_park(struct drm_i915_private *i915)
 	i915_timelines_park(i915);
 
 	i915_pmu_gt_parked(i915);
+	i915_vma_parked(i915);
 
 	i915->gt.awake = false;
 
@@ -4795,7 +4796,7 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
 					 &obj->vma_list, obj_link) {
 			GEM_BUG_ON(i915_vma_is_active(vma));
 			vma->flags &= ~I915_VMA_PIN_MASK;
-			i915_vma_close(vma);
+			i915_vma_destroy(vma);
 		}
 		GEM_BUG_ON(!list_empty(&obj->vma_list));
 		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
@@ -5598,6 +5599,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 	INIT_LIST_HEAD(&dev_priv->gt.timelines);
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
+	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 
 	i915_gem_init__mm(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c74f5df3fb5a..f627a8c47c58 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -762,7 +762,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 		}
 
 		/* transfer ref to ctx */
-		vma->open_count++;
+		if (!vma->open_count++)
+			i915_vma_reopen(vma);
 		list_add(&lut->obj_link, &obj->lut_list);
 		list_add(&lut->ctx_link, &eb->ctx->handles_list);
 		lut->ctx = eb->ctx;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e9d828324f67..272d6bb407cc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2218,6 +2218,12 @@ i915_ppgtt_create(struct drm_i915_private *dev_priv,
 }
 
 void i915_ppgtt_close(struct i915_address_space *vm)
+{
+	GEM_BUG_ON(vm->closed);
+	vm->closed = true;
+}
+
+static void ppgtt_destroy_vma(struct i915_address_space *vm)
 {
 	struct list_head *phases[] = {
 		&vm->active_list,
@@ -2226,15 +2232,12 @@ void i915_ppgtt_close(struct i915_address_space *vm)
 		NULL,
 	}, **phase;
 
-	GEM_BUG_ON(vm->closed);
 	vm->closed = true;
-
 	for (phase = phases; *phase; phase++) {
 		struct i915_vma *vma, *vn;
 
 		list_for_each_entry_safe(vma, vn, *phase, vm_link)
-			if (!i915_vma_is_closed(vma))
-				i915_vma_close(vma);
+			i915_vma_destroy(vma);
 	}
 }
 
@@ -2245,7 +2248,8 @@ void i915_ppgtt_release(struct kref *kref)
 
 	trace_i915_ppgtt_release(&ppgtt->base);
 
-	/* vmas should already be unbound and destroyed */
+	ppgtt_destroy_vma(&ppgtt->base);
+
 	GEM_BUG_ON(!list_empty(&ppgtt->base.active_list));
 	GEM_BUG_ON(!list_empty(&ppgtt->base.inactive_list));
 	GEM_BUG_ON(!list_empty(&ppgtt->base.unbound_list));
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 4bda3bd29bf5..9324d476e0a7 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -46,8 +46,6 @@ i915_vma_retire(struct i915_gem_active *active, struct i915_request *rq)
 
 	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
 	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
-	if (unlikely(i915_vma_is_closed(vma) && !i915_vma_is_pinned(vma)))
-		WARN_ON(i915_vma_unbind(vma));
 
 	GEM_BUG_ON(!i915_gem_object_is_active(obj));
 	if (--obj->active_count)
@@ -232,7 +230,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
 	if (!vma)
 		vma = vma_create(obj, vm, view);
 
-	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_is_closed(vma));
 	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
 	GEM_BUG_ON(!IS_ERR(vma) && vma_lookup(obj, vm, view) != vma);
 	return vma;
@@ -684,13 +681,43 @@ int __i915_vma_do_pin(struct i915_vma *vma,
 	return ret;
 }
 
-static void i915_vma_destroy(struct i915_vma *vma)
+void i915_vma_close(struct i915_vma *vma)
+{
+	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
+
+	GEM_BUG_ON(i915_vma_is_closed(vma));
+	vma->flags |= I915_VMA_CLOSED;
+
+	/*
+	 * We defer actually closing, unbinding and destroying the VMA until
+	 * the next idle point, or if the object is freed in the meantime. By
+	 * postponing the unbind, we allow for it to be resurrected by the
+	 * client, avoiding the work required to rebind the VMA. This is
+	 * advantageous for DRI, where the client/server pass objects
+	 * between themselves, temporarily opening a local VMA to the
+	 * object, and then closing it again. The same object is then reused
+	 * on the next frame (or two, depending on the depth of the swap queue)
+	 * causing us to rebind the VMA once more. This ends up being a lot
+	 * of wasted work for the steady state.
+	 */
+	list_add_tail(&vma->closed_link, &vma->vm->i915->gt.closed_vma);
+}
+
+void i915_vma_reopen(struct i915_vma *vma)
+{
+	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
+
+	if (vma->flags & I915_VMA_CLOSED) {
+		vma->flags &= ~I915_VMA_CLOSED;
+		list_del(&vma->closed_link);
+	}
+}
+
+static void __i915_vma_destroy(struct i915_vma *vma)
 {
 	int i;
 
 	GEM_BUG_ON(vma->node.allocated);
-	GEM_BUG_ON(i915_vma_is_active(vma));
-	GEM_BUG_ON(!i915_vma_is_closed(vma));
 	GEM_BUG_ON(vma->fence);
 
 	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
@@ -699,6 +726,7 @@ static void i915_vma_destroy(struct i915_vma *vma)
 
 	list_del(&vma->obj_link);
 	list_del(&vma->vm_link);
+	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
 
 	if (!i915_vma_is_ggtt(vma))
 		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
@@ -706,15 +734,30 @@ static void i915_vma_destroy(struct i915_vma *vma)
 	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
 }
 
-void i915_vma_close(struct i915_vma *vma)
+void i915_vma_destroy(struct i915_vma *vma)
 {
-	GEM_BUG_ON(i915_vma_is_closed(vma));
-	vma->flags |= I915_VMA_CLOSED;
+	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
 
-	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
+	GEM_BUG_ON(i915_vma_is_active(vma));
+	GEM_BUG_ON(i915_vma_is_pinned(vma));
+
+	if (i915_vma_is_closed(vma))
+		list_del(&vma->closed_link);
+
+	WARN_ON(i915_vma_unbind(vma));
+	__i915_vma_destroy(vma);
+}
+
+void i915_vma_parked(struct drm_i915_private *i915)
+{
+	struct i915_vma *vma, *next;
 
-	if (!i915_vma_is_active(vma) && !i915_vma_is_pinned(vma))
-		WARN_ON(i915_vma_unbind(vma));
+	list_for_each_entry_safe(vma, next, &i915->gt.closed_vma, closed_link) {
+		GEM_BUG_ON(!i915_vma_is_closed(vma));
+		i915_vma_destroy(vma);
+	}
+
+	GEM_BUG_ON(!list_empty(&i915->gt.closed_vma));
 }
 
 static void __i915_vma_iounmap(struct i915_vma *vma)
@@ -804,7 +847,7 @@ int i915_vma_unbind(struct i915_vma *vma)
 		return -EBUSY;
 
 	if (!drm_mm_node_allocated(&vma->node))
-		goto destroy;
+		return 0;
 
 	GEM_BUG_ON(obj->bind_count == 0);
 	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
@@ -841,10 +884,6 @@ int i915_vma_unbind(struct i915_vma *vma)
 
 	i915_vma_remove(vma);
 
-destroy:
-	if (unlikely(i915_vma_is_closed(vma)))
-		i915_vma_destroy(vma);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c5022095418..fc4294cfaa91 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -119,6 +119,8 @@ struct i915_vma {
 	/** This vma's place in the eviction list */
 	struct list_head evict_link;
 
+	struct list_head closed_link;
+
 	/**
 	 * Used for performing relocations during execbuffer insertion.
 	 */
@@ -285,6 +287,8 @@ void i915_vma_revoke_mmap(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
 void i915_vma_unlink_ctx(struct i915_vma *vma);
 void i915_vma_close(struct i915_vma *vma);
+void i915_vma_reopen(struct i915_vma *vma);
+void i915_vma_destroy(struct i915_vma *vma);
 
 int __i915_vma_do_pin(struct i915_vma *vma,
 		      u64 size, u64 alignment, u64 flags);
@@ -408,6 +412,8 @@ i915_vma_unpin_fence(struct i915_vma *vma)
 		__i915_vma_unpin_fence(vma);
 }
 
+void i915_vma_parked(struct drm_i915_private *i915);
+
 #define for_each_until(cond) if (cond) break; else
 
 /**
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 05bbef363fff..d7c8ef8e6764 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1091,7 +1091,7 @@ static int __igt_write_huge(struct i915_gem_context *ctx,
 out_vma_unpin:
 	i915_vma_unpin(vma);
 out_vma_close:
-	i915_vma_close(vma);
+	i915_vma_destroy(vma);
 
 	return err;
 }
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index a662c0450e77..4b6622c6986a 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -226,6 +226,7 @@ struct drm_i915_private *mock_gem_device(void)
 
 	INIT_LIST_HEAD(&i915->gt.timelines);
 	INIT_LIST_HEAD(&i915->gt.active_rings);
+	INIT_LIST_HEAD(&i915->gt.closed_vma);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	mock_init_ggtt(i915);
-- 
2.17.0


* [PATCH 04/71] drm/i915: Keep one request in our ring_list
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

Don't pre-emptively retire the oldest request in our ring's list if it
is the only request. We keep various bits of state alive using the
active reference from the request, and would rather transfer that state
over to a new request than go through the more involved process of
retiring and reacquiring it.
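
Hedged sketch of the new check: list_is_last() is what distinguishes a
sole request from one that is merely the oldest:

    rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
    /*
     * [rq] -> head          : sole request; keep it so its active reference
     *                         continues to hold the ring/context state
     * [rq, rq2, ...] -> head: rq is only the oldest; retire it once it has
     *                         completed, rq2 keeps the state alive
     */
    if (!list_is_last(&rq->ring_link, &ring->request_list) &&
        i915_request_completed(rq))
            i915_request_retire(rq);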

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5acf869f3ca3..75061f9e48eb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -694,9 +694,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		goto err_unreserve;
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry_or_null(&ring->request_list,
-				      typeof(*rq), ring_link);
-	if (rq && i915_request_completed(rq))
+	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
+	if (!list_is_last(&rq->ring_link, &ring->request_list) &&
+	    i915_request_completed(rq))
 		i915_request_retire(rq);
 
 	/*
-- 
2.17.0


* [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

As we reschedule the requests, we do not want the submission tasklet
running until we finish updating the priority chains. (We start
rewriting priorities from the oldest request, but the dequeue looks at
the most recent in-flight one, so there is a small window in which
dequeue may falsely decide that preemption is required.) Combine the
tasklet kick performed when adding a new request with the set-wedge
protection so that we only have to adjust the preempt-counter once to
achieve both goals.
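
A sketch of the combined critical section (as in the i915_request.c hunk
below): disabling bottom halves holds the execlists tasklet off this CPU
while the priority chains are rewritten, and re-enabling then runs any
tasklet kicked in the meantime:

    local_bh_disable();         /* softirqs held off: no premature dequeue */
    if (engine->schedule)
            engine->schedule(request, &request->ctx->sched);
    i915_sw_fence_commit(&request->submit); /* may schedule the tasklet */
    local_bh_enable();          /* run it now, with priorities settled */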

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
 drivers/gpu/drm/i915/i915_request.c | 5 +----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ece6ae4bdff..03cd30001b5d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
 	rq = to_request(fence);
 	engine = rq->engine;
 
-	rcu_read_lock();
+	local_bh_disable(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(rq, attr);
-	rcu_read_unlock();
+	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
 static void fence_set_priority(struct dma_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 75061f9e48eb..0756fafa7f81 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	rcu_read_lock();
+	local_bh_disable();
 	if (engine->schedule)
 		engine->schedule(request, &request->ctx->sched);
-	rcu_read_unlock();
-
-	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
-- 
2.17.0


* [PATCH 06/71] drm/i915: Detect if we missed kicking the execlists tasklet
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

If inside hangcheck we see that the engine has paused, but an execlists
interrupt is still pending, we know that the tasklet did not fire. Dump
the GEM trace along with the current engine state, and kick the tasklet
to recover without having to go through a full GPU reset.
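
The kick follows the usual pattern for invoking a tasklet body in-line;
a sketch of the recovery path (matching the hunk below):

    if (tasklet_trylock(&execlists->tasklet)) {
            /* not running elsewhere: call the body directly */
            execlists->tasklet.func(execlists->tasklet.data);
            tasklet_unlock(&execlists->tasklet);

            ret = ENGINE_WAIT_KICK;
    }
    /* reschedule regardless, in case we raced with a running instance */
    tasklet_hi_schedule(&execlists->tasklet);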

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_hangcheck.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 309e38b00e95..2d7f10492e35 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -267,6 +267,29 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 		}
 	}
 
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
+		struct intel_engine_execlists *execlists = &engine->execlists;
+		enum intel_engine_hangcheck_action ret = ENGINE_WAIT;
+
+		if (GEM_SHOW_DEBUG()) {
+			struct drm_printer p = drm_debug_printer("hangcheck");
+
+			GEM_TRACE_DUMP();
+			intel_engine_dump(engine, &p,
+					  "%s stuck\n", engine->name);
+		}
+
+		if (tasklet_trylock(&execlists->tasklet)) {
+			execlists->tasklet.func(execlists->tasklet.data);
+			tasklet_unlock(&execlists->tasklet);
+
+			ret = ENGINE_WAIT_KICK;
+		}
+
+		tasklet_hi_schedule(&execlists->tasklet);
+		return ret;
+	}
+
 	return ENGINE_DEAD;
 }
 
-- 
2.17.0


* [PATCH 07/71] drm/i915: Move request->ctx aside
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

In the next patch, we want to store the intel_context pointer inside
i915_request, as it is frequently accessed via a convoluted dance when
submitting the request to hw. Having two context pointers inside
i915_request leads to confusion, so first rename the existing
i915_gem_context pointer to i915_request.gem_context.
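
A sketch of where the series is heading (the second pointer arrives in
patch 09; the field comments are illustrative):

    struct i915_request {
            ...
            struct i915_gem_context *gem_context; /* the client's uABI context */
            struct intel_context *hw_context;     /* per-engine logical state,
                                                   * added later in the series */
            ...
    };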

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gvt/scheduler.c          |  4 +--
 drivers/gpu/drm/i915/i915_debugfs.c           |  4 +--
 drivers/gpu/drm/i915/i915_gem.c               | 10 +++---
 drivers/gpu/drm/i915/i915_gpu_error.c         | 18 ++++++-----
 drivers/gpu/drm/i915/i915_request.c           | 12 +++----
 drivers/gpu/drm/i915/i915_request.h           |  2 +-
 drivers/gpu/drm/i915/i915_trace.h             |  8 ++---
 drivers/gpu/drm/i915/intel_engine_cs.c        |  2 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |  7 +++--
 drivers/gpu/drm/i915/intel_lrc.c              | 31 ++++++++++---------
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 12 +++----
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  5 ++-
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  2 +-
 13 files changed, 63 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index ffb45a9ee228..f409a154491d 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -205,7 +205,7 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 
 static inline bool is_gvt_request(struct i915_request *req)
 {
-	return i915_gem_context_force_single_submission(req->ctx);
+	return i915_gem_context_force_single_submission(req->gem_context);
 }
 
 static void save_ring_hw_state(struct intel_vgpu *vgpu, int ring_id)
@@ -305,7 +305,7 @@ static int copy_workload_to_ring_buffer(struct intel_vgpu_workload *workload)
 	struct i915_request *req = workload->req;
 
 	if (IS_KABYLAKE(req->i915) &&
-	    is_inhibit_context(req->ctx, req->engine->id))
+	    is_inhibit_context(req->gem_context, req->engine->id))
 		intel_vgpu_restore_inhibit_context(vgpu, req);
 
 	/* allocate shadow ring buffer */
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 85911bc0b703..3118d5de195b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -542,8 +542,8 @@ static int i915_gem_object_info(struct seq_file *m, void *data)
 						   struct i915_request,
 						   client_link);
 		rcu_read_lock();
-		task = pid_task(request && request->ctx->pid ?
-				request->ctx->pid : file->pid,
+		task = pid_task(request && request->gem_context->pid ?
+				request->gem_context->pid : file->pid,
 				PIDTYPE_PID);
 		print_file_stats(m, task ? task->comm : "<unknown>", stats);
 		rcu_read_unlock();
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 03cd30001b5d..ecef2e8e5e93 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3110,7 +3110,7 @@ static void skip_request(struct i915_request *request)
 static void engine_skip_context(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
-	struct i915_gem_context *hung_ctx = request->ctx;
+	struct i915_gem_context *hung_ctx = request->gem_context;
 	struct i915_timeline *timeline = request->timeline;
 	unsigned long flags;
 
@@ -3120,7 +3120,7 @@ static void engine_skip_context(struct i915_request *request)
 	spin_lock(&timeline->lock);
 
 	list_for_each_entry_continue(request, &engine->timeline.requests, link)
-		if (request->ctx == hung_ctx)
+		if (request->gem_context == hung_ctx)
 			skip_request(request);
 
 	list_for_each_entry(request, &timeline->requests, link)
@@ -3166,11 +3166,11 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
 	}
 
 	if (stalled) {
-		i915_gem_context_mark_guilty(request->ctx);
+		i915_gem_context_mark_guilty(request->gem_context);
 		skip_request(request);
 
 		/* If this context is now banned, skip all pending requests. */
-		if (i915_gem_context_is_banned(request->ctx))
+		if (i915_gem_context_is_banned(request->gem_context))
 			engine_skip_context(request);
 	} else {
 		/*
@@ -3180,7 +3180,7 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
 		 */
 		request = i915_gem_find_active_request(engine);
 		if (request) {
-			i915_gem_context_mark_innocent(request->ctx);
+			i915_gem_context_mark_innocent(request->gem_context);
 			dma_fence_set_error(&request->fence, -EAGAIN);
 
 			/* Rewind the engine to replay the incomplete rq */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index df234dc23274..7cc7d3bc731b 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1287,9 +1287,11 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 static void record_request(struct i915_request *request,
 			   struct drm_i915_error_request *erq)
 {
-	erq->context = request->ctx->hw_id;
+	struct i915_gem_context *ctx = request->gem_context;
+
+	erq->context = ctx->hw_id;
 	erq->sched_attr = request->sched.attr;
-	erq->ban_score = atomic_read(&request->ctx->ban_score);
+	erq->ban_score = atomic_read(&ctx->ban_score);
 	erq->seqno = request->global_seqno;
 	erq->jiffies = request->emitted_jiffies;
 	erq->start = i915_ggtt_offset(request->ring->vma);
@@ -1297,7 +1299,7 @@ static void record_request(struct i915_request *request,
 	erq->tail = request->tail;
 
 	rcu_read_lock();
-	erq->pid = request->ctx->pid ? pid_nr(request->ctx->pid) : 0;
+	erq->pid = ctx->pid ? pid_nr(ctx->pid) : 0;
 	rcu_read_unlock();
 }
 
@@ -1461,12 +1463,12 @@ static void gem_record_rings(struct i915_gpu_state *error)
 
 		request = i915_gem_find_active_request(engine);
 		if (request) {
+			struct i915_gem_context *ctx = request->gem_context;
 			struct intel_ring *ring;
 
-			ee->vm = request->ctx->ppgtt ?
-				&request->ctx->ppgtt->base : &ggtt->base;
+			ee->vm = ctx->ppgtt ? &ctx->ppgtt->base : &ggtt->base;
 
-			record_context(&ee->context, request->ctx);
+			record_context(&ee->context, ctx);
 
 			/* We need to copy these to an anonymous buffer
 			 * as the simplest method to avoid being overwritten
@@ -1483,11 +1485,11 @@ static void gem_record_rings(struct i915_gpu_state *error)
 
 			ee->ctx =
 				i915_error_object_create(i915,
-							 to_intel_context(request->ctx,
+							 to_intel_context(ctx,
 									  engine)->state);
 
 			error->simulated |=
-				i915_gem_context_no_error_capture(request->ctx);
+				i915_gem_context_no_error_capture(ctx);
 
 			ee->rq_head = request->head;
 			ee->rq_post = request->postfix;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0756fafa7f81..5205707fe03a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -383,7 +383,7 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	 */
 	if (engine->last_retired_context)
 		intel_context_unpin(engine->last_retired_context, engine);
-	engine->last_retired_context = rq->ctx;
+	engine->last_retired_context = rq->gem_context;
 }
 
 static void __retire_engine_upto(struct intel_engine_cs *engine,
@@ -454,8 +454,8 @@ static void i915_request_retire(struct i915_request *request)
 	i915_request_remove_from_client(request);
 
 	/* Retirement decays the ban score as it is a sign of ctx progress */
-	atomic_dec_if_positive(&request->ctx->ban_score);
-	intel_context_unpin(request->ctx, request->engine);
+	atomic_dec_if_positive(&request->gem_context->ban_score);
+	intel_context_unpin(request->gem_context, request->engine);
 
 	__retire_engine_upto(request->engine, request);
 
@@ -759,7 +759,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	INIT_LIST_HEAD(&rq->active_list);
 	rq->i915 = i915;
 	rq->engine = engine;
-	rq->ctx = ctx;
+	rq->gem_context = ctx;
 	rq->ring = ring;
 	rq->timeline = ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
@@ -813,7 +813,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		goto err_unwind;
 
 	/* Keep a second pin for the dual retirement along engine and ring */
-	__intel_context_pin(rq->ctx, engine);
+	__intel_context_pin(rq->gem_context, engine);
 
 	/* Check that we didn't interrupt ourselves with a new request */
 	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
@@ -1111,7 +1111,7 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 */
 	local_bh_disable();
 	if (engine->schedule)
-		engine->schedule(request, &request->ctx->sched);
+		engine->schedule(request, &request->gem_context->sched);
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index eddbd4245cb3..dddecd9ffd0c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -93,7 +93,7 @@ struct i915_request {
 	 * i915_request_free() will then decrement the refcount on the
 	 * context.
 	 */
-	struct i915_gem_context *ctx;
+	struct i915_gem_context *gem_context;
 	struct intel_engine_cs *engine;
 	struct intel_ring *ring;
 	struct i915_timeline *timeline;
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 408827bf5d96..462112e247c2 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -624,7 +624,7 @@ TRACE_EVENT(i915_request_queue,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->ctx->hw_id;
+			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->ring = rq->engine->id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
@@ -651,7 +651,7 @@ DECLARE_EVENT_CLASS(i915_request,
 
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->ctx->hw_id;
+			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->ring = rq->engine->id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
@@ -695,7 +695,7 @@ DECLARE_EVENT_CLASS(i915_request_hw,
 
 		    TP_fast_assign(
 				   __entry->dev = rq->i915->drm.primary->index;
-				   __entry->hw_id = rq->ctx->hw_id;
+				   __entry->hw_id = rq->gem_context->hw_id;
 				   __entry->ring = rq->engine->id;
 				   __entry->ctx = rq->fence.context;
 				   __entry->seqno = rq->fence.seqno;
@@ -792,7 +792,7 @@ TRACE_EVENT(i915_request_wait_begin,
 	     */
 	    TP_fast_assign(
 			   __entry->dev = rq->i915->drm.primary->index;
-			   __entry->hw_id = rq->ctx->hw_id;
+			   __entry->hw_id = rq->gem_context->hw_id;
 			   __entry->ring = rq->engine->id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index a90769b9954e..62dd394fbfcd 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1002,7 +1002,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 	 */
 	rq = __i915_gem_active_peek(&engine->timeline.last_request);
 	if (rq)
-		return rq->ctx == kernel_context;
+		return rq->gem_context == kernel_context;
 	else
 		return engine->last_retired_context == kernel_context;
 }
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 62828e39ee26..cc7b0c1b5e8c 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -513,8 +513,9 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	struct intel_guc_client *client = guc->execbuf_client;
 	struct intel_engine_cs *engine = rq->engine;
-	u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(rq->ctx,
-								 engine));
+	u32 ctx_desc =
+		lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
+							  engine));
 	u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
 
 	spin_lock(&client->wq_lock);
@@ -709,7 +710,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 		struct i915_request *rq, *rn;
 
 		list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
-			if (last && rq->ctx != last->ctx) {
+			if (last && rq->gem_context != last->gem_context) {
 				if (port == last_port) {
 					__list_del_many(&p->requests,
 							&rq->sched.link);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9f3cce022b2d..578cb89b3af7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -416,9 +416,10 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
 
 static u64 execlists_update_context(struct i915_request *rq)
 {
-	struct intel_context *ce = to_intel_context(rq->ctx, rq->engine);
+	struct intel_context *ce =
+		to_intel_context(rq->gem_context, rq->engine);
 	struct i915_hw_ppgtt *ppgtt =
-		rq->ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
+		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
 	u32 *reg_state = ce->lrc_reg_state;
 
 	reg_state[CTX_RING_TAIL+1] = intel_ring_set_tail(rq->ring, rq->tail);
@@ -668,7 +669,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last && !can_merge_ctx(rq->ctx, last->ctx)) {
+			if (last && !can_merge_ctx(rq->gem_context,
+						   last->gem_context)) {
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -687,14 +689,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * the same context (even though a different
 				 * request) to the second port.
 				 */
-				if (ctx_single_port_submission(last->ctx) ||
-				    ctx_single_port_submission(rq->ctx)) {
+				if (ctx_single_port_submission(last->gem_context) ||
+				    ctx_single_port_submission(rq->gem_context)) {
 					__list_del_many(&p->requests,
 							&rq->sched.link);
 					goto done;
 				}
 
-				GEM_BUG_ON(last->ctx == rq->ctx);
+				GEM_BUG_ON(last->gem_context == rq->gem_context);
 
 				if (submit)
 					port_assign(port, last);
@@ -1401,7 +1403,7 @@ static void execlists_context_unpin(struct intel_engine_cs *engine,
 static int execlists_request_alloc(struct i915_request *request)
 {
 	struct intel_context *ce =
-		to_intel_context(request->ctx, request->engine);
+		to_intel_context(request->gem_context, request->engine);
 	int ret;
 
 	GEM_BUG_ON(!ce->pin_count);
@@ -1855,7 +1857,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	regs = to_intel_context(request->ctx, engine)->lrc_reg_state;
+	regs = to_intel_context(request->gem_context, engine)->lrc_reg_state;
 	if (engine->default_state) {
 		void *defaults;
 
@@ -1868,7 +1870,8 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 			i915_gem_object_unpin_map(engine->default_state);
 		}
 	}
-	execlists_init_reg_state(regs, request->ctx, engine, request->ring);
+	execlists_init_reg_state(regs,
+				 request->gem_context, engine, request->ring);
 
 	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
 	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
@@ -1883,7 +1886,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 
 static int intel_logical_ring_emit_pdps(struct i915_request *rq)
 {
-	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
+	struct i915_hw_ppgtt *ppgtt = rq->gem_context->ppgtt;
 	struct intel_engine_cs *engine = rq->engine;
 	const int num_lri_cmds = GEN8_3LVL_PDPES * 2;
 	u32 *cs;
@@ -1922,15 +1925,15 @@ static int gen8_emit_bb_start(struct i915_request *rq,
 	 * it is unsafe in case of lite-restore (because the ctx is
 	 * not idle). PML4 is allocated during ppgtt init so this is
 	 * not needed in 48-bit.*/
-	if (rq->ctx->ppgtt &&
-	    (intel_engine_flag(rq->engine) & rq->ctx->ppgtt->pd_dirty_rings) &&
-	    !i915_vm_is_48bit(&rq->ctx->ppgtt->base) &&
+	if (rq->gem_context->ppgtt &&
+	    (intel_engine_flag(rq->engine) & rq->gem_context->ppgtt->pd_dirty_rings) &&
+	    !i915_vm_is_48bit(&rq->gem_context->ppgtt->base) &&
 	    !intel_vgpu_active(rq->i915)) {
 		ret = intel_logical_ring_emit_pdps(rq);
 		if (ret)
 			return ret;
 
-		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
+		rq->gem_context->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
 	}
 
 	cs = intel_ring_begin(rq, 6);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f19349a6055..fbd23127505d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -558,8 +558,8 @@ static void reset_ring_common(struct intel_engine_cs *engine,
 	 */
 	if (request) {
 		struct drm_i915_private *dev_priv = request->i915;
-		struct intel_context *ce = to_intel_context(request->ctx,
-							    engine);
+		struct intel_context *ce =
+			to_intel_context(request->gem_context, engine);
 		struct i915_hw_ppgtt *ppgtt;
 
 		if (ce->state) {
@@ -571,7 +571,7 @@ static void reset_ring_common(struct intel_engine_cs *engine,
 				   CCID_EN);
 		}
 
-		ppgtt = request->ctx->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
+		ppgtt = request->gem_context->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
 		if (ppgtt) {
 			u32 pd_offset = ppgtt->pd.base.ggtt_offset << 10;
 
@@ -1441,7 +1441,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 
 	*cs++ = MI_NOOP;
 	*cs++ = MI_SET_CONTEXT;
-	*cs++ = i915_ggtt_offset(to_intel_context(rq->ctx, engine)->state) | flags;
+	*cs++ = i915_ggtt_offset(to_intel_context(rq->gem_context, engine)->state) | flags;
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -1509,7 +1509,7 @@ static int remap_l3(struct i915_request *rq, int slice)
 static int switch_context(struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
-	struct i915_gem_context *to_ctx = rq->ctx;
+	struct i915_gem_context *to_ctx = rq->gem_context;
 	struct i915_hw_ppgtt *to_mm =
 		to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
 	struct i915_gem_context *from_ctx = engine->legacy_active_context;
@@ -1580,7 +1580,7 @@ static int ring_request_alloc(struct i915_request *request)
 {
 	int ret;
 
-	GEM_BUG_ON(!to_intel_context(request->ctx, request->engine)->pin_count);
+	GEM_BUG_ON(!to_intel_context(request->gem_context, request->engine)->pin_count);
 
 	/* Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index c61bf65454a9..429d36f955c0 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -104,7 +104,10 @@ static int emit_recurse_batch(struct hang *h,
 			      struct i915_request *rq)
 {
 	struct drm_i915_private *i915 = h->i915;
-	struct i915_address_space *vm = rq->ctx->ppgtt ? &rq->ctx->ppgtt->base : &i915->ggtt.base;
+	struct i915_address_space *vm =
+		rq->gem_context->ppgtt ?
+		&rq->gem_context->ppgtt->base :
+		&i915->ggtt.base;
 	struct i915_vma *hws, *vma;
 	unsigned int flags;
 	u32 *batch;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index ee7e22d18ff8..20279547cb05 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -82,7 +82,7 @@ static int emit_recurse_batch(struct spinner *spin,
 			      struct i915_request *rq,
 			      u32 arbitration_command)
 {
-	struct i915_address_space *vm = &rq->ctx->ppgtt->base;
+	struct i915_address_space *vm = &rq->gem_context->ppgtt->base;
 	struct i915_vma *hws, *vma;
 	u32 *batch;
 	int err;
-- 
2.17.0


* [PATCH 08/71] drm/i915: Move fiddling with engine->last_retired_context
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

Move the knowledge of how to reset the engine's current-context tracking
from i915_gem_context.c into intel_engine_cs.c.
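
Illustrative call site (the i915_gem_context.c hunk below reduces to
this):

    void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
    {
            struct intel_engine_cs *engine;
            enum intel_engine_id id;

            lockdep_assert_held(&dev_priv->drm.struct_mutex);

            for_each_engine(engine, dev_priv, id)
                    intel_engine_lost_context(engine);
    }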

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 12 ++----------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 33f8a4b3c981..78dc4cb305c2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -514,16 +514,8 @@ void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
-	for_each_engine(engine, dev_priv, id) {
-		engine->legacy_active_context = NULL;
-		engine->legacy_active_ppgtt = NULL;
-
-		if (!engine->last_retired_context)
-			continue;
-
-		intel_context_unpin(engine->last_retired_context, engine);
-		engine->last_retired_context = NULL;
-	}
+	for_each_engine(engine, dev_priv, id)
+		intel_engine_lost_context(engine);
 }
 
 void i915_gem_contexts_fini(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 62dd394fbfcd..a1b85440ce5a 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1076,6 +1076,29 @@ void intel_engines_unpark(struct drm_i915_private *i915)
 	}
 }
 
+/**
+ * intel_engine_lost_context: called when the GPU is reset into unknown state
+ * @engine: the engine
+ *
+ * We have either reset the GPU or otherwise about to lose state tracking of
+ * the current GPU logical state (e.g. suspend). On next use, it is therefore
+ * imperative that we make no presumptions about the current state and load
+ * from scratch.
+ */
+void intel_engine_lost_context(struct intel_engine_cs *engine)
+{
+	struct i915_gem_context *ctx;
+
+	lockdep_assert_held(&engine->i915->drm.struct_mutex);
+
+	engine->legacy_active_context = NULL;
+	engine->legacy_active_ppgtt = NULL;
+
+	ctx = fetch_and_zero(&engine->last_retired_context);
+	if (ctx)
+		intel_context_unpin(ctx, engine);
+}
+
 bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
 {
 	switch (INTEL_GEN(engine->i915)) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 010750e8ee44..c4e56044e34f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -1046,6 +1046,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
 bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine);
+void intel_engine_lost_context(struct intel_engine_cs *engine);
 
 void intel_engines_park(struct drm_i915_private *i915);
 void intel_engines_unpark(struct drm_i915_private *i915);
-- 
2.17.0


* [PATCH 09/71] drm/i915: Store a pointer to intel_context in i915_request
From: Chris Wilson @ 2018-05-03  6:36 UTC
  To: intel-gfx

To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
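
A before/after sketch of the dance being removed (illustrative):

    /* before: rederive the per-engine context on every use */
    ce = to_intel_context(rq->gem_context, rq->engine);

    /* after: cached once at request construction */
    ce = rq->hw_context;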

v2: Set mock_context->ops so we don't crash and burn in selftests.
    Cleanups from Tvrtko.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gvt/mmio_context.c       |   6 +-
 drivers/gpu/drm/i915/gvt/mmio_context.h       |   2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c          | 141 +++++++-----------
 drivers/gpu/drm/i915/gvt/scheduler.h          |   1 -
 drivers/gpu/drm/i915/i915_drv.h               |   1 +
 drivers/gpu/drm/i915/i915_gem.c               |  12 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |  17 ++-
 drivers/gpu/drm/i915/i915_gem_context.h       |  21 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         |   3 +-
 drivers/gpu/drm/i915/i915_perf.c              |  25 ++--
 drivers/gpu/drm/i915/i915_request.c           |  34 ++---
 drivers/gpu/drm/i915/i915_request.h           |   1 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |  54 ++++---
 drivers/gpu/drm/i915/intel_guc_submission.c   |  10 +-
 drivers/gpu/drm/i915/intel_lrc.c              | 118 +++++++++------
 drivers/gpu/drm/i915/intel_lrc.h              |   7 -
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 100 ++++++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h       |   9 +-
 drivers/gpu/drm/i915/selftests/mock_context.c |   7 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  41 +++--
 20 files changed, 320 insertions(+), 290 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index 0f949554d118..708170e61625 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -446,9 +446,9 @@ static void switch_mocs(struct intel_vgpu *pre, struct intel_vgpu *next,
 
 #define CTX_CONTEXT_CONTROL_VAL	0x03
 
-bool is_inhibit_context(struct i915_gem_context *ctx, int ring_id)
+bool is_inhibit_context(struct intel_context *ce)
 {
-	u32 *reg_state = ctx->__engine[ring_id].lrc_reg_state;
+	const u32 *reg_state = ce->lrc_reg_state;
 	u32 inhibit_mask =
 		_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
 
@@ -501,7 +501,7 @@ static void switch_mmio(struct intel_vgpu *pre,
 			 * itself.
 			 */
 			if (mmio->in_context &&
-			    !is_inhibit_context(s->shadow_ctx, ring_id))
+			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
 				continue;
 
 			if (mmio->mask)
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.h b/drivers/gpu/drm/i915/gvt/mmio_context.h
index 0439eb8057a8..5c3b9ff9f96a 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.h
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.h
@@ -49,7 +49,7 @@ void intel_gvt_switch_mmio(struct intel_vgpu *pre,
 
 void intel_gvt_init_engine_mmio_context(struct intel_gvt *gvt);
 
-bool is_inhibit_context(struct i915_gem_context *ctx, int ring_id);
+bool is_inhibit_context(struct intel_context *ce);
 
 int intel_vgpu_restore_inhibit_context(struct intel_vgpu *vgpu,
 				       struct i915_request *req);
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index f409a154491d..d9aa39d28584 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -54,11 +54,8 @@ static void set_context_pdp_root_pointer(
 
 static void update_shadow_pdps(struct intel_vgpu_workload *workload)
 {
-	struct intel_vgpu *vgpu = workload->vgpu;
-	int ring_id = workload->ring_id;
-	struct i915_gem_context *shadow_ctx = vgpu->submission.shadow_ctx;
 	struct drm_i915_gem_object *ctx_obj =
-		shadow_ctx->__engine[ring_id].state->obj;
+		workload->req->hw_context->state->obj;
 	struct execlist_ring_context *shadow_ring_context;
 	struct page *page;
 
@@ -128,9 +125,8 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
 	int ring_id = workload->ring_id;
-	struct i915_gem_context *shadow_ctx = vgpu->submission.shadow_ctx;
 	struct drm_i915_gem_object *ctx_obj =
-		shadow_ctx->__engine[ring_id].state->obj;
+		workload->req->hw_context->state->obj;
 	struct execlist_ring_context *shadow_ring_context;
 	struct page *page;
 	void *dst;
@@ -280,10 +276,8 @@ static int shadow_context_status_change(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
-		struct intel_engine_cs *engine)
+static void shadow_context_descriptor_update(struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
 	u64 desc = 0;
 
 	desc = ce->lrc_desc;
@@ -292,7 +286,7 @@ static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
 	 * like GEN8_CTX_* cached in desc_template
 	 */
 	desc &= U64_MAX << 12;
-	desc |= ctx->desc_template & ((1ULL << 12) - 1);
+	desc |= ce->gem_context->desc_template & ((1ULL << 12) - 1);
 
 	ce->lrc_desc = desc;
 }
@@ -300,12 +294,11 @@ static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
 static int copy_workload_to_ring_buffer(struct intel_vgpu_workload *workload)
 {
 	struct intel_vgpu *vgpu = workload->vgpu;
+	struct i915_request *req = workload->req;
 	void *shadow_ring_buffer_va;
 	u32 *cs;
-	struct i915_request *req = workload->req;
 
-	if (IS_KABYLAKE(req->i915) &&
-	    is_inhibit_context(req->gem_context, req->engine->id))
+	if (IS_KABYLAKE(req->i915) && is_inhibit_context(req->hw_context))
 		intel_vgpu_restore_inhibit_context(vgpu, req);
 
 	/* allocate shadow ring buffer */
@@ -353,60 +346,56 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload)
 	struct intel_vgpu_submission *s = &vgpu->submission;
 	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
-	int ring_id = workload->ring_id;
-	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
-	struct intel_ring *ring;
+	struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id];
+	struct intel_context *ce;
 	int ret;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
-	if (workload->shadowed)
+	if (workload->req)
 		return 0;
 
+	/* Pin the shadow context by GVT even though it will also be pinned
+	 * when i915 allocates the request. This is because GVT updates the
+	 * guest context from the shadow context once the workload completes,
+	 * and by then i915 may already have unpinned the shadow context,
+	 * invalidating the shadow_ctx pages. So GVT must take its own pin,
+	 * and only after updating the guest context can it safely unpin.
+	 */
+	ce = intel_context_pin(shadow_ctx, engine);
+	if (IS_ERR(ce)) {
+		gvt_vgpu_err("fail to pin shadow context\n");
+		return PTR_ERR(ce);
+	}
+
 	shadow_ctx->desc_template &= ~(0x3 << GEN8_CTX_ADDRESSING_MODE_SHIFT);
 	shadow_ctx->desc_template |= workload->ctx_desc.addressing_mode <<
 				    GEN8_CTX_ADDRESSING_MODE_SHIFT;
 
-	if (!test_and_set_bit(ring_id, s->shadow_ctx_desc_updated))
-		shadow_context_descriptor_update(shadow_ctx,
-					dev_priv->engine[ring_id]);
+	if (!test_and_set_bit(workload->ring_id, s->shadow_ctx_desc_updated))
+		shadow_context_descriptor_update(ce);
 
 	ret = intel_gvt_scan_and_shadow_ringbuffer(workload);
 	if (ret)
-		goto err_scan;
+		goto err_unpin;
 
 	if ((workload->ring_id == RCS) &&
 	    (workload->wa_ctx.indirect_ctx.size != 0)) {
 		ret = intel_gvt_scan_and_shadow_wa_ctx(&workload->wa_ctx);
 		if (ret)
-			goto err_scan;
-	}
-
-	/* pin shadow context by gvt even the shadow context will be pinned
-	 * when i915 alloc request. That is because gvt will update the guest
-	 * context from shadow context when workload is completed, and at that
-	 * moment, i915 may already unpined the shadow context to make the
-	 * shadow_ctx pages invalid. So gvt need to pin itself. After update
-	 * the guest context, gvt can unpin the shadow_ctx safely.
-	 */
-	ring = intel_context_pin(shadow_ctx, engine);
-	if (IS_ERR(ring)) {
-		ret = PTR_ERR(ring);
-		gvt_vgpu_err("fail to pin shadow context\n");
-		goto err_shadow;
+			goto err_shadow;
 	}
 
 	ret = populate_shadow_context(workload);
 	if (ret)
-		goto err_unpin;
-	workload->shadowed = true;
+		goto err_shadow;
+
 	return 0;
 
-err_unpin:
-	intel_context_unpin(shadow_ctx, engine);
 err_shadow:
 	release_shadow_wa_ctx(&workload->wa_ctx);
-err_scan:
+err_unpin:
+	intel_context_unpin(ce);
 	return ret;
 }
 
@@ -414,7 +403,6 @@ static int intel_gvt_generate_request(struct intel_vgpu_workload *workload)
 {
 	int ring_id = workload->ring_id;
 	struct drm_i915_private *dev_priv = workload->vgpu->gvt->dev_priv;
-	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
 	struct i915_request *rq;
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_vgpu_submission *s = &vgpu->submission;
@@ -437,7 +425,6 @@ static int intel_gvt_generate_request(struct intel_vgpu_workload *workload)
 	return 0;
 
 err_unpin:
-	intel_context_unpin(shadow_ctx, engine);
 	release_shadow_wa_ctx(&workload->wa_ctx);
 	return ret;
 }
@@ -517,21 +504,13 @@ static int prepare_shadow_batch_buffer(struct intel_vgpu_workload *workload)
 	return ret;
 }
 
-static int update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
+static void update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
 {
-	struct intel_vgpu_workload *workload = container_of(wa_ctx,
-					struct intel_vgpu_workload,
-					wa_ctx);
-	int ring_id = workload->ring_id;
-	struct intel_vgpu_submission *s = &workload->vgpu->submission;
-	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
-	struct drm_i915_gem_object *ctx_obj =
-		shadow_ctx->__engine[ring_id].state->obj;
-	struct execlist_ring_context *shadow_ring_context;
-	struct page *page;
-
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	shadow_ring_context = kmap_atomic(page);
+	struct intel_vgpu_workload *workload =
+		container_of(wa_ctx, struct intel_vgpu_workload, wa_ctx);
+	struct i915_request *rq = workload->req;
+	struct execlist_ring_context *shadow_ring_context =
+		(struct execlist_ring_context *)rq->hw_context->lrc_reg_state;
 
 	shadow_ring_context->bb_per_ctx_ptr.val =
 		(shadow_ring_context->bb_per_ctx_ptr.val &
@@ -539,9 +518,6 @@ static int update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
 	shadow_ring_context->rcs_indirect_ctx.val =
 		(shadow_ring_context->rcs_indirect_ctx.val &
 		(~INDIRECT_CTX_ADDR_MASK)) | wa_ctx->indirect_ctx.shadow_gma;
-
-	kunmap_atomic(shadow_ring_context);
-	return 0;
 }
 
 static int prepare_shadow_wa_ctx(struct intel_shadow_wa_ctx *wa_ctx)
@@ -670,12 +646,9 @@ static int prepare_workload(struct intel_vgpu_workload *workload)
 static int dispatch_workload(struct intel_vgpu_workload *workload)
 {
 	struct intel_vgpu *vgpu = workload->vgpu;
-	struct intel_vgpu_submission *s = &vgpu->submission;
-	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	int ring_id = workload->ring_id;
-	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
-	int ret = 0;
+	int ret;
 
 	gvt_dbg_sched("ring id %d prepare to dispatch workload %p\n",
 		ring_id, workload);
@@ -687,10 +660,6 @@ static int dispatch_workload(struct intel_vgpu_workload *workload)
 		goto out;
 
 	ret = prepare_workload(workload);
-	if (ret) {
-		intel_context_unpin(shadow_ctx, engine);
-		goto out;
-	}
 
 out:
 	if (ret)
@@ -765,27 +734,23 @@ static struct intel_vgpu_workload *pick_next_workload(
 
 static void update_guest_context(struct intel_vgpu_workload *workload)
 {
+	struct i915_request *rq = workload->req;
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
-	struct intel_vgpu_submission *s = &vgpu->submission;
-	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
-	int ring_id = workload->ring_id;
-	struct drm_i915_gem_object *ctx_obj =
-		shadow_ctx->__engine[ring_id].state->obj;
+	struct drm_i915_gem_object *ctx_obj = rq->hw_context->state->obj;
 	struct execlist_ring_context *shadow_ring_context;
 	struct page *page;
 	void *src;
 	unsigned long context_gpa, context_page_num;
 	int i;
 
-	gvt_dbg_sched("ring id %d workload lrca %x\n", ring_id,
-			workload->ctx_desc.lrca);
-
-	context_page_num = gvt->dev_priv->engine[ring_id]->context_size;
+	gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
+		      workload->ctx_desc.lrca);
 
+	context_page_num = rq->engine->context_size;
 	context_page_num = context_page_num >> PAGE_SHIFT;
 
-	if (IS_BROADWELL(gvt->dev_priv) && ring_id == RCS)
+	if (IS_BROADWELL(gvt->dev_priv) && rq->engine->id == RCS)
 		context_page_num = 19;
 
 	i = 2;
@@ -858,6 +823,7 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 		scheduler->current_workload[ring_id];
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_vgpu_submission *s = &vgpu->submission;
+	struct i915_request *rq;
 	int event;
 
 	mutex_lock(&gvt->lock);
@@ -866,11 +832,8 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 	 * switch to make sure request is completed.
 	 * For the workload w/o request, directly complete the workload.
 	 */
-	if (workload->req) {
-		struct drm_i915_private *dev_priv =
-			workload->vgpu->gvt->dev_priv;
-		struct intel_engine_cs *engine =
-			dev_priv->engine[workload->ring_id];
+	rq = fetch_and_zero(&workload->req);
+	if (rq) {
 		wait_event(workload->shadow_ctx_status_wq,
 			   !atomic_read(&workload->shadow_ctx_active));
 
@@ -886,8 +849,6 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 				workload->status = 0;
 		}
 
-		i915_request_put(fetch_and_zero(&workload->req));
-
 		if (!workload->status && !(vgpu->resetting_eng &
 					   ENGINE_MASK(ring_id))) {
 			update_guest_context(workload);
@@ -896,10 +857,13 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
 					 INTEL_GVT_EVENT_MAX)
 				intel_vgpu_trigger_virtual_event(vgpu, event);
 		}
-		mutex_lock(&dev_priv->drm.struct_mutex);
+
 		/* unpin shadow ctx as the shadow_ctx update is done */
-		intel_context_unpin(s->shadow_ctx, engine);
-		mutex_unlock(&dev_priv->drm.struct_mutex);
+		mutex_lock(&rq->i915->drm.struct_mutex);
+		intel_context_unpin(rq->hw_context);
+		mutex_unlock(&rq->i915->drm.struct_mutex);
+
+		i915_request_put(rq);
 	}
 
 	gvt_dbg_sched("ring id %d complete workload %p status %d\n",
@@ -1273,7 +1237,6 @@ alloc_workload(struct intel_vgpu *vgpu)
 	atomic_set(&workload->shadow_ctx_active, 0);
 
 	workload->status = -EINPROGRESS;
-	workload->shadowed = false;
 	workload->vgpu = vgpu;
 
 	return workload;
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.h b/drivers/gpu/drm/i915/gvt/scheduler.h
index 6c644782193e..21eddab4a9cd 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.h
+++ b/drivers/gpu/drm/i915/gvt/scheduler.h
@@ -83,7 +83,6 @@ struct intel_vgpu_workload {
 	struct i915_request *req;
 	/* if this workload has been dispatched to i915? */
 	bool dispatched;
-	bool shadowed;
 	int status;
 
 	struct intel_vgpu_mm *shadow_mm;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 04e27806e581..9341b725113b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1949,6 +1949,7 @@ struct drm_i915_private {
 			 */
 			struct i915_perf_stream *exclusive_stream;
 
+			struct intel_context *pinned_ctx;
 			u32 specific_ctx_id;
 
 			struct hrtimer poll_check_timer;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ecef2e8e5e93..8a8a77c2ef5f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3229,14 +3229,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
 	i915_retire_requests(dev_priv);
 
 	for_each_engine(engine, dev_priv, id) {
-		struct i915_gem_context *ctx;
+		struct intel_context *ce;
 
 		i915_gem_reset_engine(engine,
 				      engine->hangcheck.active_request,
 				      stalled_mask & ENGINE_MASK(id));
-		ctx = fetch_and_zero(&engine->last_retired_context);
-		if (ctx)
-			intel_context_unpin(ctx, engine);
+		ce = fetch_and_zero(&engine->last_retired_context);
+		if (ce)
+			intel_context_unpin(ce);
 
 		/*
 		 * Ostensibly, we always want a context loaded for powersaving,
@@ -4946,13 +4946,13 @@ void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
 
 static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 {
-	struct i915_gem_context *kernel_context = i915->kernel_context;
+	struct i915_gem_context *kctx = i915->kernel_context;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
 	for_each_engine(engine, i915, id) {
 		GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
-		GEM_BUG_ON(engine->last_retired_context != kernel_context);
+		GEM_BUG_ON(engine->last_retired_context->gem_context != kctx);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 78dc4cb305c2..66aad55c5273 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -127,14 +127,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
 		struct intel_context *ce = &ctx->__engine[n];
 
-		if (!ce->state)
-			continue;
-
-		WARN_ON(ce->pin_count);
-		if (ce->ring)
-			intel_ring_free(ce->ring);
-
-		__i915_gem_object_release_unless_active(ce->state->obj);
+		if (ce->ops)
+			ce->ops->destroy(ce);
 	}
 
 	kfree(ctx->name);
@@ -266,6 +260,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
+	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -283,6 +278,12 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_PRIORITY_NORMAL;
 
+	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
+		struct intel_context *ce = &ctx->__engine[n];
+
+		ce->gem_context = ctx;
+	}
+
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ace3b129c189..749a4ff566f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -45,6 +45,11 @@ struct intel_ring;
 
 #define DEFAULT_CONTEXT_HANDLE 0
 
+struct intel_context_ops {
+	void (*unpin)(struct intel_context *ce);
+	void (*destroy)(struct intel_context *ce);
+};
+
 /**
  * struct i915_gem_context - client state
  *
@@ -144,11 +149,14 @@ struct i915_gem_context {
 
 	/** engine: per-engine logical HW state */
 	struct intel_context {
+		struct i915_gem_context *gem_context;
 		struct i915_vma *state;
 		struct intel_ring *ring;
 		u32 *lrc_reg_state;
 		u64 lrc_desc;
 		int pin_count;
+
+		const struct intel_context_ops *ops;
 	} __engine[I915_NUM_ENGINES];
 
 	/** ring_size: size for allocating the per-engine ring buffer */
@@ -263,25 +271,22 @@ to_intel_context(struct i915_gem_context *ctx,
 	return &ctx->__engine[engine->id];
 }
 
-static inline struct intel_ring *
+static inline struct intel_context *
 intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
 {
 	return engine->context_pin(engine, ctx);
 }
 
-static inline void __intel_context_pin(struct i915_gem_context *ctx,
-				       const struct intel_engine_cs *engine)
+static inline void __intel_context_pin(struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
-
 	GEM_BUG_ON(!ce->pin_count);
 	ce->pin_count++;
 }
 
-static inline void intel_context_unpin(struct i915_gem_context *ctx,
-				       struct intel_engine_cs *engine)
+static inline void intel_context_unpin(struct intel_context *ce)
 {
-	engine->context_unpin(engine, ctx);
+	GEM_BUG_ON(!ce->ops);
+	ce->ops->unpin(ce);
 }
 
 /* i915_gem_context.c */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 7cc7d3bc731b..145823f0b48e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1485,8 +1485,7 @@ static void gem_record_rings(struct i915_gpu_state *error)
 
 			ee->ctx =
 				i915_error_object_create(i915,
-							 to_intel_context(ctx,
-									  engine)->state);
+							 request->hw_context->state);
 
 			error->simulated |=
 				i915_gem_context_no_error_capture(ctx);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index d9341415df40..9b580aba7e25 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1221,7 +1221,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
 	} else {
 		struct intel_engine_cs *engine = dev_priv->engine[RCS];
-		struct intel_ring *ring;
+		struct intel_context *ce;
 		int ret;
 
 		ret = i915_mutex_lock_interruptible(&dev_priv->drm);
@@ -1234,19 +1234,19 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 		 *
 		 * NB: implied RCS engine...
 		 */
-		ring = intel_context_pin(stream->ctx, engine);
+		ce = intel_context_pin(stream->ctx, engine);
 		mutex_unlock(&dev_priv->drm.struct_mutex);
-		if (IS_ERR(ring))
-			return PTR_ERR(ring);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
 
+		dev_priv->perf.oa.pinned_ctx = ce;
 
 		/*
 		 * Explicitly track the ID (instead of calling
 		 * i915_ggtt_offset() on the fly) considering the difference
 		 * with gen8+ and execlists
 		 */
-		dev_priv->perf.oa.specific_ctx_id =
-			i915_ggtt_offset(to_intel_context(stream->ctx, engine)->state);
+		dev_priv->perf.oa.specific_ctx_id = i915_ggtt_offset(ce->state);
 	}
 
 	return 0;
@@ -1262,17 +1262,14 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
 static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
 {
 	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct intel_context *ce;
 
-	if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) {
-		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
-	} else {
-		struct intel_engine_cs *engine = dev_priv->engine[RCS];
+	dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
 
+	ce = fetch_and_zero(&dev_priv->perf.oa.pinned_ctx);
+	if (ce) {
 		mutex_lock(&dev_priv->drm.struct_mutex);
-
-		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
-		intel_context_unpin(stream->ctx, engine);
-
+		intel_context_unpin(ce);
 		mutex_unlock(&dev_priv->drm.struct_mutex);
 	}
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5205707fe03a..e5925fcf6004 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -382,8 +382,8 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
 	 * the subsequent request.
 	 */
 	if (engine->last_retired_context)
-		intel_context_unpin(engine->last_retired_context, engine);
-	engine->last_retired_context = rq->gem_context;
+		intel_context_unpin(engine->last_retired_context);
+	engine->last_retired_context = rq->hw_context;
 }
 
 static void __retire_engine_upto(struct intel_engine_cs *engine,
@@ -455,7 +455,7 @@ static void i915_request_retire(struct i915_request *request)
 
 	/* Retirement decays the ban score as it is a sign of ctx progress */
 	atomic_dec_if_positive(&request->gem_context->ban_score);
-	intel_context_unpin(request->gem_context, request->engine);
+	intel_context_unpin(request->hw_context);
 
 	__retire_engine_upto(request->engine, request);
 
@@ -656,7 +656,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
 	struct i915_request *rq;
-	struct intel_ring *ring;
+	struct intel_context *ce;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -680,22 +680,21 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 * GGTT space, so do this first before we reserve a seqno for
 	 * ourselves.
 	 */
-	ring = intel_context_pin(ctx, engine);
-	if (IS_ERR(ring))
-		return ERR_CAST(ring);
-	GEM_BUG_ON(!ring);
+	ce = intel_context_pin(ctx, engine);
+	if (IS_ERR(ce))
+		return ERR_CAST(ce);
 
 	ret = reserve_gt(i915);
 	if (ret)
 		goto err_unpin;
 
-	ret = intel_ring_wait_for_space(ring, MIN_SPACE_FOR_ADD_REQUEST);
+	ret = intel_ring_wait_for_space(ce->ring, MIN_SPACE_FOR_ADD_REQUEST);
 	if (ret)
 		goto err_unreserve;
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ring->request_list) &&
+	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
+	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
 	    i915_request_completed(rq))
 		i915_request_retire(rq);
 
@@ -760,8 +759,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
-	rq->ring = ring;
-	rq->timeline = ring->timeline;
+	rq->hw_context = ce;
+	rq->ring = ce->ring;
+	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
 
 	spin_lock_init(&rq->lock);
@@ -813,14 +813,14 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		goto err_unwind;
 
 	/* Keep a second pin for the dual retirement along engine and ring */
-	__intel_context_pin(rq->gem_context, engine);
+	__intel_context_pin(ce);
 
 	/* Check that we didn't interrupt ourselves with a new request */
 	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
 	return rq;
 
 err_unwind:
-	rq->ring->emit = rq->head;
+	ce->ring->emit = rq->head;
 
 	/* Make sure we didn't add ourselves to external state before freeing */
 	GEM_BUG_ON(!list_empty(&rq->active_list));
@@ -831,7 +831,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 err_unreserve:
 	unreserve_gt(i915);
 err_unpin:
-	intel_context_unpin(ctx, engine);
+	intel_context_unpin(ce);
 	return ERR_PTR(ret);
 }
 
@@ -1017,8 +1017,8 @@ i915_request_await_object(struct i915_request *to,
 void __i915_request_add(struct i915_request *request, bool flush_caches)
 {
 	struct intel_engine_cs *engine = request->engine;
-	struct intel_ring *ring = request->ring;
 	struct i915_timeline *timeline = request->timeline;
+	struct intel_ring *ring = request->ring;
 	struct i915_request *prev;
 	u32 *cs;
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index dddecd9ffd0c..1bbbb7a9fa03 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -95,6 +95,7 @@ struct i915_request {
 	 */
 	struct i915_gem_context *gem_context;
 	struct intel_engine_cs *engine;
+	struct intel_context *hw_context;
 	struct intel_ring *ring;
 	struct i915_timeline *timeline;
 	struct intel_signal_node signaling;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index a1b85440ce5a..bddc57ccfa4a 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -656,6 +656,12 @@ static int init_phys_status_page(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static void __intel_context_unpin(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine)
+{
+	intel_context_unpin(to_intel_context(ctx, engine));
+}
+
 /**
  * intel_engine_init_common - initialize engine state which might require hw access
  * @engine: Engine to initialize.
@@ -669,7 +675,8 @@ static int init_phys_status_page(struct intel_engine_cs *engine)
  */
 int intel_engine_init_common(struct intel_engine_cs *engine)
 {
-	struct intel_ring *ring;
+	struct drm_i915_private *i915 = engine->i915;
+	struct intel_context *ce;
 	int ret;
 
 	engine->set_default_submission(engine);
@@ -681,18 +688,18 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	 * be available. To avoid this we always pin the default
 	 * context.
 	 */
-	ring = intel_context_pin(engine->i915->kernel_context, engine);
-	if (IS_ERR(ring))
-		return PTR_ERR(ring);
+	ce = intel_context_pin(i915->kernel_context, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
 
 	/*
 	 * Similarly the preempt context must always be available so that
 	 * we can interrupt the engine at any time.
 	 */
-	if (engine->i915->preempt_context) {
-		ring = intel_context_pin(engine->i915->preempt_context, engine);
-		if (IS_ERR(ring)) {
-			ret = PTR_ERR(ring);
+	if (i915->preempt_context) {
+		ce = intel_context_pin(i915->preempt_context, engine);
+		if (IS_ERR(ce)) {
+			ret = PTR_ERR(ce);
 			goto err_unpin_kernel;
 		}
 	}
@@ -701,7 +708,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (ret)
 		goto err_unpin_preempt;
 
-	if (HWS_NEEDS_PHYSICAL(engine->i915))
+	if (HWS_NEEDS_PHYSICAL(i915))
 		ret = init_phys_status_page(engine);
 	else
 		ret = init_status_page(engine);
@@ -713,10 +720,11 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(engine);
 err_unpin_preempt:
-	if (engine->i915->preempt_context)
-		intel_context_unpin(engine->i915->preempt_context, engine);
+	if (i915->preempt_context)
+		__intel_context_unpin(i915->preempt_context, engine);
+
 err_unpin_kernel:
-	intel_context_unpin(engine->i915->kernel_context, engine);
+	__intel_context_unpin(i915->kernel_context, engine);
 	return ret;
 }
 
@@ -729,6 +737,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
+	struct drm_i915_private *i915 = engine->i915;
+
 	intel_engine_cleanup_scratch(engine);
 
 	if (HWS_NEEDS_PHYSICAL(engine->i915))
@@ -743,9 +753,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	if (engine->default_state)
 		i915_gem_object_put(engine->default_state);
 
-	if (engine->i915->preempt_context)
-		intel_context_unpin(engine->i915->preempt_context, engine);
-	intel_context_unpin(engine->i915->kernel_context, engine);
+	if (i915->preempt_context)
+		__intel_context_unpin(i915->preempt_context, engine);
+	__intel_context_unpin(i915->kernel_context, engine);
 
 	i915_timeline_fini(&engine->timeline);
 }
@@ -989,8 +999,8 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
  */
 bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 {
-	const struct i915_gem_context * const kernel_context =
-		engine->i915->kernel_context;
+	const struct intel_context *kernel_context =
+		to_intel_context(engine->i915->kernel_context, engine);
 	struct i915_request *rq;
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
@@ -1002,7 +1012,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 	 */
 	rq = __i915_gem_active_peek(&engine->timeline.last_request);
 	if (rq)
-		return rq->gem_context == kernel_context;
+		return rq->hw_context == kernel_context;
 	else
 		return engine->last_retired_context == kernel_context;
 }
@@ -1087,16 +1097,16 @@ void intel_engines_unpark(struct drm_i915_private *i915)
  */
 void intel_engine_lost_context(struct intel_engine_cs *engine)
 {
-	struct i915_gem_context *ctx;
+	struct intel_context *ce;
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
 	engine->legacy_active_context = NULL;
 	engine->legacy_active_ppgtt = NULL;
 
-	ctx = fetch_and_zero(&engine->last_retired_context);
-	if (ctx)
-		intel_context_unpin(ctx, engine);
+	ce = fetch_and_zero(&engine->last_retired_context);
+	if (ce)
+		intel_context_unpin(ce);
 }
 
 bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index cc7b0c1b5e8c..3d4aaaf74a84 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -513,9 +513,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	struct intel_guc_client *client = guc->execbuf_client;
 	struct intel_engine_cs *engine = rq->engine;
-	u32 ctx_desc =
-		lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
-							  engine));
+	u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
 	u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
 
 	spin_lock(&client->wq_lock);
@@ -553,8 +551,8 @@ static void inject_preempt_context(struct work_struct *work)
 					     preempt_work[engine->id]);
 	struct intel_guc_client *client = guc->preempt_client;
 	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
-	u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(client->owner,
-								 engine));
+	u32 ctx_desc = lower_32_bits(to_intel_context(client->owner,
+						      engine)->lrc_desc);
 	u32 data[7];
 
 	/*
@@ -710,7 +708,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 		struct i915_request *rq, *rn;
 
 		list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
-			if (last && rq->gem_context != last->gem_context) {
+			if (last && rq->hw_context != last->hw_context) {
 				if (port == last_port) {
 					__list_del_many(&p->requests,
 							&rq->sched.link);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 578cb89b3af7..c29d5f5582c2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,8 @@
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
-					    struct intel_engine_cs *engine);
+					    struct intel_engine_cs *engine,
+					    struct intel_context *ce);
 static void execlists_init_reg_state(u32 *reg_state,
 				     struct i915_gem_context *ctx,
 				     struct intel_engine_cs *engine,
@@ -222,9 +223,9 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
  */
 static void
 intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
-				   struct intel_engine_cs *engine)
+				   struct intel_engine_cs *engine,
+				   struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
 	u64 desc;
 
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (BIT(GEN8_CTX_ID_WIDTH)));
@@ -416,8 +417,7 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
 
 static u64 execlists_update_context(struct i915_request *rq)
 {
-	struct intel_context *ce =
-		to_intel_context(rq->gem_context, rq->engine);
+	struct intel_context *ce = rq->hw_context;
 	struct i915_hw_ppgtt *ppgtt =
 		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
 	u32 *reg_state = ce->lrc_reg_state;
@@ -494,14 +494,14 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
 }
 
-static bool ctx_single_port_submission(const struct i915_gem_context *ctx)
+static bool ctx_single_port_submission(const struct intel_context *ce)
 {
 	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
-		i915_gem_context_force_single_submission(ctx));
+		i915_gem_context_force_single_submission(ce->gem_context));
 }
 
-static bool can_merge_ctx(const struct i915_gem_context *prev,
-			  const struct i915_gem_context *next)
+static bool can_merge_ctx(const struct intel_context *prev,
+			  const struct intel_context *next)
 {
 	if (prev != next)
 		return false;
@@ -669,8 +669,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last && !can_merge_ctx(rq->gem_context,
-						   last->gem_context)) {
+			if (last &&
+			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -689,14 +689,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * the same context (even though a different
 				 * request) to the second port.
 				 */
-				if (ctx_single_port_submission(last->gem_context) ||
-				    ctx_single_port_submission(rq->gem_context)) {
+				if (ctx_single_port_submission(last->hw_context) ||
+				    ctx_single_port_submission(rq->hw_context)) {
 					__list_del_many(&p->requests,
 							&rq->sched.link);
 					goto done;
 				}
 
-				GEM_BUG_ON(last->gem_context == rq->gem_context);
+				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
 				if (submit)
 					port_assign(port, last);
@@ -1303,6 +1303,37 @@ static void execlists_schedule(struct i915_request *request,
 	spin_unlock_irq(&engine->timeline.lock);
 }
 
+static void execlists_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->state);
+	GEM_BUG_ON(ce->pin_count);
+
+	intel_ring_free(ce->ring);
+	__i915_gem_object_release_unless_active(ce->state->obj);
+}
+
+static void __execlists_context_unpin(struct intel_context *ce)
+{
+	intel_ring_unpin(ce->ring);
+
+	ce->state->obj->pin_global--;
+	i915_gem_object_unpin_map(ce->state->obj);
+	i915_vma_unpin(ce->state);
+
+	i915_gem_context_put(ce->gem_context);
+}
+
+static void execlists_context_unpin(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->gem_context->i915->drm.struct_mutex);
+	GEM_BUG_ON(ce->pin_count == 0);
+
+	if (--ce->pin_count)
+		return;
+
+	__execlists_context_unpin(ce);
+}
+
 static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 {
 	unsigned int flags;
@@ -1326,21 +1357,15 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 	return i915_vma_pin(vma, 0, GEN8_LR_CONTEXT_ALIGN, flags);
 }
 
-static struct intel_ring *
-execlists_context_pin(struct intel_engine_cs *engine,
-		      struct i915_gem_context *ctx)
+static struct intel_context *
+__execlists_context_pin(struct intel_engine_cs *engine,
+			struct i915_gem_context *ctx,
+			struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
 	void *vaddr;
 	int ret;
 
-	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-
-	if (likely(ce->pin_count++))
-		goto out;
-	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
-
-	ret = execlists_context_deferred_alloc(ctx, engine);
+	ret = execlists_context_deferred_alloc(ctx, engine, ce);
 	if (ret)
 		goto err;
 	GEM_BUG_ON(!ce->state);
@@ -1359,7 +1384,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	if (ret)
 		goto unpin_map;
 
-	intel_lr_context_descriptor_update(ctx, engine);
+	intel_lr_context_descriptor_update(ctx, engine, ce);
 
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
@@ -1368,8 +1393,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
 
 	ce->state->obj->pin_global++;
 	i915_gem_context_get(ctx);
-out:
-	return ce->ring;
+	return ce;
 
 unpin_map:
 	i915_gem_object_unpin_map(ce->state->obj);
@@ -1380,33 +1404,33 @@ execlists_context_pin(struct intel_engine_cs *engine,
 	return ERR_PTR(ret);
 }
 
-static void execlists_context_unpin(struct intel_engine_cs *engine,
-				    struct i915_gem_context *ctx)
+static const struct intel_context_ops execlists_context_ops = {
+	.unpin = execlists_context_unpin,
+	.destroy = execlists_context_destroy,
+};
+
+static struct intel_context *
+execlists_context_pin(struct intel_engine_cs *engine,
+		      struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-	GEM_BUG_ON(ce->pin_count == 0);
 
-	if (--ce->pin_count)
-		return;
-
-	intel_ring_unpin(ce->ring);
+	if (likely(ce->pin_count++))
+		return ce;
+	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
 
-	ce->state->obj->pin_global--;
-	i915_gem_object_unpin_map(ce->state->obj);
-	i915_vma_unpin(ce->state);
+	ce->ops = &execlists_context_ops;
 
-	i915_gem_context_put(ctx);
+	return __execlists_context_pin(engine, ctx, ce);
 }
 
 static int execlists_request_alloc(struct i915_request *request)
 {
-	struct intel_context *ce =
-		to_intel_context(request->gem_context, request->engine);
 	int ret;
 
-	GEM_BUG_ON(!ce->pin_count);
+	GEM_BUG_ON(!request->hw_context->pin_count);
 
 	/* Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
@@ -1857,7 +1881,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	 * future request will be after userspace has had the opportunity
 	 * to recreate its own state.
 	 */
-	regs = to_intel_context(request->gem_context, engine)->lrc_reg_state;
+	regs = request->hw_context->lrc_reg_state;
 	if (engine->default_state) {
 		void *defaults;
 
@@ -2216,8 +2240,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->reset_hw = reset_common_ring;
 
 	engine->context_pin = execlists_context_pin;
-	engine->context_unpin = execlists_context_unpin;
-
 	engine->request_alloc = execlists_request_alloc;
 
 	engine->emit_flush = gen8_emit_flush;
@@ -2452,7 +2474,7 @@ static void execlists_init_reg_state(u32 *regs,
 	struct drm_i915_private *dev_priv = engine->i915;
 	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: dev_priv->mm.aliasing_ppgtt;
 	u32 base = engine->mmio_base;
-	bool rcs = engine->id == RCS;
+	bool rcs = engine->class == RENDER_CLASS;
 
 	/* A context is actually a big batch buffer with several
 	 * MI_LOAD_REGISTER_IMM commands followed by (reg, value) pairs. The
@@ -2597,10 +2619,10 @@ populate_lr_context(struct i915_gem_context *ctx,
 }
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
-					    struct intel_engine_cs *engine)
+					    struct intel_engine_cs *engine,
+					    struct intel_context *ce)
 {
 	struct drm_i915_gem_object *ctx_obj;
-	struct intel_context *ce = to_intel_context(ctx, engine);
 	struct i915_vma *vma;
 	uint32_t context_size;
 	struct intel_ring *ring;
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4ec7d8dd13c8..1593194e930c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -104,11 +104,4 @@ struct i915_gem_context;
 
 void intel_lr_context_resume(struct drm_i915_private *dev_priv);
 
-static inline uint64_t
-intel_lr_context_descriptor(struct i915_gem_context *ctx,
-			    struct intel_engine_cs *engine)
-{
-	return to_intel_context(ctx, engine)->lrc_desc;
-}
-
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index fbd23127505d..526ee8302fce 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -558,8 +558,7 @@ static void reset_ring_common(struct intel_engine_cs *engine,
 	 */
 	if (request) {
 		struct drm_i915_private *dev_priv = request->i915;
-		struct intel_context *ce =
-			to_intel_context(request->gem_context, engine);
+		struct intel_context *ce = request->hw_context;
 		struct i915_hw_ppgtt *ppgtt;
 
 		if (ce->state) {
@@ -1169,7 +1168,31 @@ intel_ring_free(struct intel_ring *ring)
 	kfree(ring);
 }
 
-static int context_pin(struct intel_context *ce)
+static void intel_ring_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(ce->pin_count);
+
+	if (ce->state)
+		__i915_gem_object_release_unless_active(ce->state->obj);
+}
+
+static void intel_ring_context_unpin(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->gem_context->i915->drm.struct_mutex);
+	GEM_BUG_ON(ce->pin_count == 0);
+
+	if (--ce->pin_count)
+		return;
+
+	if (ce->state) {
+		ce->state->obj->pin_global--;
+		i915_vma_unpin(ce->state);
+	}
+
+	i915_gem_context_put(ce->gem_context);
+}
+
+static int __context_pin(struct intel_context *ce)
 {
 	struct i915_vma *vma = ce->state;
 	int ret;
@@ -1258,25 +1281,19 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	return ERR_PTR(err);
 }
 
-static struct intel_ring *
-intel_ring_context_pin(struct intel_engine_cs *engine,
-		       struct i915_gem_context *ctx)
+static struct intel_context *
+__ring_context_pin(struct intel_engine_cs *engine,
+		   struct i915_gem_context *ctx,
+		   struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
-	int ret;
-
-	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-
-	if (likely(ce->pin_count++))
-		goto out;
-	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
+	int err;
 
 	if (!ce->state && engine->context_size) {
 		struct i915_vma *vma;
 
 		vma = alloc_context_vma(engine);
 		if (IS_ERR(vma)) {
-			ret = PTR_ERR(vma);
+			err = PTR_ERR(vma);
 			goto err;
 		}
 
@@ -1284,8 +1301,8 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
 	}
 
 	if (ce->state) {
-		ret = context_pin(ce);
-		if (ret)
+		err = __context_pin(ce);
+		if (err)
 			goto err;
 
 		ce->state->obj->pin_global++;
@@ -1293,32 +1310,37 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
 
 	i915_gem_context_get(ctx);
 
-out:
 	/* One ringbuffer to rule them all */
-	return engine->buffer;
+	GEM_BUG_ON(!engine->buffer);
+	ce->ring = engine->buffer;
+
+	return ce;
 
 err:
 	ce->pin_count = 0;
-	return ERR_PTR(ret);
+	return ERR_PTR(err);
 }
 
-static void intel_ring_context_unpin(struct intel_engine_cs *engine,
-				     struct i915_gem_context *ctx)
+static const struct intel_context_ops ring_context_ops = {
+	.unpin = intel_ring_context_unpin,
+	.destroy = intel_ring_context_destroy,
+};
+
+static struct intel_context *
+intel_ring_context_pin(struct intel_engine_cs *engine,
+		       struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-	GEM_BUG_ON(ce->pin_count == 0);
 
-	if (--ce->pin_count)
-		return;
+	if (likely(ce->pin_count++))
+		return ce;
+	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
 
-	if (ce->state) {
-		ce->state->obj->pin_global--;
-		i915_vma_unpin(ce->state);
-	}
+	ce->ops = &ring_context_ops;
 
-	i915_gem_context_put(ctx);
+	return __ring_context_pin(engine, ctx, ce);
 }
 
 static int intel_init_ring_buffer(struct intel_engine_cs *engine)
@@ -1329,10 +1351,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 
 	intel_engine_setup_common(engine);
 
-	err = intel_engine_init_common(engine);
-	if (err)
-		goto err;
-
 	timeline = i915_timeline_create(engine->i915, engine->name);
 	if (IS_ERR(timeline)) {
 		err = PTR_ERR(timeline);
@@ -1354,8 +1372,14 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->buffer);
 	engine->buffer = ring;
 
+	err = intel_engine_init_common(engine);
+	if (err)
+		goto err_unpin;
+
 	return 0;
 
+err_unpin:
+	intel_ring_unpin(ring);
 err_ring:
 	intel_ring_free(ring);
 err:
@@ -1441,7 +1465,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 
 	*cs++ = MI_NOOP;
 	*cs++ = MI_SET_CONTEXT;
-	*cs++ = i915_ggtt_offset(to_intel_context(rq->gem_context, engine)->state) | flags;
+	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
 	/*
 	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
 	 * WaMiSetContext_Hang:snb,ivb,vlv
@@ -1532,7 +1556,7 @@ static int switch_context(struct i915_request *rq)
 		hw_flags = MI_FORCE_RESTORE;
 	}
 
-	if (to_intel_context(to_ctx, engine)->state &&
+	if (rq->hw_context->state &&
 	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
 		GEM_BUG_ON(engine->id != RCS);
 
@@ -1580,7 +1604,7 @@ static int ring_request_alloc(struct i915_request *request)
 {
 	int ret;
 
-	GEM_BUG_ON(!to_intel_context(request->gem_context, request->engine)->pin_count);
+	GEM_BUG_ON(!request->hw_context->pin_count);
 
 	/* Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
@@ -2009,8 +2033,6 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->reset_hw = reset_ring_common;
 
 	engine->context_pin = intel_ring_context_pin;
-	engine->context_unpin = intel_ring_context_unpin;
-
 	engine->request_alloc = ring_request_alloc;
 
 	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c4e56044e34f..5e78ee3f5775 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -431,10 +431,9 @@ struct intel_engine_cs {
 
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
-	struct intel_ring *(*context_pin)(struct intel_engine_cs *engine,
-					  struct i915_gem_context *ctx);
-	void		(*context_unpin)(struct intel_engine_cs *engine,
-					 struct i915_gem_context *ctx);
+	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
+					     struct i915_gem_context *ctx);
+
 	int		(*request_alloc)(struct i915_request *rq);
 	int		(*init_context)(struct i915_request *rq);
 
@@ -550,7 +549,7 @@ struct intel_engine_cs {
 	 * to the kernel context and trash it as the save may not happen
 	 * before the hardware is powered down.
 	 */
-	struct i915_gem_context *last_retired_context;
+	struct intel_context *last_retired_context;
 
 	/* We track the current MI_SET_CONTEXT in order to eliminate
 	 * redundant context switches. This presumes that requests are not
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 501becc47c0c..8904f1ce64e3 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -30,6 +30,7 @@ mock_context(struct drm_i915_private *i915,
 	     const char *name)
 {
 	struct i915_gem_context *ctx;
+	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -43,6 +44,12 @@ mock_context(struct drm_i915_private *i915,
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 
+	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
+		struct intel_context *ce = &ctx->__engine[n];
+
+		ce->gem_context = ctx;
+	}
+
 	ret = ida_simple_get(&i915->contexts.hw_ida,
 			     0, MAX_CONTEXT_HW_ID, GFP_KERNEL);
 	if (ret < 0)
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 26bf29d97007..33eddfc1f8ce 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -72,25 +72,37 @@ static void hw_delay_complete(struct timer_list *t)
 	spin_unlock(&engine->hw_lock);
 }
 
-static struct intel_ring *
-mock_context_pin(struct intel_engine_cs *engine,
-		 struct i915_gem_context *ctx)
+static void mock_context_unpin(struct intel_context *ce)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	if (--ce->pin_count)
+		return;
 
-	if (!ce->pin_count++)
-		i915_gem_context_get(ctx);
+	i915_gem_context_put(ce->gem_context);
+}
 
-	return engine->buffer;
+static void mock_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(ce->pin_count);
 }
 
-static void mock_context_unpin(struct intel_engine_cs *engine,
-			       struct i915_gem_context *ctx)
+static const struct intel_context_ops mock_context_ops = {
+	.unpin = mock_context_unpin,
+	.destroy = mock_context_destroy,
+};
+
+static struct intel_context *
+mock_context_pin(struct intel_engine_cs *engine,
+		 struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
 
-	if (!--ce->pin_count)
-		i915_gem_context_put(ctx);
+	if (!ce->pin_count++) {
+		i915_gem_context_get(ctx);
+		ce->ring = engine->buffer;
+		ce->ops = &mock_context_ops;
+	}
+
+	return ce;
 }
 
 static int mock_request_alloc(struct i915_request *request)
@@ -185,7 +197,6 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.status_page.page_addr = (void *)(engine + 1);
 
 	engine->base.context_pin = mock_context_pin;
-	engine->base.context_unpin = mock_context_unpin;
 	engine->base.request_alloc = mock_request_alloc;
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
@@ -238,11 +249,13 @@ void mock_engine_free(struct intel_engine_cs *engine)
 {
 	struct mock_engine *mock =
 		container_of(engine, typeof(*mock), base);
+	struct intel_context *ce;
 
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
-	if (engine->last_retired_context)
-		intel_context_unpin(engine->last_retired_context, engine);
+	ce = fetch_and_zero(&engine->last_retired_context);
+	if (ce)
+		intel_context_unpin(ce);
 
 	mock_ring_free(engine->buffer);
 
-- 
2.17.0
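
To summarise the ownership model this patch puts in place: pinning a
context now returns the per-engine struct intel_context, and each
backend installs an ops table on first pin that is later used for
unpin and destroy. A minimal sketch of the resulting flow, condensed
from the hunks above rather than copied verbatim:

	struct intel_context *ce;

	ce = intel_context_pin(ctx, engine); /* engine->context_pin() sets ce->ops on first pin */
	if (IS_ERR(ce))
		return PTR_ERR(ce);

	rq->hw_context = ce;	/* requests now carry the HW context directly */

	/* On retire/unpin, dispatch through the ops table: */
	intel_context_unpin(ce);	/* calls ce->ops->unpin(ce) */

	/* On context free, each backend tears down its own state: */
	if (ce->ops)
		ce->ops->destroy(ce);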


* [PATCH 10/71] drm/i915/execlists: Refactor out complete_preempt_context()
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (7 preceding siblings ...)
  2018-05-03  6:36 ` [PATCH 09/71] drm/i915: Store a pointer to intel_context in i915_request Chris Wilson
@ 2018-05-03  6:36 ` Chris Wilson
  2018-05-03  6:36 ` [PATCH 11/71] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
                   ` (42 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:36 UTC (permalink / raw)
  To: intel-gfx

As a complement to inject_preempt_context(), follow up with the function
to handle its completion. This will be useful should we wish to extend
the duties of the preempt-context for execlists.

v2: And do the same for the guc.
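
The helper ends up with a slightly different shape in each backend,
since the GuC variant must also consume the preemption report before
clearing the PREEMPT flag. A condensed sketch of the two signatures
(drawn from the hunks below, not copied verbatim):

	/* intel_guc_submission.c: takes the engine, waits for the GuC report */
	static void complete_preempt_context(struct intel_engine_cs *engine);

	/* intel_lrc.c: needs only the execlists state */
	static void complete_preempt_context(struct intel_engine_execlists *execlists);

Both variants cancel the port requests and unwind the incomplete
requests before clearing EXECLISTS_ACTIVE_PREEMPT.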

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com> #v1
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 26 ++++++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c            | 23 ++++++++++--------
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 3d4aaaf74a84..9dbbe5b5390b 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -622,6 +622,21 @@ static void wait_for_guc_preempt_report(struct intel_engine_cs *engine)
 	report->report_return_status = INTEL_GUC_REPORT_STATUS_UNKNOWN;
 }
 
+static void complete_preempt_context(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+
+	execlists_cancel_port_requests(execlists);
+	execlists_unwind_incomplete_requests(execlists);
+
+	wait_for_guc_preempt_report(engine);
+	intel_write_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX, 0);
+
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
 /**
  * guc_submit() - Submit commands through GuC
  * @engine: engine associated with the commands
@@ -776,15 +791,8 @@ static void guc_submission_tasklet(unsigned long data)
 
 	if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT) &&
 	    intel_read_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX) ==
-	    GUC_PREEMPT_FINISHED) {
-		execlists_cancel_port_requests(&engine->execlists);
-		execlists_unwind_incomplete_requests(execlists);
-
-		wait_for_guc_preempt_report(engine);
-
-		execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
-		intel_write_status_page(engine, I915_GEM_HWS_PREEMPT_INDEX, 0);
-	}
+	    GUC_PREEMPT_FINISHED)
+		complete_preempt_context(engine);
 
 	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
 		guc_dequeue(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c29d5f5582c2..6067d1dc06ef 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -551,8 +551,18 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
 
-	execlists_clear_active(&engine->execlists, EXECLISTS_ACTIVE_HWACK);
-	execlists_set_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT);
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
+	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+}
+
+static void complete_preempt_context(struct intel_engine_execlists *execlists)
+{
+	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
+
+	execlists_cancel_port_requests(execlists);
+	execlists_unwind_incomplete_requests(execlists);
+
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
 static void execlists_dequeue(struct intel_engine_cs *engine)
@@ -1050,14 +1060,7 @@ static void execlists_submission_tasklet(unsigned long data)
 			if (status & GEN8_CTX_STATUS_COMPLETE &&
 			    buf[2*head + 1] == execlists->preempt_complete_status) {
 				GEM_TRACE("%s preempt-idle\n", engine->name);
-
-				execlists_cancel_port_requests(execlists);
-				execlists_unwind_incomplete_requests(execlists);
-
-				GEM_BUG_ON(!execlists_is_active(execlists,
-								EXECLISTS_ACTIVE_PREEMPT));
-				execlists_clear_active(execlists,
-						       EXECLISTS_ACTIVE_PREEMPT);
+				complete_preempt_context(execlists);
 				continue;
 			}
 
-- 
2.17.0


* [PATCH 11/71] drm/i915: Move engine reset prepare/finish to backends
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (8 preceding siblings ...)
  2018-05-03  6:36 ` [PATCH 10/71] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
@ 2018-05-03  6:36 ` Chris Wilson
  2018-05-03  6:36 ` [PATCH 12/71] drm/i915: Split execlists/guc reset preparations Chris Wilson
                   ` (41 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:36 UTC (permalink / raw)
  To: intel-gfx

In preparation for handling incomplete preemption during reset more
carefully in execlists, we move the existing code wholesale to the
backends under a couple of new reset vfuncs.
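
The three hooks are grouped under a single engine->reset struct in
intel_engine_cs; that intel_ringbuffer.h hunk is not quoted in full
here, but from the call sites its shape is approximately (a sketch
inferred from usage, not the verbatim definition):

	struct {
		struct i915_request *(*prepare)(struct intel_engine_cs *engine);
		void (*reset)(struct intel_engine_cs *engine,
			      struct i915_request *rq);
		void (*finish)(struct intel_engine_cs *engine);
	} reset;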

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 47 +++-----------------
 drivers/gpu/drm/i915/intel_lrc.c        | 59 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 23 ++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 +++-
 4 files changed, 88 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8a8a77c2ef5f..694471fba777 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3002,7 +3002,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 struct i915_request *
 i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 {
-	struct i915_request *request = NULL;
+	struct i915_request *request;
 
 	/*
 	 * During the reset sequence, we must prevent the engine from
@@ -3025,40 +3025,7 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 */
 	kthread_park(engine->breadcrumbs.signaler);
 
-	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 *
-	 * Note that this needs to be a single atomic operation on the
-	 * tasklet (flush existing tasks, prevent new tasks) to prevent
-	 * a race between reset and set-wedged. It is not, so we do the best
-	 * we can atm and make sure we don't lock the machine up in the more
-	 * common case of recursively being called from set-wedged from inside
-	 * i915_reset.
-	 */
-	if (!atomic_read(&engine->execlists.tasklet.count))
-		tasklet_kill(&engine->execlists.tasklet);
-	tasklet_disable(&engine->execlists.tasklet);
-
-	/*
-	 * We're using worker to queue preemption requests from the tasklet in
-	 * GuC submission mode.
-	 * Even though tasklet was disabled, we may still have a worker queued.
-	 * Let's make sure that all workers scheduled before disabling the
-	 * tasklet are completed before continuing with the reset.
-	 */
-	if (engine->i915->guc.preempt_wq)
-		flush_workqueue(engine->i915->guc.preempt_wq);
-
-	if (engine->irq_seqno_barrier)
-		engine->irq_seqno_barrier(engine);
-
-	request = i915_gem_find_active_request(engine);
+	request = engine->reset.prepare(engine);
 	if (request && request->fence.error == -EIO)
 		request = ERR_PTR(-EIO); /* Previous reset failed! */
 
@@ -3209,13 +3176,8 @@ void i915_gem_reset_engine(struct intel_engine_cs *engine,
 	if (request)
 		request = i915_gem_reset_request(engine, request, stalled);
 
-	if (request) {
-		DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
-				 engine->name, request->global_seqno);
-	}
-
 	/* Setup the CS to resume from the breadcrumb of the hung request */
-	engine->reset_hw(engine, request);
+	engine->reset.reset(engine, request);
 }
 
 void i915_gem_reset(struct drm_i915_private *dev_priv,
@@ -3263,7 +3225,8 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
 
 void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
-	tasklet_enable(&engine->execlists.tasklet);
+	engine->reset.finish(engine);
+
 	kthread_unpark(engine->breadcrumbs.signaler);
 
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6067d1dc06ef..d23386823d94 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1829,8 +1829,48 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 	return 0;
 }
 
-static void reset_common_ring(struct intel_engine_cs *engine,
-			      struct i915_request *request)
+static struct i915_request *
+execlists_reset_prepare(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	/*
+	 * Prevent request submission to the hardware until we have
+	 * completed the reset in i915_gem_reset_finish(). If a request
+	 * is completed by one engine, it may then queue a request
+	 * to a second via its execlists->tasklet *just* as we are
+	 * calling engine->init_hw() and also writing the ELSP.
+	 * Turning off the execlists->tasklet until the reset is over
+	 * prevents the race.
+	 *
+	 * Note that this needs to be a single atomic operation on the
+	 * tasklet (flush existing tasks, prevent new tasks) to prevent
+	 * a race between reset and set-wedged. It is not, so we do the best
+	 * we can atm and make sure we don't lock the machine up in the more
+	 * common case of recursively being called from set-wedged from inside
+	 * i915_reset.
+	 */
+	if (!atomic_read(&execlists->tasklet.count))
+		tasklet_kill(&execlists->tasklet);
+	tasklet_disable(&execlists->tasklet);
+
+	/*
+	 * We're using worker to queue preemption requests from the tasklet in
+	 * GuC submission mode.
+	 * Even though tasklet was disabled, we may still have a worker queued.
+	 * Let's make sure that all workers scheduled before disabling the
+	 * tasklet are completed before continuing with the reset.
+	 */
+	if (engine->i915->guc.preempt_wq)
+		flush_workqueue(engine->i915->guc.preempt_wq);
+
+	return i915_gem_find_active_request(engine);
+}
+
+static void execlists_reset(struct intel_engine_cs *engine,
+			    struct i915_request *request)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	unsigned long flags;
@@ -1840,6 +1880,9 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 		  engine->name, request ? request->global_seqno : 0,
 		  intel_engine_get_seqno(engine));
 
+	/* The submission tasklet must be disabled; see engine->reset.prepare(). */
+	GEM_BUG_ON(!atomic_read(&execlists->tasklet.count));
+
 	/* See execlists_cancel_requests() for the irq/spinlock split. */
 	local_irq_save(flags);
 
@@ -1911,6 +1954,13 @@ static void reset_common_ring(struct intel_engine_cs *engine,
 	unwind_wa_tail(request);
 }
 
+static void execlists_reset_finish(struct intel_engine_cs *engine)
+{
+	tasklet_enable(&engine->execlists.tasklet);
+
+	GEM_TRACE("%s\n", engine->name);
+}
+
 static int intel_logical_ring_emit_pdps(struct i915_request *rq)
 {
 	struct i915_hw_ppgtt *ppgtt = rq->gem_context->ppgtt;
@@ -2240,7 +2290,10 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overriden by each engine. */
 	engine->init_hw = gen8_init_common_ring;
-	engine->reset_hw = reset_common_ring;
+
+	engine->reset.prepare = execlists_reset_prepare;
+	engine->reset.reset = execlists_reset;
+	engine->reset.finish = execlists_reset_finish;
 
 	engine->context_pin = execlists_context_pin;
 	engine->request_alloc = execlists_request_alloc;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 526ee8302fce..1e37e1be16c2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -531,9 +531,20 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	return ret;
 }
 
-static void reset_ring_common(struct intel_engine_cs *engine,
-			      struct i915_request *request)
+static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 {
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+
+	return i915_gem_find_active_request(engine);
+}
+
+static void reset_ring(struct intel_engine_cs *engine,
+		       struct i915_request *request)
+{
+	GEM_TRACE("%s seqno=%x\n",
+		  engine->name, request ? request->global_seqno : 0);
+
 	/*
 	 * RC6 must be prevented until the reset is complete and the engine
 	 * reinitialised. If it occurs in the middle of this sequence, the
@@ -596,6 +607,10 @@ static void reset_ring_common(struct intel_engine_cs *engine,
 	}
 }
 
+static void reset_finish(struct intel_engine_cs *engine)
+{
+}
+
 static int intel_rcs_ctx_init(struct i915_request *rq)
 {
 	int ret;
@@ -2030,7 +2045,9 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	intel_ring_init_semaphores(dev_priv, engine);
 
 	engine->init_hw = init_ring_common;
-	engine->reset_hw = reset_ring_common;
+	engine->reset.prepare = reset_prepare;
+	engine->reset.reset = reset_ring;
+	engine->reset.finish = reset_finish;
 
 	engine->context_pin = intel_ring_context_pin;
 	engine->request_alloc = ring_request_alloc;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 5e78ee3f5775..567e7ec2cef7 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -423,8 +423,13 @@ struct intel_engine_cs {
 	void		(*irq_disable)(struct intel_engine_cs *engine);
 
 	int		(*init_hw)(struct intel_engine_cs *engine);
-	void		(*reset_hw)(struct intel_engine_cs *engine,
-				    struct i915_request *rq);
+
+	struct {
+		struct i915_request *(*prepare)(struct intel_engine_cs *engine);
+		void (*reset)(struct intel_engine_cs *engine,
+			      struct i915_request *rq);
+		void (*finish)(struct intel_engine_cs *engine);
+	} reset;
 
 	void		(*park)(struct intel_engine_cs *engine);
 	void		(*unpark)(struct intel_engine_cs *engine);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 12/71] drm/i915: Split execlists/guc reset preparations
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (9 preceding siblings ...)
  2018-05-03  6:36 ` [PATCH 11/71] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
@ 2018-05-03  6:36 ` Chris Wilson
  2018-05-03  6:36 ` [PATCH 13/71] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
                   ` (40 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:36 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will make the execlists reset prepare callback
take into account preemption by flushing the context-switch handler.
This is not applicable to the GuC submission backend, so split the two
into their own backend callbacks.

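For context, after this split every backend is driven through the same
three-phase vtable; a condensed, hypothetical outline of the caller side
(error handling and forcewake elided):

	struct i915_request *rq;

	rq = engine->reset.prepare(engine);	/* quiesce submission */
	engine->reset.reset(engine, rq);	/* replay from the guilty rq */
	engine->reset.finish(engine);		/* re-enable submission */
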
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 41 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c            | 11 +-----
 2 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 9dbbe5b5390b..6dda87edb4b6 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -798,6 +798,46 @@ static void guc_submission_tasklet(unsigned long data)
 		guc_dequeue(engine);
 }
 
+static struct i915_request *
+guc_reset_prepare(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+
+	GEM_TRACE("%s\n", engine->name);
+
+	/*
+	 * Prevent request submission to the hardware until we have
+	 * completed the reset in i915_gem_reset_finish(). If a request
+	 * is completed by one engine, it may then queue a request
+	 * to a second via its execlists->tasklet *just* as we are
+	 * calling engine->init_hw() and also writing the ELSP.
+	 * Turning off the execlists->tasklet until the reset is over
+	 * prevents the race.
+	 *
+	 * Note that this needs to be a single atomic operation on the
+	 * tasklet (flush existing tasks, prevent new tasks) to prevent
+	 * a race between reset and set-wedged. It is not, so we do the best
+	 * we can atm and make sure we don't lock the machine up in the more
+	 * common case of recursively being called from set-wedged from inside
+	 * i915_reset.
+	 */
+	if (!atomic_read(&execlists->tasklet.count))
+		tasklet_kill(&execlists->tasklet);
+	tasklet_disable(&execlists->tasklet);
+
+	/*
+	 * We're using worker to queue preemption requests from the tasklet in
+	 * GuC submission mode.
+	 * Even though tasklet was disabled, we may still have a worker queued.
+	 * Let's make sure that all workers scheduled before disabling the
+	 * tasklet are completed before continuing with the reset.
+	 */
+	if (engine->i915->guc.preempt_wq)
+		flush_workqueue(engine->i915->guc.preempt_wq);
+
+	return i915_gem_find_active_request(engine);
+}
+
 /*
  * Everything below here is concerned with setup & teardown, and is
  * therefore not part of the somewhat time-critical batch-submission
@@ -1258,6 +1298,7 @@ int intel_guc_submission_enable(struct intel_guc *guc)
 			&engine->execlists;
 
 		execlists->tasklet.func = guc_submission_tasklet;
+		engine->reset.prepare = guc_reset_prepare;
 		engine->park = guc_submission_park;
 		engine->unpark = guc_submission_unpark;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d23386823d94..5c29991c8183 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1856,16 +1856,6 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 		tasklet_kill(&execlists->tasklet);
 	tasklet_disable(&execlists->tasklet);
 
-	/*
-	 * We're using worker to queue preemption requests from the tasklet in
-	 * GuC submission mode.
-	 * Even though tasklet was disabled, we may still have a worker queued.
-	 * Let's make sure that all workers scheduled before disabling the
-	 * tasklet are completed before continuing with the reset.
-	 */
-	if (engine->i915->guc.preempt_wq)
-		flush_workqueue(engine->i915->guc.preempt_wq);
-
 	return i915_gem_find_active_request(engine);
 }
 
@@ -2270,6 +2260,7 @@ static void execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->cancel_requests = execlists_cancel_requests;
 	engine->schedule = execlists_schedule;
 	engine->execlists.tasklet.func = execlists_submission_tasklet;
+	engine->reset.prepare = execlists_reset_prepare;
 
 	engine->park = NULL;
 	engine->unpark = NULL;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 13/71] drm/i915/execlists: Flush pending preemption events during reset
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (10 preceding siblings ...)
  2018-05-03  6:36 ` [PATCH 12/71] drm/i915: Split execlists/guc reset preparations Chris Wilson
@ 2018-05-03  6:36 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 14/71] drm/i915: Combine tasklet_kill and tasklet_disable Chris Wilson
                   ` (39 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:36 UTC (permalink / raw)
  To: intel-gfx

Catch up with the inflight CSB events after disabling the tasklet, and
before deciding which request was truly guilty of hanging the GPU.

v2: Restore checking of csb_use_mmio on every loop; don't forget old
vgpu.

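The crux of the change, condensed for illustration (NULL checks and the
timeline locking are elided here; the real version is in the hunk for
execlists_reset_prepare() below):

	/* First drain any inflight context-switch (CSB) events ... */
	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
		process_csb(engine);

	/*
	 * ... then the hang can be no later than ELSP[0]: walk the
	 * timeline backwards from there, pardoning anything the GPU
	 * has already completed.
	 */
	active = NULL;
	request = port_request(execlists->port);
	list_for_each_entry_from_reverse(request,
					 &engine->timeline.requests, link) {
		if (__i915_request_completed(request, request->global_seqno))
			break;

		active = request;
	}
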
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 127 +++++++++++++++++++++----------
 1 file changed, 87 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5c29991c8183..0c7733d6eb01 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -945,34 +945,14 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	local_irq_restore(flags);
 }
 
-/*
- * Check the unread Context Status Buffers and manage the submission of new
- * contexts to the ELSP accordingly.
- */
-static void execlists_submission_tasklet(unsigned long data)
+static void process_csb(struct intel_engine_cs *engine)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
-	struct drm_i915_private *dev_priv = engine->i915;
+	struct drm_i915_private *i915 = engine->i915;
 	bool fw = false;
 
-	/*
-	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
-	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
-	 * not be relinquished until the device is idle (see
-	 * i915_gem_idle_work_handler()). As a precaution, we make sure
-	 * that all ELSP are drained i.e. we have processed the CSB,
-	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
-	 */
-	GEM_BUG_ON(!dev_priv->gt.awake);
-
-	/*
-	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
-	 * imposing the cost of a locked atomic transaction when submitting a
-	 * new request (outside of the context-switch interrupt).
-	 */
-	while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
+	do {
 		/* The HWSP contains a (cacheable) mirror of the CSB */
 		const u32 *buf =
 			&engine->status_page.page_addr[I915_HWS_CSB_BUF0_INDEX];
@@ -980,28 +960,27 @@ static void execlists_submission_tasklet(unsigned long data)
 
 		if (unlikely(execlists->csb_use_mmio)) {
 			buf = (u32 * __force)
-				(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
-			execlists->csb_head = -1; /* force mmio read of CSB ptrs */
+				(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_BUF_LO(engine, 0)));
+			execlists->csb_head = -1; /* force mmio read of CSB */
 		}
 
 		/* Clear before reading to catch new interrupts */
 		clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 		smp_mb__after_atomic();
 
-		if (unlikely(execlists->csb_head == -1)) { /* following a reset */
+		if (unlikely(execlists->csb_head == -1)) { /* after a reset */
 			if (!fw) {
-				intel_uncore_forcewake_get(dev_priv,
-							   execlists->fw_domains);
+				intel_uncore_forcewake_get(i915, execlists->fw_domains);
 				fw = true;
 			}
 
-			head = readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			head = readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 			tail = GEN8_CSB_WRITE_PTR(head);
 			head = GEN8_CSB_READ_PTR(head);
 			execlists->csb_head = head;
 		} else {
 			const int write_idx =
-				intel_hws_csb_write_index(dev_priv) -
+				intel_hws_csb_write_index(i915) -
 				I915_HWS_CSB_BUF0_INDEX;
 
 			head = execlists->csb_head;
@@ -1009,8 +988,8 @@ static void execlists_submission_tasklet(unsigned long data)
 		}
 		GEM_TRACE("%s cs-irq head=%d [%d%s], tail=%d [%d%s]\n",
 			  engine->name,
-			  head, GEN8_CSB_READ_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
-			  tail, GEN8_CSB_WRITE_PTR(readl(dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
+			  head, GEN8_CSB_READ_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?",
+			  tail, GEN8_CSB_WRITE_PTR(readl(i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)))), fw ? "" : "?");
 
 		while (head != tail) {
 			struct i915_request *rq;
@@ -1020,7 +999,8 @@ static void execlists_submission_tasklet(unsigned long data)
 			if (++head == GEN8_CSB_ENTRIES)
 				head = 0;
 
-			/* We are flying near dragons again.
+			/*
+			 * We are flying near dragons again.
 			 *
 			 * We hold a reference to the request in execlist_port[]
 			 * but no more than that. We are operating in softirq
@@ -1129,15 +1109,48 @@ static void execlists_submission_tasklet(unsigned long data)
 		if (head != execlists->csb_head) {
 			execlists->csb_head = head;
 			writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
-			       dev_priv->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
+			       i915->regs + i915_mmio_reg_offset(RING_CONTEXT_STATUS_PTR(engine)));
 		}
-	}
+	} while (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
 
-	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT))
-		execlists_dequeue(engine);
+	if (unlikely(fw))
+		intel_uncore_forcewake_put(i915, execlists->fw_domains);
+}
+
+/*
+ * Check the unread Context Status Buffers and manage the submission of new
+ * contexts to the ELSP accordingly.
+ */
+static void execlists_submission_tasklet(unsigned long data)
+{
+	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 
-	if (fw)
-		intel_uncore_forcewake_put(dev_priv, execlists->fw_domains);
+	GEM_TRACE("%s awake?=%d, active=%x, irq-posted?=%d\n",
+		  engine->name,
+		  engine->i915->gt.awake,
+		  engine->execlists.active,
+		  test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted));
+
+	/*
+	 * We can skip acquiring intel_runtime_pm_get() here as it was taken
+	 * on our behalf by the request (see i915_gem_mark_busy()) and it will
+	 * not be relinquished until the device is idle (see
+	 * i915_gem_idle_work_handler()). As a precaution, we make sure
+	 * that all ELSP are drained i.e. we have processed the CSB,
+	 * before allowing ourselves to idle and calling intel_runtime_pm_put().
+	 */
+	GEM_BUG_ON(!engine->i915->gt.awake);
+
+	/*
+	 * Prefer doing test_and_clear_bit() as a two stage operation to avoid
+	 * imposing the cost of a locked atomic transaction when submitting a
+	 * new request (outside of the context-switch interrupt).
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		process_csb(engine);
+
+	if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT))
+		execlists_dequeue(engine);
 
 	/* If the engine is now idle, so should be the flag; and vice versa. */
 	GEM_BUG_ON(execlists_is_active(&engine->execlists,
@@ -1833,6 +1846,7 @@ static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_request *request, *active;
 
 	GEM_TRACE("%s\n", engine->name);
 
@@ -1856,7 +1870,40 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 		tasklet_kill(&execlists->tasklet);
 	tasklet_disable(&execlists->tasklet);
 
-	return i915_gem_find_active_request(engine);
+	/*
+	 * We want to flush the pending context switches; having disabled
+	 * the tasklet above, we can assume exclusive access to the execlists.
+	 * This allows us to catch up with an inflight preemption event,
+	 * and avoid blaming an innocent request if the stall was due to the
+	 * preemption itself.
+	 */
+	if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted))
+		process_csb(engine);
+
+	/*
+	 * The last active request can then be no later than the last request
+	 * now in ELSP[0]. So search backwards from there, so that if the GPU
+	 * has advanced beyond the last CSB update, it will be pardoned.
+	 */
+	active = NULL;
+	request = port_request(execlists->port);
+	if (request) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&engine->timeline.lock, flags);
+		list_for_each_entry_from_reverse(request,
+						 &engine->timeline.requests,
+						 link) {
+			if (__i915_request_completed(request,
+						     request->global_seqno))
+				break;
+
+			active = request;
+		}
+		spin_unlock_irqrestore(&engine->timeline.lock, flags);
+	}
+
+	return active;
 }
 
 static void execlists_reset(struct intel_engine_cs *engine,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 14/71] drm/i915: Combine tasklet_kill and tasklet_disable
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (11 preceding siblings ...)
  2018-05-03  6:36 ` [PATCH 13/71] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 15/71] drm/i915: Stop parking the signaler around reset Chris Wilson
                   ` (38 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Ideally, we want to atomically flush and disable the tasklet before
resetting the GPU. At present, we rely on being the only party to touch
our tasklet, and on serialisation of the reset process, to ensure that we
can suspend the tasklet from the mix of reset/wedge pathways. In this patch,
we move the tasklet abuse into its own function and tweak it such that
we only do a synchronous operation the first time it is disabled around
the reset. This allows us to avoid the sync inside a softirq context in
subsequent patches.

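For reference, an annotated copy of the new helper (the comments are
editorial glosses on the tasklet API of the day, where t->count == 0
means enabled):

static void tasklet_kill_and_disable(struct tasklet_struct *t)
{
	/* Only an enabled tasklet can be scheduled or running, so only
	 * then is there anything to flush.
	 */
	if (!atomic_read(&t->count))
		tasklet_kill(t);

	/* The first disable (0 -> 1) waits for a concurrently running
	 * tasklet to exit; nested disables skip the wait and so remain
	 * usable from atomic context.
	 */
	if (atomic_inc_return(&t->count) == 1)
		tasklet_unlock_wait(t);
	smp_mb();
}
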
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0c7733d6eb01..a60c3afd0adb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1842,6 +1842,16 @@ static int gen9_init_render_ring(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static void tasklet_kill_and_disable(struct tasklet_struct *t)
+{
+	if (!atomic_read(&t->count))
+		tasklet_kill(t);
+
+	if (atomic_inc_return(&t->count) == 1)
+		tasklet_unlock_wait(t);
+	smp_mb();
+}
+
 static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
@@ -1866,9 +1876,7 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	 * common case of recursively being called from set-wedged from inside
 	 * i915_reset.
 	 */
-	if (!atomic_read(&execlists->tasklet.count))
-		tasklet_kill(&execlists->tasklet);
-	tasklet_disable(&execlists->tasklet);
+	tasklet_kill_and_disable(&execlists->tasklet);
 
 	/*
 	 * We want to flush the pending context switches, having disabled
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 15/71] drm/i915: Stop parking the signaler around reset
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (12 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 14/71] drm/i915: Combine tasklet_kill and tasklet_disable Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 16/71] drm/i915: Be irqsafe inside reset Chris Wilson
                   ` (37 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaler, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.

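The freeze itself is the STOP_RING bit in RING_MI_MODE; condensed from
the new set_stop_engine() below (debug message elided):

	struct drm_i915_private *dev_priv = engine->i915;
	const i915_reg_t mode = RING_MI_MODE(engine->mmio_base);

	/*
	 * Freeze the command streamer: nothing further retires, so no
	 * breadcrumb write can race with our choice of guilty request.
	 * MODE_IDLE acknowledges the stop; on timeout the reset simply
	 * proceeds regardless.
	 */
	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
	__intel_wait_for_register_fw(dev_priv, mode, MODE_IDLE, MODE_IDLE,
				     1000, 0, NULL);
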
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 14 --------------
 drivers/gpu/drm/i915/intel_lrc.c        | 21 +++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 18 ++++++++++++++++++
 3 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 694471fba777..022a84ac0c74 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3013,18 +3013,6 @@ i915_gem_reset_prepare_engine(struct intel_engine_cs *engine)
 	 */
 	intel_uncore_forcewake_get(engine->i915, FORCEWAKE_ALL);
 
-	/*
-	 * Prevent the signaler thread from updating the request
-	 * state (by calling dma_fence_signal) as we are processing
-	 * the reset. The write from the GPU of the seqno is
-	 * asynchronous and the signaler thread may see a different
-	 * value to us and declare the request complete, even though
-	 * the reset routine have picked that request as the active
-	 * (incomplete) request. This conflict is not handled
-	 * gracefully!
-	 */
-	kthread_park(engine->breadcrumbs.signaler);
-
 	request = engine->reset.prepare(engine);
 	if (request && request->fence.error == -EIO)
 		request = ERR_PTR(-EIO); /* Previous reset failed! */
@@ -3227,8 +3215,6 @@ void i915_gem_reset_finish_engine(struct intel_engine_cs *engine)
 {
 	engine->reset.finish(engine);
 
-	kthread_unpark(engine->breadcrumbs.signaler);
-
 	intel_uncore_forcewake_put(engine->i915, FORCEWAKE_ALL);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a60c3afd0adb..bcebcd6f2848 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1852,6 +1852,21 @@ static void tasklet_kill_and_disable(struct tasklet_struct *t)
 	smp_mb();
 }
 
+static void set_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	GEM_TRACE("%s\n", engine->name);
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, MODE_IDLE,
+					 1000, 0,
+					 NULL))
+		GEM_TRACE("%s: timed out on STOP_RING -> IDLE\n", engine->name);
+}
+
 static struct i915_request *
 execlists_reset_prepare(struct intel_engine_cs *engine)
 {
@@ -1898,6 +1913,12 @@ execlists_reset_prepare(struct intel_engine_cs *engine)
 	if (request) {
 		unsigned long flags;
 
+		/*
+		 * Prevent the breadcrumb from advancing before we decide
+		 * which request is currently active.
+		 */
+		set_stop_engine(engine);
+
 		spin_lock_irqsave(&engine->timeline.lock, flags);
 		list_for_each_entry_from_reverse(request,
 						 &engine->timeline.requests,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1e37e1be16c2..166901cbfa88 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -531,8 +531,26 @@ static int init_ring_common(struct intel_engine_cs *engine)
 	return ret;
 }
 
+static void set_stop_engine(struct intel_engine_cs *engine)
+{
+	struct drm_i915_private *dev_priv = engine->i915;
+	const u32 base = engine->mmio_base;
+	const i915_reg_t mode = RING_MI_MODE(base);
+
+	I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
+	if (__intel_wait_for_register_fw(dev_priv,
+					 mode, MODE_IDLE, MODE_IDLE,
+					 1000, 0,
+					 NULL))
+		DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
+				 engine->name);
+}
+
 static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 {
+	if (INTEL_GEN(engine->i915) >= 3)
+		set_stop_engine(engine);
+
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 16/71] drm/i915: Be irqsafe inside reset
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (13 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 15/71] drm/i915: Stop parking the signaler around reset Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 17/71] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
                   ` (36 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

As we want to be able to call i915_reset_engine and co from a softirq or
timer context, we need to be irqsafe at all times. So we have to forgo
the simple spin_lock_irq for the full spin_lock_irqsave.

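The distinction matters once the reset may itself run with interrupts
off; an illustrative contrast:

	unsigned long flags;

	/*
	 * spin_unlock_irq() unconditionally re-enables interrupts, which
	 * is fatal if the caller was already in irq context; the irqsave
	 * variant restores whatever irq state the caller had.
	 */
	spin_lock_irqsave(&engine->timeline.lock, flags);
	/* ... rewind to replay the incomplete request ... */
	spin_unlock_irqrestore(&engine->timeline.lock, flags);
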
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 022a84ac0c74..987a7d2a4bca 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3135,15 +3135,17 @@ i915_gem_reset_request(struct intel_engine_cs *engine,
 		 */
 		request = i915_gem_find_active_request(engine);
 		if (request) {
+			unsigned long flags;
+
 			i915_gem_context_mark_innocent(request->gem_context);
 			dma_fence_set_error(&request->fence, -EAGAIN);
 
 			/* Rewind the engine to replay the incomplete rq */
-			spin_lock_irq(&engine->timeline.lock);
+			spin_lock_irqsave(&engine->timeline.lock, flags);
 			request = list_prev_entry(request, link);
 			if (&request->link == &engine->timeline.requests)
 				request = NULL;
-			spin_unlock_irq(&engine->timeline.lock);
+			spin_unlock_irqrestore(&engine->timeline.lock, flags);
 		}
 	}
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 17/71] drm/i915/execlists: Make submission tasklet hardirq safe
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (14 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 16/71] drm/i915: Be irqsafe inside reset Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 18/71] drm/i915/guc: " Chris Wilson
                   ` (35 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Prepare to allow the execlists submission to be run from underneath a
hardirq timer context (and not just the current softirq context) as is
required for fast preemption resets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bcebcd6f2848..3e793badf0f2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -573,6 +573,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		&execlists->port[execlists->port_mask];
 	struct i915_request *last = port_request(port);
 	struct rb_node *rb;
+	unsigned long flags;
 	bool submit = false;
 
 	/* Hardware submission is through 2 ports. Conceptually each port
@@ -596,7 +597,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock_irq(&engine->timeline.lock);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
 
@@ -757,7 +758,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
 unlock:
-	spin_unlock_irq(&engine->timeline.lock);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	if (submit) {
 		execlists_user_begin(execlists, execlists->port);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 18/71] drm/i915/guc: Make submission tasklet hardirq safe
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (15 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 17/71] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 19/71] drm/i915: Allow init_breadcrumbs to be used from irq context Chris Wilson
                   ` (34 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Prepare to allow the GuC submission to be run from underneath a
hardirq timer context (and not just the current softirq context) as is
required for fast preemption resets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 6dda87edb4b6..3aa1b9fe86bc 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -690,10 +690,11 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 	struct i915_request *last = NULL;
 	const struct execlist_port * const last_port =
 		&execlists->port[execlists->port_mask];
+	unsigned long flags;
 	bool submit = false;
 	struct rb_node *rb;
 
-	spin_lock_irq(&engine->timeline.lock);
+	spin_lock_irqsave(&engine->timeline.lock, flags);
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
 
@@ -764,7 +765,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
 unlock:
-	spin_unlock_irq(&engine->timeline.lock);
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
 static void guc_submission_tasklet(unsigned long data)
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 19/71] drm/i915: Allow init_breadcrumbs to be used from irq context
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (16 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 18/71] drm/i915/guc: " Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 20/71] drm/i915/execlists: Force preemption via reset on timeout Chris Wilson
                   ` (33 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

In order to support engine reset from irq (timer) context, we need to be
able to re-initialise the breadcrumbs. So we need to promote the plain
spin_lock_irq to a safe spin_lock_irqsave.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 18e643df523e..86a987b8ac66 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -846,8 +846,9 @@ static void cancel_fake_irq(struct intel_engine_cs *engine)
 void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 {
 	struct intel_breadcrumbs *b = &engine->breadcrumbs;
+	unsigned long flags;
 
-	spin_lock_irq(&b->irq_lock);
+	spin_lock_irqsave(&b->irq_lock, flags);
 
 	/*
 	 * Leave the fake_irq timer enabled (if it is running), but clear the
@@ -871,7 +872,7 @@ void intel_engine_reset_breadcrumbs(struct intel_engine_cs *engine)
 	 */
 	clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
 
-	spin_unlock_irq(&b->irq_lock);
+	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
 
 void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine)
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 20/71] drm/i915/execlists: Force preemption via reset on timeout
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (17 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 19/71] drm/i915: Allow init_breadcrumbs to be used from irq context Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 21/71] drm/i915/execlists: Try preempt-reset from hardirq timer context Chris Wilson
                   ` (32 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Install a timer when trying to preempt on behalf of an important
context, so that if the active context does not honour the preemption
request within the desired timeout, we reset the GPU to allow the
important context to run.

v2: Install the timer on scheduling the preempt request; long before we
even try to inject preemption into the ELSP, as the tasklet/injection
may itself be blocked.
v3: Update the guc to handle the preemption/tasklet timer.

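Condensed, the arming logic added to __submit_queue() reads (see the
full hunk below):

	/*
	 * Only arm the timer when this submission raises the queue
	 * priority far enough to request preemption, and record the
	 * pending timeout so that a preemption which does complete in
	 * time can cancel it.
	 */
	if (timeout && __execlists_need_preempt(prio, old)) {
		execlists_set_active(execlists,
				     EXECLISTS_ACTIVE_PREEMPT_TIMEOUT);
		hrtimer_start(&execlists->preempt_timer,
			      ns_to_ktime(timeout), HRTIMER_MODE_REL);
	}
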
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c      |  4 +
 drivers/gpu/drm/i915/intel_guc_submission.c |  1 +
 drivers/gpu/drm/i915/intel_lrc.c            | 90 +++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h     |  8 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c  | 65 +++++++++++++++
 5 files changed, 158 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index bddc57ccfa4a..61dcedddb799 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -489,6 +489,9 @@ static void intel_engine_init_execlist(struct intel_engine_cs *engine)
 	execlists->queue_priority = INT_MIN;
 	execlists->queue = RB_ROOT;
 	execlists->first = NULL;
+
+	hrtimer_init(&execlists->preempt_timer,
+		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 }
 
 /**
@@ -1041,6 +1044,7 @@ void intel_engines_park(struct drm_i915_private *i915)
 
 	for_each_engine(engine, i915, id) {
 		/* Flush the residual irq tasklets first. */
+		hrtimer_cancel(&engine->execlists.preempt_timer);
 		intel_engine_disarm_breadcrumbs(engine);
 		tasklet_kill(&engine->execlists.tasklet);
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 3aa1b9fe86bc..f240a4d2d625 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -751,6 +751,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
 			kmem_cache_free(engine->i915->priorities, p);
 	}
 done:
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT_TIMEOUT);
 	execlists->queue_priority = rb ? to_priolist(rb)->priority : INT_MIN;
 	execlists->first = rb;
 	if (submit) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3e793badf0f2..0c962ed0150d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -555,6 +555,53 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
+static enum hrtimer_restart preempt_timeout(struct hrtimer *hrtimer)
+{
+	struct intel_engine_execlists *execlists =
+		container_of(hrtimer, typeof(*execlists), preempt_timer);
+
+	GEM_TRACE("%s active=%x\n",
+		  container_of(execlists,
+			       struct intel_engine_cs,
+			       execlists)->name,
+		  execlists->active);
+
+	if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT_TIMEOUT))
+		return HRTIMER_NORESTART;
+
+	if (GEM_SHOW_DEBUG()) {
+		struct intel_engine_cs *engine =
+			container_of(execlists, typeof(*engine), execlists);
+		struct drm_printer p = drm_debug_printer(__func__);
+
+		intel_engine_dump(engine, &p, "%s\n", engine->name);
+	}
+
+	queue_work(system_highpri_wq, &execlists->preempt_reset);
+
+	return HRTIMER_NORESTART;
+}
+
+static void preempt_reset(struct work_struct *work)
+{
+	struct intel_engine_execlists *execlists =
+		container_of(work, typeof(*execlists), preempt_reset);
+	struct intel_engine_cs *engine =
+		  container_of(execlists, struct intel_engine_cs, execlists);
+
+	GEM_TRACE("%s\n", engine->name);
+
+	tasklet_disable(&execlists->tasklet);
+
+	execlists->tasklet.func(execlists->tasklet.data);
+
+	if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT_TIMEOUT))
+		i915_handle_error(engine->i915, BIT(engine->id), 0,
+				  "preemption time out on %s", engine->name);
+
+	tasklet_enable(&execlists->tasklet);
+}
+
 static void complete_preempt_context(struct intel_engine_execlists *execlists)
 {
 	GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT));
@@ -651,7 +698,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		 * priorities of the ports haven't been switch.
 		 */
 		if (port_count(&port[1]))
-			goto unlock;
+			goto clear_preempt_timeout;
 
 		/*
 		 * WaIdleLiteRestore:bdw,skl
@@ -757,6 +804,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	/* We must always keep the beast fed if we have work piled up */
 	GEM_BUG_ON(execlists->first && !port_isset(execlists->port));
 
+clear_preempt_timeout:
+	execlists_clear_active(execlists, EXECLISTS_ACTIVE_PREEMPT_TIMEOUT);
 unlock:
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
@@ -1167,16 +1216,38 @@ static void queue_request(struct intel_engine_cs *engine,
 		      &lookup_priolist(engine, node, prio)->requests);
 }
 
-static void __submit_queue(struct intel_engine_cs *engine, int prio)
+static void __submit_queue(struct intel_engine_cs *engine,
+			   int prio, unsigned int timeout)
 {
-	engine->execlists.queue_priority = prio;
-	tasklet_hi_schedule(&engine->execlists.tasklet);
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+	int old = execlists->queue_priority;
+
+	GEM_TRACE("%s prio=%d (previous=%d)\n", engine->name, prio, old);
+
+	if (unlikely(execlists_is_active(execlists,
+					 EXECLISTS_ACTIVE_PREEMPT_TIMEOUT)))
+		hrtimer_cancel(&execlists->preempt_timer);
+
+	execlists->queue_priority = prio;
+	tasklet_hi_schedule(&execlists->tasklet);
+
+	/* Set a timer to force preemption vs hostile userspace */
+	if (timeout && __execlists_need_preempt(prio, old)) {
+		GEM_TRACE("%s preempt timeout=%uns\n", engine->name, timeout);
+
+		execlists_set_active(execlists,
+				     EXECLISTS_ACTIVE_PREEMPT_TIMEOUT);
+		hrtimer_start(&execlists->preempt_timer,
+			      ns_to_ktime(timeout),
+			      HRTIMER_MODE_REL);
+	}
 }
 
-static void submit_queue(struct intel_engine_cs *engine, int prio)
+static void submit_queue(struct intel_engine_cs *engine,
+			 int prio, unsigned int timeout)
 {
 	if (prio > engine->execlists.queue_priority)
-		__submit_queue(engine, prio);
+		__submit_queue(engine, prio, timeout);
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1188,7 +1259,7 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request));
+	submit_queue(engine, rq_prio(request), 0);
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
@@ -1314,7 +1385,7 @@ static void execlists_schedule(struct i915_request *request,
 
 		if (prio > engine->execlists.queue_priority &&
 		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio);
+			__submit_queue(engine, prio, 0);
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
@@ -2435,6 +2506,9 @@ logical_ring_setup(struct intel_engine_cs *engine)
 	tasklet_init(&engine->execlists.tasklet,
 		     execlists_submission_tasklet, (unsigned long)engine);
 
+	INIT_WORK(&engine->execlists.preempt_reset, preempt_reset);
+	engine->execlists.preempt_timer.function = preempt_timeout;
+
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 567e7ec2cef7..8693d4d800ad 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -270,8 +270,9 @@ struct intel_engine_execlists {
 	 */
 	unsigned int active;
 #define EXECLISTS_ACTIVE_USER 0
-#define EXECLISTS_ACTIVE_PREEMPT 1
-#define EXECLISTS_ACTIVE_HWACK 2
+#define EXECLISTS_ACTIVE_HWACK 1
+#define EXECLISTS_ACTIVE_PREEMPT 2
+#define EXECLISTS_ACTIVE_PREEMPT_TIMEOUT 3
 
 	/**
 	 * @port_mask: number of execlist ports - 1
@@ -317,6 +318,9 @@ struct intel_engine_execlists {
 	 * @preempt_complete_status: expected CSB upon completing preemption
 	 */
 	u32 preempt_complete_status;
+
+	struct hrtimer preempt_timer;
+	struct work_struct preempt_reset;
 };
 
 #define INTEL_ENGINE_CS_MAX_NAME 8
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 20279547cb05..2aaab6072512 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -498,12 +498,77 @@ static int live_late_preempt(void *arg)
 	goto err_ctx_lo;
 }
 
+static void mark_preemption_hang(struct intel_engine_execlists *execlists)
+{
+	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
+	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT_TIMEOUT);
+}
+
+static int live_preempt_timeout(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+	enum intel_engine_id id;
+	struct spinner spin;
+	int err = -ENOMEM;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	if (spinner_init(&spin, i915))
+		goto err_unlock;
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		goto err_spin;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq;
+
+		rq = spinner_create_request(&spin, ctx, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto err_ctx;
+		}
+
+		i915_request_add(rq);
+		if (!wait_for_spinner(&spin, rq)) {
+			i915_gem_set_wedged(i915);
+			err = -EIO;
+			goto err_ctx;
+		}
+
+		GEM_TRACE("%s triggering reset\n", engine->name);
+		mark_preemption_hang(&engine->execlists);
+		preempt_reset(&engine->execlists.preempt_reset);
+
+		if (flush_test(i915, I915_WAIT_LOCKED)) {
+			err = -EIO;
+			goto err_ctx;
+		}
+	}
+
+	err = 0;
+err_ctx:
+	kernel_context_close(ctx);
+err_spin:
+	spinner_fini(&spin);
+err_unlock:
+	flush_test(i915, I915_WAIT_LOCKED);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(live_sanitycheck),
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
+		SUBTEST(live_preempt_timeout),
 	};
 	return i915_subtests(tests, i915);
 }
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 21/71] drm/i915/execlists: Try preempt-reset from hardirq timer context
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (18 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 20/71] drm/i915/execlists: Force preemption via reset on timeout Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 22/71] drm/i915/preemption: Select timeout when scheduling Chris Wilson
                   ` (31 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

When circumstances allow, try resetting the engine directly from the
preemption timeout handler. As this is softirq context, we have to be
careful both not to sleep and not to spin on anything we may be
interrupting (e.g. the submission tasklet).

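The non-blocking core of try_preempt_reset(), condensed (the
gpu_error.flags bookkeeping is elided; see the full hunk below):

	/*
	 * We may be interrupting the submission tasklet itself, so only
	 * proceed if its lock can be taken without spinning; otherwise
	 * the caller falls back to the process-context worker.
	 */
	if (tasklet_trylock(&execlists->tasklet)) {
		execlists->tasklet.func(execlists->tasklet.data);
		if (execlists_is_active(execlists,
					EXECLISTS_ACTIVE_PREEMPT_TIMEOUT))
			err = i915_reset_engine(engine, "preemption time out");
		else
			err = 0; /* the tasklet was merely delayed */
		tasklet_unlock(&execlists->tasklet);
	}
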
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c           |  35 +++++-
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 122 +++++++++++++++++++++
 2 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0c962ed0150d..e9b8121cc1c7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -555,6 +555,38 @@ static void inject_preempt_context(struct intel_engine_cs *engine)
 	execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT);
 }
 
+static int try_preempt_reset(struct intel_engine_execlists *execlists)
+{
+	int err = -EBUSY;
+
+	if (tasklet_trylock(&execlists->tasklet)) {
+		struct intel_engine_cs *engine =
+			container_of(execlists, typeof(*engine), execlists);
+		const unsigned int bit = I915_RESET_ENGINE + engine->id;
+		unsigned long *lock = &engine->i915->gpu_error.flags;
+
+		execlists->tasklet.func(execlists->tasklet.data);
+
+		if (!execlists_is_active(execlists,
+					 EXECLISTS_ACTIVE_PREEMPT_TIMEOUT)) {
+			/* Nothing to do; the tasklet was just delayed. */
+			err = 0;
+		} else if (!test_and_set_bit(bit, lock)) {
+			tasklet_disable_nosync(&execlists->tasklet);
+			err = i915_reset_engine(engine,
+						"preemption time out");
+			tasklet_enable(&execlists->tasklet);
+
+			clear_bit(bit, lock);
+			wake_up_bit(lock, bit);
+		}
+
+		tasklet_unlock(&execlists->tasklet);
+	}
+
+	return err;
+}
+
 static enum hrtimer_restart preempt_timeout(struct hrtimer *hrtimer)
 {
 	struct intel_engine_execlists *execlists =
@@ -577,7 +609,8 @@ static enum hrtimer_restart preempt_timeout(struct hrtimer *hrtimer)
 		intel_engine_dump(engine, &p, "%s\n", engine->name);
 	}
 
-	queue_work(system_highpri_wq, &execlists->preempt_reset);
+	if (try_preempt_reset(execlists))
+		queue_work(system_highpri_wq, &execlists->preempt_reset);
 
 	return HRTIMER_NORESTART;
 }
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 2aaab6072512..5ac4bf36aa84 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -562,6 +562,127 @@ static int live_preempt_timeout(void *arg)
 	return err;
 }
 
+static void __softirq_begin(void)
+{
+	local_bh_disable();
+}
+
+static void __softirq_end(void)
+{
+	local_bh_enable();
+}
+
+static void __hardirq_begin(void)
+{
+	local_irq_disable();
+}
+
+static void __hardirq_end(void)
+{
+	local_irq_enable();
+}
+
+static int live_preempt_reset(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+	enum intel_engine_id id;
+	struct spinner spin;
+	int err = -ENOMEM;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	if (spinner_init(&spin, i915))
+		goto err_unlock;
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		goto err_spin;
+
+	for_each_engine(engine, i915, id) {
+		static const struct {
+			const char *name;
+			void (*critical_section_begin)(void);
+			void (*critical_section_end)(void);
+		} phases[] = {
+			{ "softirq", __softirq_begin, __softirq_end },
+			{ "hardirq", __hardirq_begin, __hardirq_end },
+			{ }
+		};
+		const typeof(*phases) *p;
+
+		for (p = phases; p->name; p++) {
+			struct i915_request *rq;
+
+			rq = spinner_create_request(&spin, ctx, engine,
+						    MI_NOOP);
+			if (IS_ERR(rq)) {
+				err = PTR_ERR(rq);
+				goto err_ctx;
+			}
+
+			i915_request_add(rq);
+			if (!wait_for_spinner(&spin, rq)) {
+				i915_gem_set_wedged(i915);
+				err = -EIO;
+				goto err_ctx;
+			}
+
+			/* Flush to give try_preempt_reset a chance */
+			do {
+				tasklet_schedule(&engine->execlists.tasklet);
+				usleep_range(100, 1000);
+				tasklet_kill(&engine->execlists.tasklet);
+			} while (test_bit(ENGINE_IRQ_EXECLIST,
+					  &engine->irq_posted));
+			GEM_BUG_ON(i915_request_completed(rq));
+
+			GEM_TRACE("%s triggering %s reset\n",
+				  engine->name, p->name);
+			p->critical_section_begin();
+
+			/* Trick execution of the tasklet from within reset */
+			set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+
+			mark_preemption_hang(&engine->execlists);
+			err = try_preempt_reset(&engine->execlists);
+
+			p->critical_section_end();
+			if (err) {
+				pr_err("Preempt softirq reset failed on %s, irq_posted? %d, tasklet state %lx\n",
+				       engine->name,
+				       test_bit(ENGINE_IRQ_EXECLIST,
+						&engine->irq_posted),
+				       engine->execlists.tasklet.state);
+				spinner_end(&spin);
+				i915_gem_set_wedged(i915);
+				goto err_ctx;
+			}
+			GEM_BUG_ON(test_bit(ENGINE_IRQ_EXECLIST,
+					    &engine->irq_posted));
+
+			if (flush_test(i915, I915_WAIT_LOCKED)) {
+				err = -EIO;
+				goto err_ctx;
+			}
+		}
+	}
+
+	err = 0;
+err_ctx:
+	kernel_context_close(ctx);
+err_spin:
+	spinner_fini(&spin);
+err_unlock:
+	flush_test(i915, I915_WAIT_LOCKED);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -569,6 +690,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_preempt_timeout),
+		SUBTEST(live_preempt_reset),
 	};
 	return i915_subtests(tests, i915);
 }
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 22/71] drm/i915/preemption: Select timeout when scheduling
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (19 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 21/71] drm/i915/execlists: Try preempt-reset from hardirq timer context Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 23/71] drm/i915: Use a preemption timeout to enforce interactivity Chris Wilson
                   ` (30 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

The choice of preemption timeout is determined by the context from which
we trigger the preemption; as such, allow the caller to specify the
desired timeout.

Effectively the other choice would be to use the shortest timeout along
the dependency chain. However, given that we would have already
triggered preemption for the dependency chain, we can assume that no
preemption along that chain is more important than the current request,
ergo we need only consider the current timeout. Realising this, we can
then pass control of the preemption timeout to the caller for greater
flexibility.

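In practice the timeout simply rides along with the schedule call, and
callers without a deadline pass 0 (the 10ms figure here is purely a
hypothetical example, not a value from this series):

	engine->schedule(rq, &attr, 0);			 /* no timeout */
	engine->schedule(rq, &attr, 10 * NSEC_PER_MSEC); /* force within 10ms */
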
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c            |   2 +-
 drivers/gpu/drm/i915/i915_request.c        |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c           |   5 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   6 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 110 ++++++++++++++++++++-
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 987a7d2a4bca..4986c4f1ecf9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -580,7 +580,7 @@ static void __fence_set_priority(struct dma_fence *fence,
 
 	local_bh_disable(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
-		engine->schedule(rq, attr);
+		engine->schedule(rq, attr, 0);
 	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e5925fcf6004..76ee297483b1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1111,7 +1111,7 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 */
 	local_bh_disable();
 	if (engine->schedule)
-		engine->schedule(request, &request->gem_context->sched);
+		engine->schedule(request, &request->gem_context->sched, 0);
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e9b8121cc1c7..8ddb1351c5ce 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1321,7 +1321,8 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
 }
 
 static void execlists_schedule(struct i915_request *request,
-			       const struct i915_sched_attr *attr)
+			       const struct i915_sched_attr *attr,
+			       unsigned int timeout)
 {
 	struct intel_engine_cs *engine;
 	struct i915_dependency *dep, *p;
@@ -1418,7 +1419,7 @@ static void execlists_schedule(struct i915_request *request,
 
 		if (prio > engine->execlists.queue_priority &&
 		    i915_sw_fence_done(&sched_to_request(node)->submit))
-			__submit_queue(engine, prio, 0);
+			__submit_queue(engine, prio, timeout);
 	}
 
 	spin_unlock_irq(&engine->timeline.lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8693d4d800ad..56be17df3c37 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -467,14 +467,16 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
-	/* Call when the priority on a request has changed and it and its
+	/*
+	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
 	 * not be ready to run!
 	 *
 	 * Called under the struct_mutex.
 	 */
 	void		(*schedule)(struct i915_request *request,
-				    const struct i915_sched_attr *attr);
+				    const struct i915_sched_attr *attr,
+				    unsigned int timeout);
 
 	/*
 	 * Cancel all requests on the hardware, or queued for execution.
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 5ac4bf36aa84..ec906914bee2 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -460,7 +460,7 @@ static int live_late_preempt(void *arg)
 		}
 
 		attr.priority = I915_PRIORITY_MAX;
-		engine->schedule(rq, &attr);
+		engine->schedule(rq, &attr, 0);
 
 		if (!wait_for_spinner(&spin_hi, rq)) {
 			pr_err("High priority context failed to preempt the low priority context\n");
@@ -683,6 +683,113 @@ static int live_preempt_reset(void *arg)
 	return err;
 }
 
+static int live_late_preempt_timeout(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *ctx_hi, *ctx_lo;
+	struct spinner spin_hi, spin_lo;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	if (spinner_init(&spin_hi, i915))
+		goto err_unlock;
+
+	if (spinner_init(&spin_lo, i915))
+		goto err_spin_hi;
+
+	ctx_hi = kernel_context(i915);
+	if (!ctx_hi)
+		goto err_spin_lo;
+
+	ctx_lo = kernel_context(i915);
+	if (!ctx_lo)
+		goto err_ctx_hi;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq;
+
+		rq = spinner_create_request(&spin_lo, ctx_lo, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (!wait_for_spinner(&spin_lo, rq)) {
+			pr_err("First context failed to start\n");
+			goto err_wedged;
+		}
+
+		rq = spinner_create_request(&spin_hi, ctx_hi, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			spinner_end(&spin_lo);
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (wait_for_spinner(&spin_hi, rq)) {
+			pr_err("Second context overtook first?\n");
+			goto err_wedged;
+		}
+
+		GEM_TRACE("%s rescheduling (no timeout)\n", engine->name);
+		engine->schedule(rq, &(struct i915_sched_attr){
+				 .priority = 1,
+				 }, 0);
+
+		if (wait_for_spinner(&spin_hi, rq)) {
+			pr_err("High priority context overtook first without an arbitration point?\n");
+			goto err_wedged;
+		}
+
+		GEM_TRACE("%s rescheduling (with timeout)\n", engine->name);
+		engine->schedule(rq, &(struct i915_sched_attr){
+				 .priority = 2,
+				 }, 10 * 1000 /* 10us */);
+
+		if (!wait_for_spinner(&spin_hi, rq)) {
+			pr_err("High priority context failed to force itself in front of the low priority context\n");
+			GEM_TRACE_DUMP();
+			goto err_wedged;
+		}
+
+		spinner_end(&spin_hi);
+		spinner_end(&spin_lo);
+		if (flush_test(i915, I915_WAIT_LOCKED)) {
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+	}
+
+	err = 0;
+err_ctx_lo:
+	kernel_context_close(ctx_lo);
+err_ctx_hi:
+	kernel_context_close(ctx_hi);
+err_spin_lo:
+	spinner_fini(&spin_lo);
+err_spin_hi:
+	spinner_fini(&spin_hi);
+err_unlock:
+	flush_test(i915, I915_WAIT_LOCKED);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	spinner_end(&spin_hi);
+	spinner_end(&spin_lo);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_ctx_lo;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -691,6 +798,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_preempt_timeout),
 		SUBTEST(live_preempt_reset),
+		SUBTEST(live_late_preempt_timeout),
 	};
 	return i915_subtests(tests, i915);
 }
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 23/71] drm/i915: Use a preemption timeout to enforce interactivity
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (20 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 22/71] drm/i915/preemption: Select timeout when scheduling Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 24/71] drm/i915: Allow user control over preempt timeout on their important context Chris Wilson
                   ` (29 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Use a liberal timeout of 100ms to ensure that the rendering for an
interactive pageflip is started in a timely fashion, and that user
interaction is not blocked by GPU or CPU hogs. This comes at the cost
of resetting whoever was blocking the preemption, likely leading to that
context/process being banned from submitting future requests.
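
Condensed, the display-side usage enabled here amounts to the following
sketch (using only the helpers and macros as extended by this patch):

    static const struct i915_sched_attr attr = {
            .priority = I915_PRIORITY_DISPLAY,
    };

    /*
     * Bump everything the framebuffer depends on to display priority,
     * and require that preemption in its favour completes within
     * I915_PREEMPTION_TIMEOUT_DISPLAY (100ms), or else the hog is
     * reset.
     */
    i915_gem_object_wait_priority(obj, 0, &attr,
                                  I915_PREEMPTION_TIMEOUT_DISPLAY);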

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h      |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c      | 19 +++++++++++--------
 drivers/gpu/drm/i915/intel_display.c | 17 ++++++++++++++++-
 3 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9341b725113b..1fed1a90b25e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3167,8 +3167,10 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 struct intel_rps_client *rps);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 				  unsigned int flags,
-				  const struct i915_sched_attr *attr);
+				  const struct i915_sched_attr *attr,
+				  unsigned int timeout);
 #define I915_PRIORITY_DISPLAY I915_PRIORITY_MAX
+#define I915_PREEMPTION_TIMEOUT_DISPLAY (100 * 1000 * 1000) /* 100 ms / 10Hz */
 
 int __must_check
 i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4986c4f1ecf9..344e3d98acd5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -567,7 +567,8 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
 }
 
 static void __fence_set_priority(struct dma_fence *fence,
-				 const struct i915_sched_attr *attr)
+				 const struct i915_sched_attr *attr,
+				 unsigned int timeout)
 {
 	struct i915_request *rq;
 	struct intel_engine_cs *engine;
@@ -580,12 +581,13 @@ static void __fence_set_priority(struct dma_fence *fence,
 
 	local_bh_disable(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
-		engine->schedule(rq, attr, 0);
+		engine->schedule(rq, attr, timeout);
 	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
 static void fence_set_priority(struct dma_fence *fence,
-			       const struct i915_sched_attr *attr)
+			       const struct i915_sched_attr *attr,
+			       unsigned int timeout)
 {
 	/* Recurse once into a fence-array */
 	if (dma_fence_is_array(fence)) {
@@ -593,16 +595,17 @@ static void fence_set_priority(struct dma_fence *fence,
 		int i;
 
 		for (i = 0; i < array->num_fences; i++)
-			__fence_set_priority(array->fences[i], attr);
+			__fence_set_priority(array->fences[i], attr, timeout);
 	} else {
-		__fence_set_priority(fence, attr);
+		__fence_set_priority(fence, attr, timeout);
 	}
 }
 
 int
 i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 			      unsigned int flags,
-			      const struct i915_sched_attr *attr)
+			      const struct i915_sched_attr *attr,
+			      unsigned int timeout)
 {
 	struct dma_fence *excl;
 
@@ -617,7 +620,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 			return ret;
 
 		for (i = 0; i < count; i++) {
-			fence_set_priority(shared[i], attr);
+			fence_set_priority(shared[i], attr, timeout);
 			dma_fence_put(shared[i]);
 		}
 
@@ -627,7 +630,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 	}
 
 	if (excl) {
-		fence_set_priority(excl, attr);
+		fence_set_priority(excl, attr, timeout);
 		dma_fence_put(excl);
 	}
 	return 0;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1087358f6364..d230be4bd587 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12797,7 +12797,8 @@ static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj)
 		.priority = I915_PRIORITY_DISPLAY,
 	};
 
-	i915_gem_object_wait_priority(obj, 0, &attr);
+	i915_gem_object_wait_priority(obj, 0,
+				      &attr, I915_PREEMPTION_TIMEOUT_DISPLAY);
 }
 
 /**
@@ -12876,6 +12877,20 @@ intel_prepare_plane_fb(struct drm_plane *plane,
 
 	ret = intel_plane_pin_fb(to_intel_plane_state(new_state));
 
+	/*
+	 * Reschedule our dependencies, and ensure we run within a timeout.
+	 *
+	 * Note that if the timeout is exceeded, then whoever was running that
+	 * prevented us from acquiring the GPU is declared rogue and reset. An
+	 * unresponsive process will then be banned in order to preserve
+	 * interactivity. Since this can be seen as a bit heavy-handed, we
+	 * select a timeout for when the dropped frames start to become a
+	 * noticeable nuisance for the user (100 ms, i.e. preemption was
+	 * blocked for more than a few frames). Note, this is only a timeout
+	 * for a delay in preempting the current request in order to run our
+	 * dependency chain, our dependency chain may itself take a long time
+	 * to run to completion before we can present the framebuffer.
+	 */
 	fb_obj_bump_render_priority(obj);
 
 	mutex_unlock(&dev_priv->drm.struct_mutex);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 24/71] drm/i915: Allow user control over preempt timeout on their important context
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (21 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 23/71] drm/i915: Use a preemption timeout to enforce interactivity Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 25/71] drm/i915: Disable preemption and sleeping while using the punit sideband Chris Wilson
                   ` (28 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

One use case would be to couple this in via EGL_NV_context_priority_realtime
in userspace to provide some QoS guarantees in conjunction with setting
the highest priority.
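
For example, a userspace sketch (a hypothetical helper, using only the
uapi added below; note that a non-zero timeout requires CAP_SYS_ADMIN):

    #include <sys/ioctl.h>
    #include <drm/i915_drm.h>

    /*
     * Request a QoS bound: if a preemption request on behalf of this
     * context is not honoured within timeout_ns, the offending context
     * is reset. Pass 0 to disable the timeout again.
     */
    static int set_preempt_timeout(int fd, __u32 ctx_id, __u64 timeout_ns)
    {
            struct drm_i915_gem_context_param p = {
                    .ctx_id = ctx_id,
                    .param = I915_CONTEXT_PARAM_PREEMPT_TIMEOUT,
                    .value = timeout_ns,
            };

            return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
    }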

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 22 ++++++
 drivers/gpu/drm/i915/i915_gem_context.h    | 13 ++++
 drivers/gpu/drm/i915/i915_request.c        |  7 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  3 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 85 ++++++++++++++++++++++
 include/uapi/drm/i915_drm.h                | 12 +++
 6 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 66aad55c5273..dccae45211d1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -755,6 +755,15 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->value = ctx->sched.priority;
 		break;
+	case I915_CONTEXT_PARAM_PREEMPT_TIMEOUT:
+		if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
+			ret = -ENODEV;
+		else if (args->size)
+			ret = -EINVAL;
+		else
+			args->value = ctx->preempt_timeout;
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
@@ -830,6 +839,19 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		}
 		break;
 
+	case I915_CONTEXT_PARAM_PREEMPT_TIMEOUT:
+		if (args->size)
+			ret = -EINVAL;
+		else if (args->value > U32_MAX)
+			ret = -EINVAL;
+		else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
+			ret = -ENODEV;
+		else if (args->value && !capable(CAP_SYS_ADMIN))
+			ret = -EPERM;
+		else
+			ctx->preempt_timeout = args->value;
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 749a4ff566f5..23c88902bbc3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -144,6 +144,19 @@ struct i915_gem_context {
 
 	struct i915_sched_attr sched;
 
+	/**
+	 * @preempt_timeout: QoS guarantee for the high priority context
+	 *
+	 * Some clients need a guarantee that they will start executing
+	 * within a certain window, even at the expense of others. This entails
+	 * that if a preemption request is not honoured by the active context
+	 * within the timeout, we will reset the GPU to evict the hog and
+	 * run the high priority context instead.
+	 *
+	 * Timeout is stored in nanoseconds.
+	 */
+	u32 preempt_timeout;
+
 	/** ggtt_offset_bias: placement restriction for context objects */
 	u32 ggtt_offset_bias;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 76ee297483b1..17842549177a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1110,8 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * run at the earliest possible convenience.
 	 */
 	local_bh_disable();
-	if (engine->schedule)
-		engine->schedule(request, &request->gem_context->sched, 0);
+	if (engine->schedule) {
+		engine->schedule(request,
+				 &request->gem_context->sched,
+				 request->gem_context->preempt_timeout);
+	}
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8ddb1351c5ce..994e945a2e2d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1292,7 +1292,8 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 
 	queue_request(engine, &request->sched, rq_prio(request));
-	submit_queue(engine, rq_prio(request), 0);
+	submit_queue(engine,
+		     rq_prio(request), request->gem_context->preempt_timeout);
 
 	GEM_BUG_ON(!engine->execlists.first);
 	GEM_BUG_ON(list_empty(&request->sched.link));
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index ec906914bee2..1f30f45d2532 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -790,6 +790,90 @@ static int live_late_preempt_timeout(void *arg)
 	goto err_ctx_lo;
 }
 
+static int live_context_preempt_timeout(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *ctx_hi, *ctx_lo;
+	struct spinner spin_hi, spin_lo;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENOMEM;
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	if (spinner_init(&spin_hi, i915))
+		goto err_unlock;
+
+	if (spinner_init(&spin_lo, i915))
+		goto err_spin_hi;
+
+	ctx_hi = kernel_context(i915);
+	if (!ctx_hi)
+		goto err_spin_lo;
+	ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
+	ctx_hi->preempt_timeout = 50 * 1000; /* 50us */
+
+	ctx_lo = kernel_context(i915);
+	if (!ctx_lo)
+		goto err_ctx_hi;
+	ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
+
+	for_each_engine(engine, i915, id) {
+		struct i915_request *rq;
+
+		rq = spinner_create_request(&spin_lo, ctx_lo, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (!wait_for_spinner(&spin_lo, rq)) {
+			i915_gem_set_wedged(i915);
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+
+		rq = spinner_create_request(&spin_hi, ctx_hi, engine, MI_NOOP);
+		if (IS_ERR(rq)) {
+			spinner_end(&spin_lo);
+			err = PTR_ERR(rq);
+			goto err_ctx_lo;
+		}
+
+		i915_request_add(rq);
+		if (!wait_for_spinner(&spin_hi, rq)) {
+			i915_gem_set_wedged(i915);
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+
+		spinner_end(&spin_hi);
+		spinner_end(&spin_lo);
+		if (flush_test(i915, I915_WAIT_LOCKED)) {
+			err = -EIO;
+			goto err_ctx_lo;
+		}
+	}
+
+	err = 0;
+err_ctx_lo:
+	kernel_context_close(ctx_lo);
+err_ctx_hi:
+	kernel_context_close(ctx_hi);
+err_spin_lo:
+	spinner_fini(&spin_lo);
+err_spin_hi:
+	spinner_fini(&spin_hi);
+err_unlock:
+	flush_test(i915, I915_WAIT_LOCKED);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -799,6 +883,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt_timeout),
 		SUBTEST(live_preempt_reset),
 		SUBTEST(live_late_preempt_timeout),
+		SUBTEST(live_context_preempt_timeout),
 	};
 	return i915_subtests(tests, i915);
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..853e0c7e0e85 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,6 +1456,18 @@ struct drm_i915_gem_context_param {
 #define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
 #define   I915_CONTEXT_DEFAULT_PRIORITY		0
 #define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
+
+/*
+ * I915_CONTEXT_PARAM_PREEMPT_TIMEOUT:
+ *
+ * Preemption timeout, given in nanoseconds.
+ *
+ * Only allowed for privileged clients (CAP_SYS_ADMIN), this property allows
+ * the preempting context to kick out a GPU hog using a GPU reset if it does
+ * not honour the preemption request in time.
+ */
+#define I915_CONTEXT_PARAM_PREEMPT_TIMEOUT	0x7
+
 	__u64 value;
 };
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 25/71] drm/i915: Disable preemption and sleeping while using the punit sideband
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (22 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 24/71] drm/i915: Allow user control over preempt timeout on their important context Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 26/71] drm/i915: Lift acquiring the vlv punit magic to a common sb-get Chris Wilson
                   ` (27 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Hans de Goede

While we talk to the punit over its sideband, we need to prevent the cpu
from sleeping in order to avoid a potential machine hang.

Note that by itself, pm_qos_update_request (via intel_idle) does not
appear to provide a sufficient barrier to ensure that all cores are
indeed awake (out of C-state) and that the package is awake. To do so,
we need to supplement the pm_qos with a manual ping via on_each_cpu.

v2: Restrict the heavy-weight wakeup to just IOSF_PORT_PUNIT; there is
insufficient evidence to implicate a wider problem atm. Similarly,
restrict the w/a to Valleyview, as Cherryview doesn't have an angry cadre
of users.

The working theory, courtesy of Ville and Hans, is that the issue lies
within the power delivery and so is likely to be unit and board specific,
occurring when the unit/fw requires extra power at the same time as the
cpu package is changing its own power state.
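
The wakeup sequence described above reduces to the sketch below (the
helper name is illustrative; in the patch this is folded into
__vlv_punit_get()):

    static void ping(void *info)
    {
            /* Empty IPI callback: reaching here proves the cpu is awake. */
    }

    static void punit_wakeup(struct drm_i915_private *dev_priv)
    {
            /* Forbid deep C-states for the duration of the transaction... */
            pm_qos_update_request(&dev_priv->sb_qos, 0);

            /*
             * ...and ping every cpu to pull them out of their current
             * sleep state now, as the qos update alone is not a
             * sufficient barrier.
             */
            on_each_cpu(ping, NULL, 1);
    }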

References: https://bugzilla.kernel.org/show_bug.cgi?id=109051
References: https://bugs.freedesktop.org/show_bug.cgi?id=102657
References: https://bugzilla.kernel.org/show_bug.cgi?id=195255
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c       |  6 ++
 drivers/gpu/drm/i915/i915_drv.h       |  1 +
 drivers/gpu/drm/i915/intel_sideband.c | 89 +++++++++++++++++++++------
 3 files changed, 77 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index b7dbeba72dec..4d6a45f20e42 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -913,6 +913,9 @@ static int i915_driver_init_early(struct drm_i915_private *dev_priv,
 	spin_lock_init(&dev_priv->uncore.lock);
 
 	mutex_init(&dev_priv->sb_lock);
+	pm_qos_add_request(&dev_priv->sb_qos,
+			   PM_QOS_CPU_DMA_LATENCY, PM_QOS_DEFAULT_VALUE);
+
 	mutex_init(&dev_priv->modeset_restore_lock);
 	mutex_init(&dev_priv->av_mutex);
 	mutex_init(&dev_priv->wm.wm_mutex);
@@ -965,6 +968,9 @@ static void i915_driver_cleanup_early(struct drm_i915_private *dev_priv)
 	i915_gem_cleanup_early(dev_priv);
 	i915_workqueues_cleanup(dev_priv);
 	i915_engines_cleanup(dev_priv);
+
+	pm_qos_remove_request(&dev_priv->sb_qos);
+	mutex_destroy(&dev_priv->sb_lock);
 }
 
 static int i915_mmio_setup(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1fed1a90b25e..2d88ba8bd2e8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1646,6 +1646,7 @@ struct drm_i915_private {
 
 	/* Sideband mailbox protection */
 	struct mutex sb_lock;
+	struct pm_qos_request sb_qos;
 
 	/** Cached value of IMR to avoid reads in updating the bitfield */
 	union {
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index 75c872bb8cc9..d56eda33734e 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -22,6 +22,8 @@
  *
  */
 
+#include <asm/iosf_mbi.h>
+
 #include "i915_drv.h"
 #include "intel_drv.h"
 
@@ -39,18 +41,48 @@
 /* Private register write, double-word addressing, non-posted */
 #define SB_CRWRDA_NP	0x07
 
-static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
-			   u32 port, u32 opcode, u32 addr, u32 *val)
+static void ping(void *info)
 {
-	u32 cmd, be = 0xf, bar = 0;
-	bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
+}
 
-	cmd = (devfn << IOSF_DEVFN_SHIFT) | (opcode << IOSF_OPCODE_SHIFT) |
-		(port << IOSF_PORT_SHIFT) | (be << IOSF_BYTE_ENABLES_SHIFT) |
-		(bar << IOSF_BAR_SHIFT);
+static void __vlv_punit_get(struct drm_i915_private *dev_priv)
+{
+	iosf_mbi_punit_acquire();
 
-	WARN_ON(!mutex_is_locked(&dev_priv->sb_lock));
+	/*
+	 * Prevent the cpu from sleeping while we use this sideband, otherwise
+	 * the punit may cause a machine hang. The issue appears to be isolated
+	 * with changing the power state of the CPU package while changing
+	 * the power state via the punit, and we have only observed it
+	 * reliably on 4-core Baytrail systems, suggesting the issue is in the
+	 * power delivery mechanism and likely to be board/function
+	 * specific. Hence we presume the workaround needs only be applied
+	 * to the Valleyview P-unit and not all sideband communications.
+	 */
+	if (IS_VALLEYVIEW(dev_priv)) {
+		pm_qos_update_request(&dev_priv->sb_qos, 0);
+		on_each_cpu(ping, NULL, 1);
+	}
+}
+
+static void __vlv_punit_put(struct drm_i915_private *dev_priv)
+{
+	if (IS_VALLEYVIEW(dev_priv))
+		pm_qos_update_request(&dev_priv->sb_qos, PM_QOS_DEFAULT_VALUE);
 
+	iosf_mbi_punit_release();
+}
+
+static int vlv_sideband_rw(struct drm_i915_private *dev_priv,
+			   u32 devfn, u32 port, u32 opcode,
+			   u32 addr, u32 *val)
+{
+	const bool is_read = (opcode == SB_MRD_NP || opcode == SB_CRRDDA_NP);
+	int err;
+
+	lockdep_assert_held(&dev_priv->sb_lock);
+
+	/* Flush the previous comms, just in case it failed last time. */
 	if (intel_wait_for_register(dev_priv,
 				    VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
 				    5)) {
@@ -59,22 +91,33 @@ static int vlv_sideband_rw(struct drm_i915_private *dev_priv, u32 devfn,
 		return -EAGAIN;
 	}
 
-	I915_WRITE(VLV_IOSF_ADDR, addr);
-	I915_WRITE(VLV_IOSF_DATA, is_read ? 0 : *val);
-	I915_WRITE(VLV_IOSF_DOORBELL_REQ, cmd);
-
-	if (intel_wait_for_register(dev_priv,
-				    VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
-				    5)) {
+	preempt_disable();
+
+	I915_WRITE_FW(VLV_IOSF_ADDR, addr);
+	I915_WRITE_FW(VLV_IOSF_DATA, is_read ? 0 : *val);
+	I915_WRITE_FW(VLV_IOSF_DOORBELL_REQ,
+		      (devfn << IOSF_DEVFN_SHIFT) |
+		      (opcode << IOSF_OPCODE_SHIFT) |
+		      (port << IOSF_PORT_SHIFT) |
+		      (0xf << IOSF_BYTE_ENABLES_SHIFT) |
+		      (0 << IOSF_BAR_SHIFT) |
+		      IOSF_SB_BUSY);
+
+	if (__intel_wait_for_register_fw(dev_priv,
+					 VLV_IOSF_DOORBELL_REQ, IOSF_SB_BUSY, 0,
+					 10000, 0, NULL) == 0) {
+		if (is_read)
+			*val = I915_READ_FW(VLV_IOSF_DATA);
+		err = 0;
+	} else {
 		DRM_DEBUG_DRIVER("IOSF sideband finish wait (%s) timed out\n",
 				 is_read ? "read" : "write");
-		return -ETIMEDOUT;
+		err = -ETIMEDOUT;
 	}
 
-	if (is_read)
-		*val = I915_READ(VLV_IOSF_DATA);
+	preempt_enable();
 
-	return 0;
+	return err;
 }
 
 u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
@@ -84,8 +127,12 @@ u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
 	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
 
 	mutex_lock(&dev_priv->sb_lock);
+	__vlv_punit_get(dev_priv);
+
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			SB_CRRDDA_NP, addr, &val);
+
+	__vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->sb_lock);
 
 	return val;
@@ -98,8 +145,12 @@ int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val)
 	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
 
 	mutex_lock(&dev_priv->sb_lock);
+	__vlv_punit_get(dev_priv);
+
 	err = vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			      SB_CRWRDA_NP, addr, &val);
+
+	__vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->sb_lock);
 
 	return err;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 26/71] drm/i915: Lift acquiring the vlv punit magic to a common sb-get
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (23 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 25/71] drm/i915: Disable preemption and sleeping while using the punit sideband Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 27/71] drm/i915: Lift sideband locking for vlv_punit_(read|write) Chris Wilson
                   ` (26 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

As we now employ a very heavy pm_qos around punit access, we want to
minimise the number of synchronous wakeups by performing just one for
the whole punit sequence rather than around each individual access. The
sideband lock is used for this, so push the pm_qos into the sideband
lock acquisition and release, moving it from the low-level punit rw
routine to the callers. As a first step, we move the punit magic into
the common sideband lock so that we can acquire several ports
simultaneously and, if need be, extend the workaround protection later.
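
For example, vlv_set_cdclk() below now brackets a mixed CCK/BUNIT
register sequence with a single acquisition (condensed sketch of the
hunk in this patch):

    vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));

    /* ... a sequence of vlv_cck_read/write() and vlv_bunit_write() ... */

    vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));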

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |  44 ++++++++-
 drivers/gpu/drm/i915/intel_cdclk.c      |   6 +-
 drivers/gpu/drm/i915/intel_display.c    |  37 ++++----
 drivers/gpu/drm/i915/intel_dp.c         |   4 +-
 drivers/gpu/drm/i915/intel_dpio_phy.c   |  37 ++++----
 drivers/gpu/drm/i915/intel_dsi.c        |   8 +-
 drivers/gpu/drm/i915/intel_dsi_pll.c    |  14 +--
 drivers/gpu/drm/i915/intel_dsi_vbt.c    |   8 +-
 drivers/gpu/drm/i915/intel_hdmi.c       |   4 +-
 drivers/gpu/drm/i915/intel_pm.c         |   4 +-
 drivers/gpu/drm/i915/intel_runtime_pm.c |   8 +-
 drivers/gpu/drm/i915/intel_sideband.c   | 115 ++++++++++++++++++++----
 12 files changed, 207 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2d88ba8bd2e8..513a8e69e13e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3462,25 +3462,61 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
 		      u32 reply_mask, u32 reply, int timeout_base_ms);
 
 /* intel_sideband.c */
+
+enum {
+	VLV_IOSF_SB_BUNIT,
+	VLV_IOSF_SB_CCK,
+	VLV_IOSF_SB_CCU,
+	VLV_IOSF_SB_DPIO,
+	VLV_IOSF_SB_FLISDSI,
+	VLV_IOSF_SB_GPIO,
+	VLV_IOSF_SB_NC,
+	VLV_IOSF_SB_PUNIT,
+};
+
+void vlv_iosf_sb_get(struct drm_i915_private *dev_priv, unsigned long ports);
+u32 vlv_iosf_sb_read(struct drm_i915_private *dev_priv, u8 port, u32 reg);
+void vlv_iosf_sb_write(struct drm_i915_private *dev_priv, u8 port, u32 reg, u32 val);
+void vlv_iosf_sb_put(struct drm_i915_private *dev_priv, unsigned long ports);
+
+void vlv_punit_get(struct drm_i915_private *dev_priv);
 u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr);
 int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val);
+void vlv_punit_put(struct drm_i915_private *dev_priv);
+
+void vlv_nc_get(struct drm_i915_private *dev_priv);
 u32 vlv_nc_read(struct drm_i915_private *dev_priv, u8 addr);
-u32 vlv_iosf_sb_read(struct drm_i915_private *dev_priv, u8 port, u32 reg);
-void vlv_iosf_sb_write(struct drm_i915_private *dev_priv, u8 port, u32 reg, u32 val);
+void vlv_nc_put(struct drm_i915_private *dev_priv);
+
+void vlv_cck_get(struct drm_i915_private *dev_priv);
 u32 vlv_cck_read(struct drm_i915_private *dev_priv, u32 reg);
 void vlv_cck_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
+void vlv_cck_put(struct drm_i915_private *dev_priv);
+
+void vlv_ccu_get(struct drm_i915_private *dev_priv);
 u32 vlv_ccu_read(struct drm_i915_private *dev_priv, u32 reg);
 void vlv_ccu_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
+void vlv_ccu_put(struct drm_i915_private *dev_priv);
+
+void vlv_bunit_get(struct drm_i915_private *dev_priv);
 u32 vlv_bunit_read(struct drm_i915_private *dev_priv, u32 reg);
 void vlv_bunit_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
+void vlv_bunit_put(struct drm_i915_private *dev_priv);
+
+void vlv_dpio_get(struct drm_i915_private *dev_priv);
 u32 vlv_dpio_read(struct drm_i915_private *dev_priv, enum pipe pipe, int reg);
 void vlv_dpio_write(struct drm_i915_private *dev_priv, enum pipe pipe, int reg, u32 val);
+void vlv_dpio_put(struct drm_i915_private *dev_priv);
+
+void vlv_flisdsi_get(struct drm_i915_private *dev_priv);
+u32 vlv_flisdsi_read(struct drm_i915_private *dev_priv, u32 reg);
+void vlv_flisdsi_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
+void vlv_flisdsi_put(struct drm_i915_private *dev_priv);
+
 u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
 		   enum intel_sbi_destination destination);
 void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
 		     enum intel_sbi_destination destination);
-u32 vlv_flisdsi_read(struct drm_i915_private *dev_priv, u32 reg);
-void vlv_flisdsi_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
 
 /* intel_dpio_phy.c */
 void bxt_port_to_phy_channel(struct drm_i915_private *dev_priv, enum port port,
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index 32d24c69da3c..dc680ec9383d 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -552,7 +552,8 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 	}
 	mutex_unlock(&dev_priv->pcu_lock);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_iosf_sb_get(dev_priv,
+			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));
 
 	if (cdclk == 400000) {
 		u32 divider;
@@ -586,7 +587,8 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 		val |= 3000 / 250; /* 3.0 usec */
 	vlv_bunit_write(dev_priv, BUNIT_REG_BISOC, val);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_iosf_sb_put(dev_priv,
+			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));
 
 	intel_update_cdclk(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d230be4bd587..f0d77c818544 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -165,10 +165,10 @@ int vlv_get_hpll_vco(struct drm_i915_private *dev_priv)
 	int hpll_freq, vco_freq[] = { 800, 1600, 2000, 2400 };
 
 	/* Obtain SKU information */
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 	hpll_freq = vlv_cck_read(dev_priv, CCK_FUSE_REG) &
 		CCK_FUSE_HPLL_FREQ_MASK;
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	return vco_freq[hpll_freq] * 1000;
 }
@@ -179,9 +179,9 @@ int vlv_get_cck_clock(struct drm_i915_private *dev_priv,
 	u32 val;
 	int divider;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 	val = vlv_cck_read(dev_priv, reg);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	divider = val & CCK_FREQUENCY_VALUES;
 
@@ -1093,9 +1093,9 @@ void assert_dsi_pll(struct drm_i915_private *dev_priv, bool state)
 	u32 val;
 	bool cur_state;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 	val = vlv_cck_read(dev_priv, CCK_REG_DSI_PLL_CONTROL);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	cur_state = val & DSI_PLL_VCO_EN;
 	I915_STATE_WARN(cur_state != state,
@@ -1443,14 +1443,14 @@ static void _chv_enable_pll(struct intel_crtc *crtc,
 	enum dpio_channel port = vlv_pipe_to_channel(pipe);
 	u32 tmp;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Enable back the 10bit clock to display controller */
 	tmp = vlv_dpio_read(dev_priv, pipe, CHV_CMN_DW14(port));
 	tmp |= DPIO_DCLKP_EN;
 	vlv_dpio_write(dev_priv, pipe, CHV_CMN_DW14(port), tmp);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	/*
 	 * Need to wait > 100ns between dclkp clock enable bit and PLL enable.
@@ -1635,14 +1635,14 @@ static void chv_disable_pll(struct drm_i915_private *dev_priv, enum pipe pipe)
 	I915_WRITE(DPLL(pipe), val);
 	POSTING_READ(DPLL(pipe));
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Disable 10bit clock to display controller */
 	val = vlv_dpio_read(dev_priv, pipe, CHV_CMN_DW14(port));
 	val &= ~DPIO_DCLKP_EN;
 	vlv_dpio_write(dev_priv, pipe, CHV_CMN_DW14(port), val);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 void vlv_wait_port_ready(struct drm_i915_private *dev_priv,
@@ -6803,7 +6803,7 @@ static void vlv_prepare_pll(struct intel_crtc *crtc,
 	if ((pipe_config->dpll_hw_state.dpll & DPLL_VCO_ENABLE) == 0)
 		return;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	bestn = pipe_config->dpll.n;
 	bestm1 = pipe_config->dpll.m1;
@@ -6880,7 +6880,8 @@ static void vlv_prepare_pll(struct intel_crtc *crtc,
 	vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW7(pipe), coreclk);
 
 	vlv_dpio_write(dev_priv, pipe, VLV_PLL_DW11(pipe), 0x87871000);
-	mutex_unlock(&dev_priv->sb_lock);
+
+	vlv_dpio_put(dev_priv);
 }
 
 static void chv_prepare_pll(struct intel_crtc *crtc,
@@ -6913,7 +6914,7 @@ static void chv_prepare_pll(struct intel_crtc *crtc,
 	dpio_val = 0;
 	loopfilter = 0;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* p1 and p2 divider */
 	vlv_dpio_write(dev_priv, pipe, CHV_CMN_DW13(port),
@@ -6985,7 +6986,7 @@ static void chv_prepare_pll(struct intel_crtc *crtc,
 			vlv_dpio_read(dev_priv, pipe, CHV_CMN_DW14(port)) |
 			DPIO_AFC_RECAL);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 /**
@@ -7587,9 +7588,9 @@ static void vlv_crtc_clock_get(struct intel_crtc *crtc,
 	if ((pipe_config->dpll_hw_state.dpll & DPLL_VCO_ENABLE) == 0)
 		return;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 	mdiv = vlv_dpio_read(dev_priv, pipe, VLV_PLL_DW3(pipe));
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	clock.m1 = (mdiv >> DPIO_M1DIV_SHIFT) & 7;
 	clock.m2 = mdiv & DPIO_M2DIV_MASK;
@@ -7689,13 +7690,13 @@ static void chv_crtc_clock_get(struct intel_crtc *crtc,
 	if ((pipe_config->dpll_hw_state.dpll & DPLL_VCO_ENABLE) == 0)
 		return;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 	cmn_dw13 = vlv_dpio_read(dev_priv, pipe, CHV_CMN_DW13(port));
 	pll_dw0 = vlv_dpio_read(dev_priv, pipe, CHV_PLL_DW0(port));
 	pll_dw1 = vlv_dpio_read(dev_priv, pipe, CHV_PLL_DW1(port));
 	pll_dw2 = vlv_dpio_read(dev_priv, pipe, CHV_PLL_DW2(port));
 	pll_dw3 = vlv_dpio_read(dev_priv, pipe, CHV_PLL_DW3(port));
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	clock.m1 = (pll_dw1 & 0x7) == DPIO_CHV_M1_DIV_BY_2 ? 2 : 0;
 	clock.m2 = (pll_dw0 & 0xff) << 22;
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 83da50b13d81..3737b18bb209 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -2881,12 +2881,12 @@ static void chv_post_disable_dp(struct intel_encoder *encoder,
 
 	intel_dp_link_down(encoder, old_crtc_state);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Assert data lane reset */
 	chv_data_lane_soft_reset(encoder, old_crtc_state, true);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_dpio_phy.c b/drivers/gpu/drm/i915/intel_dpio_phy.c
index 00b3ab656b06..79c449aabc7f 100644
--- a/drivers/gpu/drm/i915/intel_dpio_phy.c
+++ b/drivers/gpu/drm/i915/intel_dpio_phy.c
@@ -646,7 +646,7 @@ void chv_set_phy_signal_level(struct intel_encoder *encoder,
 	u32 val;
 	int i;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Clear calc init */
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW10(ch));
@@ -727,8 +727,7 @@ void chv_set_phy_signal_level(struct intel_encoder *encoder,
 		vlv_dpio_write(dev_priv, pipe, VLV_PCS23_DW10(ch), val);
 	}
 
-	mutex_unlock(&dev_priv->sb_lock);
-
+	vlv_dpio_put(dev_priv);
 }
 
 void chv_data_lane_soft_reset(struct intel_encoder *encoder,
@@ -798,7 +797,7 @@ void chv_phy_pre_pll_enable(struct intel_encoder *encoder,
 
 	chv_phy_powergate_lanes(encoder, true, lane_mask);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Assert data lane reset */
 	chv_data_lane_soft_reset(encoder, crtc_state, true);
@@ -853,7 +852,7 @@ void chv_phy_pre_pll_enable(struct intel_encoder *encoder,
 		val |= CHV_CMN_USEDCLKCHANNEL;
 	vlv_dpio_write(dev_priv, pipe, CHV_CMN_DW19(ch), val);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 void chv_phy_pre_encoder_enable(struct intel_encoder *encoder,
@@ -868,7 +867,7 @@ void chv_phy_pre_encoder_enable(struct intel_encoder *encoder,
 	int data, i, stagger;
 	u32 val;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* allow hardware to manage TX FIFO reset source */
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW11(ch));
@@ -933,7 +932,7 @@ void chv_phy_pre_encoder_enable(struct intel_encoder *encoder,
 	/* Deassert data lane reset */
 	chv_data_lane_soft_reset(encoder, crtc_state, false);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 void chv_phy_release_cl2_override(struct intel_encoder *encoder)
@@ -954,7 +953,7 @@ void chv_phy_post_pll_disable(struct intel_encoder *encoder,
 	enum pipe pipe = to_intel_crtc(old_crtc_state->base.crtc)->pipe;
 	u32 val;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* disable left/right clock distribution */
 	if (pipe != PIPE_B) {
@@ -967,7 +966,7 @@ void chv_phy_post_pll_disable(struct intel_encoder *encoder,
 		vlv_dpio_write(dev_priv, pipe, _CHV_CMN_DW1_CH1, val);
 	}
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	/*
 	 * Leave the power down bit cleared for at least one
@@ -991,7 +990,8 @@ void vlv_set_phy_signal_level(struct intel_encoder *encoder,
 	enum dpio_channel port = vlv_dport_to_channel(dport);
 	enum pipe pipe = intel_crtc->pipe;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
+
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW5(port), 0x00000000);
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW4(port), demph_reg_value);
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW2(port),
@@ -1004,7 +1004,8 @@ void vlv_set_phy_signal_level(struct intel_encoder *encoder,
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW11(port), 0x00030000);
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW9(port), preemph_reg_value);
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW5(port), DPIO_TX_OCALINIT_EN);
-	mutex_unlock(&dev_priv->sb_lock);
+
+	vlv_dpio_put(dev_priv);
 }
 
 void vlv_phy_pre_pll_enable(struct intel_encoder *encoder,
@@ -1017,7 +1018,8 @@ void vlv_phy_pre_pll_enable(struct intel_encoder *encoder,
 	enum pipe pipe = crtc->pipe;
 
 	/* Program Tx lane resets to default */
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
+
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW0(port),
 			 DPIO_PCS_TX_LANE2_RESET |
 			 DPIO_PCS_TX_LANE1_RESET);
@@ -1031,7 +1033,8 @@ void vlv_phy_pre_pll_enable(struct intel_encoder *encoder,
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW12(port), 0x00750f00);
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW11(port), 0x00001500);
 	vlv_dpio_write(dev_priv, pipe, VLV_TX_DW14(port), 0x40400000);
-	mutex_unlock(&dev_priv->sb_lock);
+
+	vlv_dpio_put(dev_priv);
 }
 
 void vlv_phy_pre_encoder_enable(struct intel_encoder *encoder,
@@ -1045,7 +1048,7 @@ void vlv_phy_pre_encoder_enable(struct intel_encoder *encoder,
 	enum pipe pipe = crtc->pipe;
 	u32 val;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Enable clock channels for this port */
 	val = vlv_dpio_read(dev_priv, pipe, VLV_PCS01_DW8(port));
@@ -1061,7 +1064,7 @@ void vlv_phy_pre_encoder_enable(struct intel_encoder *encoder,
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW14(port), 0x00760018);
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW23(port), 0x00400888);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 void vlv_phy_reset_lanes(struct intel_encoder *encoder,
@@ -1073,8 +1076,8 @@ void vlv_phy_reset_lanes(struct intel_encoder *encoder,
 	enum dpio_channel port = vlv_dport_to_channel(dport);
 	enum pipe pipe = crtc->pipe;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW0(port), 0x00000000);
 	vlv_dpio_write(dev_priv, pipe, VLV_PCS_DW1(port), 0x00e00060);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 51a1d6868b1e..355aa8717af2 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -278,7 +278,7 @@ static int dpi_send_cmd(struct intel_dsi *intel_dsi, u32 cmd, bool hs,
 
 static void band_gap_reset(struct drm_i915_private *dev_priv)
 {
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_flisdsi_get(dev_priv);
 
 	vlv_flisdsi_write(dev_priv, 0x08, 0x0001);
 	vlv_flisdsi_write(dev_priv, 0x0F, 0x0005);
@@ -287,7 +287,7 @@ static void band_gap_reset(struct drm_i915_private *dev_priv)
 	vlv_flisdsi_write(dev_priv, 0x0F, 0x0000);
 	vlv_flisdsi_write(dev_priv, 0x08, 0x0000);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_flisdsi_put(dev_priv);
 }
 
 static inline bool is_vid_mode(struct intel_dsi *intel_dsi)
@@ -509,11 +509,11 @@ static void vlv_dsi_device_ready(struct intel_encoder *encoder)
 
 	DRM_DEBUG_KMS("\n");
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_flisdsi_get(dev_priv);
 	/* program rcomp for compliance, reduce from 50 ohms to 45 ohms
 	 * needed everytime after power gate */
 	vlv_flisdsi_write(dev_priv, 0x04, 0x0004);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_flisdsi_put(dev_priv);
 
 	/* bandgap reset is needed after everytime we do power gate */
 	band_gap_reset(dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_dsi_pll.c b/drivers/gpu/drm/i915/intel_dsi_pll.c
index 2ff2ee7f3b78..b73336e7dcd2 100644
--- a/drivers/gpu/drm/i915/intel_dsi_pll.c
+++ b/drivers/gpu/drm/i915/intel_dsi_pll.c
@@ -149,7 +149,7 @@ static void vlv_enable_dsi_pll(struct intel_encoder *encoder,
 
 	DRM_DEBUG_KMS("\n");
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 
 	vlv_cck_write(dev_priv, CCK_REG_DSI_PLL_CONTROL, 0);
 	vlv_cck_write(dev_priv, CCK_REG_DSI_PLL_DIVIDER, config->dsi_pll.div);
@@ -166,11 +166,11 @@ static void vlv_enable_dsi_pll(struct intel_encoder *encoder,
 	if (wait_for(vlv_cck_read(dev_priv, CCK_REG_DSI_PLL_CONTROL) &
 						DSI_PLL_LOCK, 20)) {
 
-		mutex_unlock(&dev_priv->sb_lock);
+		vlv_cck_put(dev_priv);
 		DRM_ERROR("DSI PLL lock failed\n");
 		return;
 	}
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	DRM_DEBUG_KMS("DSI PLL locked\n");
 }
@@ -182,14 +182,14 @@ static void vlv_disable_dsi_pll(struct intel_encoder *encoder)
 
 	DRM_DEBUG_KMS("\n");
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 
 	tmp = vlv_cck_read(dev_priv, CCK_REG_DSI_PLL_CONTROL);
 	tmp &= ~DSI_PLL_VCO_EN;
 	tmp |= DSI_PLL_LDO_GATE;
 	vlv_cck_write(dev_priv, CCK_REG_DSI_PLL_CONTROL, tmp);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 }
 
 static bool bxt_dsi_pll_is_enabled(struct drm_i915_private *dev_priv)
@@ -274,10 +274,10 @@ static u32 vlv_dsi_get_pclk(struct intel_encoder *encoder, int pipe_bpp,
 
 	DRM_DEBUG_KMS("\n");
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 	pll_ctl = vlv_cck_read(dev_priv, CCK_REG_DSI_PLL_CONTROL);
 	pll_div = vlv_cck_read(dev_priv, CCK_REG_DSI_PLL_DIVIDER);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	config->dsi_pll.ctrl = pll_ctl & ~DSI_PLL_LOCK;
 	config->dsi_pll.div = pll_div;
diff --git a/drivers/gpu/drm/i915/intel_dsi_vbt.c b/drivers/gpu/drm/i915/intel_dsi_vbt.c
index 4d6ffa7b3e7b..515ab68f319c 100644
--- a/drivers/gpu/drm/i915/intel_dsi_vbt.c
+++ b/drivers/gpu/drm/i915/intel_dsi_vbt.c
@@ -234,7 +234,7 @@ static void vlv_exec_gpio(struct drm_i915_private *dev_priv,
 	pconf0 = VLV_GPIO_PCONF0(map->base_offset);
 	padval = VLV_GPIO_PAD_VAL(map->base_offset);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_GPIO));
 	if (!map->init) {
 		/* FIXME: remove constant below */
 		vlv_iosf_sb_write(dev_priv, port, pconf0, 0x2000CC00);
@@ -243,7 +243,7 @@ static void vlv_exec_gpio(struct drm_i915_private *dev_priv,
 
 	tmp = 0x4 | value;
 	vlv_iosf_sb_write(dev_priv, port, padval, tmp);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_GPIO));
 }
 
 static void chv_exec_gpio(struct drm_i915_private *dev_priv,
@@ -289,12 +289,12 @@ static void chv_exec_gpio(struct drm_i915_private *dev_priv,
 	cfg0 = CHV_GPIO_PAD_CFG0(family_num, gpio_index);
 	cfg1 = CHV_GPIO_PAD_CFG1(family_num, gpio_index);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_GPIO));
 	vlv_iosf_sb_write(dev_priv, port, cfg1, 0);
 	vlv_iosf_sb_write(dev_priv, port, cfg0,
 			  CHV_GPIO_GPIOEN | CHV_GPIO_GPIOCFG_GPO |
 			  CHV_GPIO_GPIOTXSTATE(value));
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_GPIO));
 }
 
 static void bxt_exec_gpio(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index ee929f31f7db..4257209c75f3 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -1995,12 +1995,12 @@ static void chv_hdmi_post_disable(struct intel_encoder *encoder,
 	struct drm_device *dev = encoder->base.dev;
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Assert data lane reset */
 	chv_data_lane_soft_reset(encoder, old_crtc_state, true);
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 }
 
 static void chv_hdmi_pre_enable(struct intel_encoder *encoder,
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 4126132eb707..565390383c2a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7421,9 +7421,9 @@ static void cherryview_init_gt_powersave(struct drm_i915_private *dev_priv)
 
 	vlv_init_gpll_ref_freq(dev_priv);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_cck_get(dev_priv);
 	val = vlv_cck_read(dev_priv, CCK_FUSE_REG);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_cck_put(dev_priv);
 
 	switch ((val >> 2) & 0x7) {
 	case 3:
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 3fffbfe4521d..8de3ec9409b9 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -1191,7 +1191,7 @@ static void chv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
 				    1))
 		DRM_ERROR("Display PHY %d is not power up\n", phy);
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 
 	/* Enable dynamic power down */
 	tmp = vlv_dpio_read(dev_priv, pipe, CHV_CMN_DW28);
@@ -1214,7 +1214,7 @@ static void chv_dpio_cmn_power_well_enable(struct drm_i915_private *dev_priv,
 		vlv_dpio_write(dev_priv, pipe, CHV_CMN_DW30, tmp);
 	}
 
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	dev_priv->chv_phy_control |= PHY_COM_LANE_RESET_DEASSERT(phy);
 	I915_WRITE(DISPLAY_PHY_CONTROL, dev_priv->chv_phy_control);
@@ -1277,9 +1277,9 @@ static void assert_chv_phy_powergate(struct drm_i915_private *dev_priv, enum dpi
 	else
 		reg = _CHV_CMN_DW6_CH1;
 
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_dpio_get(dev_priv);
 	val = vlv_dpio_read(dev_priv, pipe, reg);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_dpio_put(dev_priv);
 
 	/*
 	 * This assumes !override is only used when the port is disabled.
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index d56eda33734e..3d7c5917b97c 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -73,6 +73,22 @@ static void __vlv_punit_put(struct drm_i915_private *dev_priv)
 	iosf_mbi_punit_release();
 }
 
+void vlv_iosf_sb_get(struct drm_i915_private *dev_priv, unsigned long ports)
+{
+	if (ports & BIT(VLV_IOSF_SB_PUNIT))
+		__vlv_punit_get(dev_priv);
+
+	mutex_lock(&dev_priv->sb_lock);
+}
+
+void vlv_iosf_sb_put(struct drm_i915_private *dev_priv, unsigned long ports)
+{
+	mutex_unlock(&dev_priv->sb_lock);
+
+	if (ports & BIT(VLV_IOSF_SB_PUNIT))
+		__vlv_punit_put(dev_priv);
+}
+
 static int vlv_sideband_rw(struct drm_i915_private *dev_priv,
 			   u32 devfn, u32 port, u32 opcode,
 			   u32 addr, u32 *val)
@@ -81,6 +97,8 @@ static int vlv_sideband_rw(struct drm_i915_private *dev_priv,
 	int err;
 
 	lockdep_assert_held(&dev_priv->sb_lock);
+	if (port == IOSF_PORT_PUNIT)
+		iosf_mbi_assert_punit_acquired();
 
 	/* Flush the previous comms, just in case it failed last time. */
 	if (intel_wait_for_register(dev_priv,
@@ -124,16 +142,14 @@ u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
 {
 	u32 val = 0;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
+	lockdep_assert_held(&dev_priv->pcu_lock);
 
-	mutex_lock(&dev_priv->sb_lock);
-	__vlv_punit_get(dev_priv);
+	vlv_punit_get(dev_priv);
 
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			SB_CRRDDA_NP, addr, &val);
 
-	__vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_punit_put(dev_priv);
 
 	return val;
 }
@@ -142,20 +158,28 @@ int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val)
 {
 	int err;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
+	lockdep_assert_held(&dev_priv->pcu_lock);
 
-	mutex_lock(&dev_priv->sb_lock);
-	__vlv_punit_get(dev_priv);
+	vlv_punit_get(dev_priv);
 
 	err = vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			      SB_CRWRDA_NP, addr, &val);
 
-	__vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_punit_put(dev_priv);
 
 	return err;
 }
 
+void vlv_punit_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_PUNIT));
+}
+
+void vlv_punit_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_PUNIT));
+}
+
 u32 vlv_bunit_read(struct drm_i915_private *dev_priv, u32 reg)
 {
 	u32 val = 0;
@@ -172,20 +196,38 @@ void vlv_bunit_write(struct drm_i915_private *dev_priv, u32 reg, u32 val)
 			SB_CRWRDA_NP, reg, &val);
 }
 
+void vlv_bunit_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_BUNIT));
+}
+
+void vlv_bunit_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_BUNIT));
+}
+
 u32 vlv_nc_read(struct drm_i915_private *dev_priv, u8 addr)
 {
 	u32 val = 0;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
-
-	mutex_lock(&dev_priv->sb_lock);
+	vlv_nc_get(dev_priv);
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_NC,
 			SB_CRRDDA_NP, addr, &val);
-	mutex_unlock(&dev_priv->sb_lock);
+	vlv_nc_put(dev_priv);
 
 	return val;
 }
 
+void vlv_nc_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_NC));
+}
+
+void vlv_nc_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_NC));
+}
+
 u32 vlv_iosf_sb_read(struct drm_i915_private *dev_priv, u8 port, u32 reg)
 {
 	u32 val = 0;
@@ -215,6 +257,16 @@ void vlv_cck_write(struct drm_i915_private *dev_priv, u32 reg, u32 val)
 			SB_CRWRDA_NP, reg, &val);
 }
 
+void vlv_cck_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_CCK));
+}
+
+void vlv_cck_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_CCK));
+}
+
 u32 vlv_ccu_read(struct drm_i915_private *dev_priv, u32 reg)
 {
 	u32 val = 0;
@@ -229,6 +281,16 @@ void vlv_ccu_write(struct drm_i915_private *dev_priv, u32 reg, u32 val)
 			SB_CRWRDA_NP, reg, &val);
 }
 
+void vlv_ccu_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_CCU));
+}
+
+void vlv_ccu_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_CCU));
+}
+
 u32 vlv_dpio_read(struct drm_i915_private *dev_priv, enum pipe pipe, int reg)
 {
 	u32 val = 0;
@@ -252,12 +314,23 @@ void vlv_dpio_write(struct drm_i915_private *dev_priv, enum pipe pipe, int reg,
 			SB_MWR_NP, reg, &val);
 }
 
+void vlv_dpio_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_DPIO));
+}
+
+void vlv_dpio_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_DPIO));
+}
+
 /* SBI access */
 u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
 		   enum intel_sbi_destination destination)
 {
 	u32 value = 0;
-	WARN_ON(!mutex_is_locked(&dev_priv->sb_lock));
+
+	lockdep_assert_held(&dev_priv->sb_lock);
 
 	if (intel_wait_for_register(dev_priv,
 				    SBI_CTL_STAT, SBI_BUSY, 0,
@@ -297,7 +370,7 @@ void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
 {
 	u32 tmp;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->sb_lock));
+	lockdep_assert_held(&dev_priv->sb_lock);
 
 	if (intel_wait_for_register(dev_priv,
 				    SBI_CTL_STAT, SBI_BUSY, 0,
@@ -344,3 +417,13 @@ void vlv_flisdsi_write(struct drm_i915_private *dev_priv, u32 reg, u32 val)
 	vlv_sideband_rw(dev_priv, DPIO_DEVFN, IOSF_PORT_FLISDSI, SB_CRWRDA_NP,
 			reg, &val);
 }
+
+void vlv_flisdsi_get(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_FLISDSI));
+}
+
+void vlv_flisdsi_put(struct drm_i915_private *dev_priv)
+{
+	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_FLISDSI));
+}
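
Each of the helpers above is the same two-line wrapper over the common
vlv_iosf_sb_get/vlv_iosf_sb_put path, so covering a further unit is
mechanical. As a sketch, a hypothetical helper for the GPIO port (no
such helper is added in this series) would follow the same pattern:

	void vlv_gpio_get(struct drm_i915_private *dev_priv)
	{
		vlv_iosf_sb_get(dev_priv, BIT(VLV_IOSF_SB_GPIO));
	}

	void vlv_gpio_put(struct drm_i915_private *dev_priv)
	{
		vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_GPIO));
	}
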
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 27/71] drm/i915: Lift sideband locking for vlv_punit_(read|write)
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (24 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 26/71] drm/i915: Lift acquiring the vlv punit magic to a common sb-get Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 28/71] drm/i915: Reduce RPS update frequency on Valleyview/Cherryview Chris Wilson
                   ` (25 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Lift the sideband acquisition for vlv_punit_read and vlv_punit_write
into their callers, so that we can lock the sideband once for a sequence
of operations, rather than perform the heavyweight acquisition on each
request.
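
For illustration, a minimal sketch of the calling convention before and
after this change (PUNIT_REG_GPU_FREQ_STS is used purely as an example
register; the surrounding pcu_lock and error handling are elided):

	/* before: each accessor took and dropped the sideband itself */
	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);

	/* after: the caller brackets a whole sequence of accesses */
	vlv_punit_get(dev_priv);
	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
	err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
	vlv_punit_put(dev_priv);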

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  3 ++
 drivers/gpu/drm/i915/i915_sysfs.c       | 14 ++++----
 drivers/gpu/drm/i915/intel_cdclk.c      | 24 ++++++++++---
 drivers/gpu/drm/i915/intel_display.c    | 16 +++++----
 drivers/gpu/drm/i915/intel_pm.c         | 46 ++++++++++++++++++++-----
 drivers/gpu/drm/i915/intel_runtime_pm.c |  8 +++++
 drivers/gpu/drm/i915/intel_sideband.c   | 18 ++--------
 7 files changed, 86 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3118d5de195b..4a6a5033e914 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1088,7 +1088,10 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 			   yesno((rpmodectl & GEN6_RP_MEDIA_MODE_MASK) ==
 				  GEN6_RP_MEDIA_SW_MODE));
 
+		vlv_punit_get(dev_priv);
 		freq_sts = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
+		vlv_punit_put(dev_priv);
+
 		seq_printf(m, "PUNIT_REG_GPU_FREQ_STS: 0x%08x\n", freq_sts);
 		seq_printf(m, "DDR freq: %d MHz\n", dev_priv->mem_freq);
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index e5e6f6bb2b05..0519e00b3720 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -258,25 +258,25 @@ static ssize_t gt_act_freq_mhz_show(struct device *kdev,
 				    struct device_attribute *attr, char *buf)
 {
 	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
-	int ret;
+	u32 freq;
 
 	intel_runtime_pm_get(dev_priv);
 
 	mutex_lock(&dev_priv->pcu_lock);
 	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
-		u32 freq;
+		vlv_punit_get(dev_priv);
 		freq = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
-		ret = intel_gpu_freq(dev_priv, (freq >> 8) & 0xff);
+		vlv_punit_put(dev_priv);
+
+		freq = (freq >> 8) & 0xff;
 	} else {
-		ret = intel_gpu_freq(dev_priv,
-				     intel_get_cagf(dev_priv,
-						    I915_READ(GEN6_RPSTAT1)));
+		freq = intel_get_cagf(dev_priv, I915_READ(GEN6_RPSTAT1));
 	}
 	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_runtime_pm_put(dev_priv);
 
-	return snprintf(buf, PAGE_SIZE, "%d\n", ret);
+	return snprintf(buf, PAGE_SIZE, "%d\n", intel_gpu_freq(dev_priv, freq));
 }
 
 static ssize_t gt_cur_freq_mhz_show(struct device *kdev,
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index dc680ec9383d..30341147121f 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -461,13 +461,19 @@ static void vlv_get_cdclk(struct drm_i915_private *dev_priv,
 {
 	u32 val;
 
+	mutex_lock(&dev_priv->pcu_lock);
+	vlv_iosf_sb_get(dev_priv,
+			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_PUNIT));
+
 	cdclk_state->vco = vlv_get_hpll_vco(dev_priv);
 	cdclk_state->cdclk = vlv_get_cck_clock(dev_priv, "cdclk",
 					       CCK_DISPLAY_CLOCK_CONTROL,
 					       cdclk_state->vco);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
+
+	vlv_iosf_sb_put(dev_priv,
+			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_PUNIT));
 	mutex_unlock(&dev_priv->pcu_lock);
 
 	if (IS_VALLEYVIEW(dev_priv))
@@ -540,6 +546,11 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 	 */
 	intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_A);
 
+	vlv_iosf_sb_get(dev_priv,
+			BIT(VLV_IOSF_SB_CCK) |
+			BIT(VLV_IOSF_SB_BUNIT) |
+			BIT(VLV_IOSF_SB_PUNIT));
+
 	mutex_lock(&dev_priv->pcu_lock);
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
 	val &= ~DSPFREQGUAR_MASK;
@@ -552,9 +563,6 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 	}
 	mutex_unlock(&dev_priv->pcu_lock);
 
-	vlv_iosf_sb_get(dev_priv,
-			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));
-
 	if (cdclk == 400000) {
 		u32 divider;
 
@@ -588,7 +596,9 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 	vlv_bunit_write(dev_priv, BUNIT_REG_BISOC, val);
 
 	vlv_iosf_sb_put(dev_priv,
-			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_BUNIT));
+			BIT(VLV_IOSF_SB_CCK) |
+			BIT(VLV_IOSF_SB_BUNIT) |
+			BIT(VLV_IOSF_SB_PUNIT));
 
 	intel_update_cdclk(dev_priv);
 
@@ -623,6 +633,8 @@ static void chv_set_cdclk(struct drm_i915_private *dev_priv,
 	intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_A);
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
+
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
 	val &= ~DSPFREQGUAR_MASK_CHV;
 	val |= (cmd << DSPFREQGUAR_SHIFT_CHV);
@@ -632,6 +644,8 @@ static void chv_set_cdclk(struct drm_i915_private *dev_priv,
 		     50)) {
 		DRM_ERROR("timed out waiting for CDclk change\n");
 	}
+
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_update_cdclk(dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f0d77c818544..71124cdacd90 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -165,10 +165,8 @@ int vlv_get_hpll_vco(struct drm_i915_private *dev_priv)
 	int hpll_freq, vco_freq[] = { 800, 1600, 2000, 2400 };
 
 	/* Obtain SKU information */
-	vlv_cck_get(dev_priv);
 	hpll_freq = vlv_cck_read(dev_priv, CCK_FUSE_REG) &
 		CCK_FUSE_HPLL_FREQ_MASK;
-	vlv_cck_put(dev_priv);
 
 	return vco_freq[hpll_freq] * 1000;
 }
@@ -179,10 +177,7 @@ int vlv_get_cck_clock(struct drm_i915_private *dev_priv,
 	u32 val;
 	int divider;
 
-	vlv_cck_get(dev_priv);
 	val = vlv_cck_read(dev_priv, reg);
-	vlv_cck_put(dev_priv);
-
 	divider = val & CCK_FREQUENCY_VALUES;
 
 	WARN((val & CCK_FREQUENCY_STATUS) !=
@@ -195,11 +190,18 @@ int vlv_get_cck_clock(struct drm_i915_private *dev_priv,
 int vlv_get_cck_clock_hpll(struct drm_i915_private *dev_priv,
 			   const char *name, u32 reg)
 {
+	int hpll;
+
+	vlv_cck_get(dev_priv);
+
 	if (dev_priv->hpll_freq == 0)
 		dev_priv->hpll_freq = vlv_get_hpll_vco(dev_priv);
 
-	return vlv_get_cck_clock(dev_priv, name, reg,
-				 dev_priv->hpll_freq);
+	hpll = vlv_get_cck_clock(dev_priv, name, reg, dev_priv->hpll_freq);
+
+	vlv_cck_put(dev_priv);
+
+	return hpll;
 }
 
 static void intel_update_czclk(struct drm_i915_private *dev_priv)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 565390383c2a..53a719a84f91 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -311,6 +311,7 @@ static void chv_set_memory_dvfs(struct drm_i915_private *dev_priv, bool enable)
 	u32 val;
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DDR_SETUP2);
 	if (enable)
@@ -325,6 +326,7 @@ static void chv_set_memory_dvfs(struct drm_i915_private *dev_priv, bool enable)
 		      FORCE_DDR_FREQ_REQ_ACK) == 0, 3))
 		DRM_ERROR("timed out waiting for Punit DDR DVFS request\n");
 
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 }
 
@@ -333,6 +335,7 @@ static void chv_set_memory_pm5(struct drm_i915_private *dev_priv, bool enable)
 	u32 val;
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
 	if (enable)
@@ -341,6 +344,7 @@ static void chv_set_memory_pm5(struct drm_i915_private *dev_priv, bool enable)
 		val &= ~DSP_MAXFIFO_PM5_ENABLE;
 	vlv_punit_write(dev_priv, PUNIT_REG_DSPFREQ, val);
 
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 }
 
@@ -5858,6 +5862,7 @@ void vlv_wm_get_hw_state(struct drm_device *dev)
 
 	if (IS_CHERRYVIEW(dev_priv)) {
 		mutex_lock(&dev_priv->pcu_lock);
+		vlv_punit_get(dev_priv);
 
 		val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
 		if (val & DSP_MAXFIFO_PM5_ENABLE)
@@ -5887,6 +5892,7 @@ void vlv_wm_get_hw_state(struct drm_device *dev)
 				wm->level = VLV_WM_LEVEL_DDR_DVFS;
 		}
 
+		vlv_punit_put(dev_priv);
 		mutex_unlock(&dev_priv->pcu_lock);
 	}
 
@@ -6434,7 +6440,9 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
 	if (val != dev_priv->gt_pm.rps.cur_freq) {
+		vlv_punit_get(dev_priv);
 		err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
+		vlv_punit_put(dev_priv);
 		if (err)
 			return err;
 
@@ -7373,6 +7381,11 @@ static void valleyview_init_gt_powersave(struct drm_i915_private *dev_priv)
 
 	valleyview_setup_pctx(dev_priv);
 
+	vlv_iosf_sb_get(dev_priv,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
 	vlv_init_gpll_ref_freq(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
@@ -7410,6 +7423,11 @@ static void valleyview_init_gt_powersave(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
 			 intel_gpu_freq(dev_priv, rps->min_freq),
 			 rps->min_freq);
+
+	vlv_iosf_sb_put(dev_priv,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
 }
 
 static void cherryview_init_gt_powersave(struct drm_i915_private *dev_priv)
@@ -7419,11 +7437,14 @@ static void cherryview_init_gt_powersave(struct drm_i915_private *dev_priv)
 
 	cherryview_setup_pctx(dev_priv);
 
+	vlv_iosf_sb_get(dev_priv,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
 	vlv_init_gpll_ref_freq(dev_priv);
 
-	vlv_cck_get(dev_priv);
 	val = vlv_cck_read(dev_priv, CCK_FUSE_REG);
-	vlv_cck_put(dev_priv);
 
 	switch ((val >> 2) & 0x7) {
 	case 3:
@@ -7456,6 +7477,11 @@ static void cherryview_init_gt_powersave(struct drm_i915_private *dev_priv)
 			 intel_gpu_freq(dev_priv, rps->min_freq),
 			 rps->min_freq);
 
+	vlv_iosf_sb_put(dev_priv,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
 	WARN_ONCE((rps->max_freq | rps->efficient_freq | rps->rp1_freq |
 		   rps->min_freq) & 1,
 		  "Odd GPU freq values\n");
@@ -7543,13 +7569,15 @@ static void cherryview_enable_rps(struct drm_i915_private *dev_priv)
 		   GEN6_RP_DOWN_IDLE_AVG);
 
 	/* Setting Fixed Bias */
-	val = VLV_OVERRIDE_EN |
-		  VLV_SOC_TDP_EN |
-		  CHV_BIAS_CPU_50_SOC_50;
+	vlv_punit_get(dev_priv);
+
+	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | CHV_BIAS_CPU_50_SOC_50;
 	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
 
+	vlv_punit_put(dev_priv);
+
 	/* RPS code assumes GPLL is used */
 	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
 
@@ -7626,14 +7654,16 @@ static void valleyview_enable_rps(struct drm_i915_private *dev_priv)
 		   GEN6_RP_UP_BUSY_AVG |
 		   GEN6_RP_DOWN_IDLE_CONT);
 
+	vlv_punit_get(dev_priv);
+
 	/* Setting Fixed Bias */
-	val = VLV_OVERRIDE_EN |
-		  VLV_SOC_TDP_EN |
-		  VLV_BIAS_CPU_125_SOC_875;
+	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | VLV_BIAS_CPU_125_SOC_875;
 	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
 
+	vlv_punit_put(dev_priv);
+
 	/* RPS code assumes GPLL is used */
 	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
 
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8de3ec9409b9..a7efb4113084 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -808,6 +808,7 @@ static void vlv_set_power_well(struct drm_i915_private *dev_priv,
 			 PUNIT_PWRGT_PWR_GATE(power_well_id);
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 #define COND \
 	((vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask) == state)
@@ -828,6 +829,7 @@ static void vlv_set_power_well(struct drm_i915_private *dev_priv,
 #undef COND
 
 out:
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 }
 
@@ -856,6 +858,7 @@ static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
 	ctrl = PUNIT_PWRGT_PWR_ON(power_well_id);
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 	state = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask;
 	/*
@@ -874,6 +877,7 @@ static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
 	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_CTRL) & mask;
 	WARN_ON(ctrl != state);
 
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 
 	return enabled;
@@ -1387,6 +1391,7 @@ static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
 	u32 state, ctrl;
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 	state = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe);
 	/*
@@ -1403,6 +1408,7 @@ static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
 	ctrl = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSC_MASK(pipe);
 	WARN_ON(ctrl << 16 != state);
 
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 
 	return enabled;
@@ -1419,6 +1425,7 @@ static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
 	state = enable ? DP_SSS_PWR_ON(pipe) : DP_SSS_PWR_GATE(pipe);
 
 	mutex_lock(&dev_priv->pcu_lock);
+	vlv_punit_get(dev_priv);
 
 #define COND \
 	((vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe)) == state)
@@ -1439,6 +1446,7 @@ static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
 #undef COND
 
 out:
+	vlv_punit_put(dev_priv);
 	mutex_unlock(&dev_priv->pcu_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index 3d7c5917b97c..dc3b491b4d00 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -144,30 +144,18 @@ u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
 
 	lockdep_assert_held(&dev_priv->pcu_lock);
 
-	vlv_punit_get(dev_priv);
-
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			SB_CRRDDA_NP, addr, &val);
 
-	vlv_punit_put(dev_priv);
-
 	return val;
 }
 
 int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val)
 {
-	int err;
-
 	lockdep_assert_held(&dev_priv->pcu_lock);
 
-	vlv_punit_get(dev_priv);
-
-	err = vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
-			      SB_CRWRDA_NP, addr, &val);
-
-	vlv_punit_put(dev_priv);
-
-	return err;
+	return vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
+			       SB_CRWRDA_NP, addr, &val);
 }
 
 void vlv_punit_get(struct drm_i915_private *dev_priv)
@@ -210,10 +198,8 @@ u32 vlv_nc_read(struct drm_i915_private *dev_priv, u8 addr)
 {
 	u32 val = 0;
 
-	vlv_nc_get(dev_priv);
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_NC,
 			SB_CRRDDA_NP, addr, &val);
-	vlv_nc_put(dev_priv);
 
 	return val;
 }
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 28/71] drm/i915: Reduce RPS update frequency on Valleyview/Cherryview
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (25 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 27/71] drm/i915: Lift sideband locking for vlv_punit_(read|write) Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 29/71] Revert "drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3" Chris Wilson
                   ` (24 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Valleyview and Cherryview update the GPU frequency via the punit, which
is very expensive as we have to ensure the cores do not sleep during the
comms. If we evaluate RPS frequently, the resulting punit requests
cause measurable system overhead for little benefit, so increase the
evaluation intervals to reduce the number of times we try to change
the frequency.
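
A rough sketch of the effect (the interval values below are
illustrative; the real ei_up/ei_down depend on the current power mode):

	unsigned int ei_up = 13000, ei_down = 32000;	/* us */

	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
		ei_up <<= 2;	/* 13000us -> 52000us */
		ei_down <<= 2;	/* 32000us -> 128000us */
	}
	/* i.e. roughly a quarter as many punit frequency requests */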

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_pm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 53a719a84f91..5f59cb81d489 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6342,6 +6342,19 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 		break;
 	}
 
+	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
+		/*
+		 * Baytrail and Braswell control the gpu frequency via the
+		 * punit, which is very slow and expensive to communicate with,
+		 * as we synchronously force the package to C0. If we try and
+		 * update the gpufreq too often we cause measurable system
+		 * load for little benefit (effectively stealing CPU time for
+		 * the GPU, negatively impacting overall throughput).
+		 */
+		ei_up <<= 2;
+		ei_down <<= 2;
+	}
+
 	/* When byt can survive without system hang with dynamic
 	 * sw freq adjustments, this restriction can be lifted.
 	 */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 29/71] Revert "drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3"
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (26 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 28/71] drm/i915: Reduce RPS update frequency on Valleyview/Cherryview Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 30/71] drm/i915: Replace pcu_lock with sb_lock Chris Wilson
                   ` (23 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Len Brown, Jani Nikula, Daniel Vetter, fritsch

With the vlv sideband fixed to avoid sleeping while we talk to the
punit, the system should be much more stable and be able to utilise the
punit without risk.

This reverts commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation
thresholds on Baytrail v3")

References: 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: fritsch@xbmc.org
---
 drivers/gpu/drm/i915/intel_pm.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5f59cb81d489..f3d88aa7330d 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6355,12 +6355,6 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 		ei_down <<= 2;
 	}
 
-	/* When byt can survive without system hang with dynamic
-	 * sw freq adjustments, this restriction can be lifted.
-	 */
-	if (IS_VALLEYVIEW(dev_priv))
-		goto skip_hw_write;
-
 	I915_WRITE(GEN6_RP_UP_EI,
 		   GT_INTERVAL_FROM_US(dev_priv, ei_up));
 	I915_WRITE(GEN6_RP_UP_THRESHOLD,
@@ -6381,7 +6375,6 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 		   GEN6_RP_UP_BUSY_AVG |
 		   GEN6_RP_DOWN_IDLE_AVG);
 
-skip_hw_write:
 	rps->power = new_power;
 	rps->up_threshold = threshold_up;
 	rps->down_threshold = threshold_down;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 30/71] drm/i915: Replace pcu_lock with sb_lock
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (27 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 29/71] Revert "drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3" Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 31/71] drm/i915: Separate sideband declarations to intel_sideband.h Chris Wilson
                   ` (22 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

We now have two locks for sideband access: the general sb_lock,
covering sideband access across all generations, and a specific one
covering sideband access via the punit on vlv/chv. After lifting the
sb_lock around the punit into the callers, the pcu_lock is now redundant
and can be separated from its other use to regulate RPS (essentially
giving RPS a lock all of its own).

v2: Extract a couple of minor bug fixes.
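
In sketch form, the resulting locking split (a minimal illustration
using the names introduced in this patch):

	/* RPS state now has a dedicated mutex... */
	mutex_lock(&rps->lock);
	ret = intel_set_rps(dev_priv, val);	/* asserts rps->lock held */
	mutex_unlock(&rps->lock);

	/* ...while the pcode accessors serialise on sb_lock internally */
	ret = sandybridge_pcode_read(dev_priv, mbox, &val);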

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  23 +----
 drivers/gpu/drm/i915/i915_drv.h         |  10 +-
 drivers/gpu/drm/i915/i915_irq.c         |   4 +-
 drivers/gpu/drm/i915/i915_sysfs.c       |  32 +++---
 drivers/gpu/drm/i915/intel_cdclk.c      |  28 ------
 drivers/gpu/drm/i915/intel_display.c    |   6 --
 drivers/gpu/drm/i915/intel_hdcp.c       |   2 -
 drivers/gpu/drm/i915/intel_pm.c         | 127 ++++++++++++------------
 drivers/gpu/drm/i915/intel_runtime_pm.c |   8 --
 drivers/gpu/drm/i915/intel_sideband.c   |   4 -
 10 files changed, 81 insertions(+), 163 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 4a6a5033e914..ef9e7f590f5e 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1077,8 +1077,6 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 	} else if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
 		u32 rpmodectl, freq_sts;
 
-		mutex_lock(&dev_priv->pcu_lock);
-
 		rpmodectl = I915_READ(GEN6_RP_CONTROL);
 		seq_printf(m, "Video Turbo Mode: %s\n",
 			   yesno(rpmodectl & GEN6_RP_MEDIA_TURBO));
@@ -1113,7 +1111,6 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		seq_printf(m,
 			   "efficient (RPe) frequency: %d MHz\n",
 			   intel_gpu_freq(dev_priv, rps->efficient_freq));
-		mutex_unlock(&dev_priv->pcu_lock);
 	} else if (INTEL_GEN(dev_priv) >= 6) {
 		u32 rp_state_limits;
 		u32 gt_perf_status;
@@ -1527,12 +1524,9 @@ static int gen6_drpc_info(struct seq_file *m)
 		gen9_powergate_status = I915_READ(GEN9_PWRGT_DOMAIN_STATUS);
 	}
 
-	if (INTEL_GEN(dev_priv) <= 7) {
-		mutex_lock(&dev_priv->pcu_lock);
+	if (INTEL_GEN(dev_priv) <= 7)
 		sandybridge_pcode_read(dev_priv, GEN6_PCODE_READ_RC6VIDS,
 				       &rc6vids);
-		mutex_unlock(&dev_priv->pcu_lock);
-	}
 
 	seq_printf(m, "RC1e Enabled: %s\n",
 		   yesno(rcctl1 & GEN6_RC_CTL_RC1e_ENABLE));
@@ -1803,17 +1797,10 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 	unsigned int max_gpu_freq, min_gpu_freq;
 	int gpu_freq, ia_freq;
-	int ret;
 
 	if (!HAS_LLC(dev_priv))
 		return -ENODEV;
 
-	intel_runtime_pm_get(dev_priv);
-
-	ret = mutex_lock_interruptible(&dev_priv->pcu_lock);
-	if (ret)
-		goto out;
-
 	min_gpu_freq = rps->min_freq;
 	max_gpu_freq = rps->max_freq;
 	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
@@ -1824,6 +1811,7 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 
 	seq_puts(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring freq (MHz)\n");
 
+	intel_runtime_pm_get(dev_priv);
 	for (gpu_freq = min_gpu_freq; gpu_freq <= max_gpu_freq; gpu_freq++) {
 		ia_freq = gpu_freq;
 		sandybridge_pcode_read(dev_priv,
@@ -1837,12 +1825,9 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 			   ((ia_freq >> 0) & 0xff) * 100,
 			   ((ia_freq >> 8) & 0xff) * 100);
 	}
-
-	mutex_unlock(&dev_priv->pcu_lock);
-
-out:
 	intel_runtime_pm_put(dev_priv);
-	return ret;
+
+	return 0;
 }
 
 static int i915_opregion(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 513a8e69e13e..e712cc9c82a5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -741,6 +741,8 @@ struct intel_rps_ei {
 };
 
 struct intel_rps {
+	struct mutex lock;
+
 	/*
 	 * work, interrupts_enabled and pm_iir are protected by
 	 * dev_priv->irq_lock
@@ -1793,14 +1795,6 @@ struct drm_i915_private {
 	/* Cannot be determined by PCIID. You must always read a register. */
 	u32 edram_cap;
 
-	/*
-	 * Protects RPS/RC6 register access and PCU communication.
-	 * Must be taken after struct_mutex if nested. Note that
-	 * this lock may be held for long periods of time when
-	 * talking to hw - so only take it when talking to hw!
-	 */
-	struct mutex pcu_lock;
-
 	/* gen6+ GT PM state */
 	struct intel_gen6_power_mgmt gt_pm;
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f9bc3aaa90d0..34bbf0dd00ed 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1265,7 +1265,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	if ((pm_iir & dev_priv->pm_rps_events) == 0 && !client_boost)
 		goto out;
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 
 	pm_iir |= vlv_wa_c0_ei(dev_priv, pm_iir);
 
@@ -1319,7 +1319,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 		rps->last_adj = 0;
 	}
 
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&rps->lock);
 
 out:
 	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 0519e00b3720..c98375ba79b4 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -262,7 +262,6 @@ static ssize_t gt_act_freq_mhz_show(struct device *kdev,
 
 	intel_runtime_pm_get(dev_priv);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
 		vlv_punit_get(dev_priv);
 		freq = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
@@ -272,7 +271,6 @@ static ssize_t gt_act_freq_mhz_show(struct device *kdev,
 	} else {
 		freq = intel_get_cagf(dev_priv, I915_READ(GEN6_RPSTAT1));
 	}
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_runtime_pm_put(dev_priv);
 
@@ -317,12 +315,12 @@ static ssize_t gt_boost_freq_mhz_store(struct device *kdev,
 	if (val < rps->min_freq || val > rps->max_freq)
 		return -EINVAL;
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 	if (val != rps->boost_freq) {
 		rps->boost_freq = val;
 		boost = atomic_read(&rps->num_waiters);
 	}
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&rps->lock);
 	if (boost)
 		schedule_work(&rps->work);
 
@@ -362,17 +360,14 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 		return ret;
 
 	intel_runtime_pm_get(dev_priv);
-
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 
 	val = intel_freq_opcode(dev_priv, val);
-
 	if (val < rps->min_freq ||
 	    val > rps->max_freq ||
 	    val < rps->min_freq_softlimit) {
-		mutex_unlock(&dev_priv->pcu_lock);
-		intel_runtime_pm_put(dev_priv);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto unlock;
 	}
 
 	if (val > rps->rp0_freq)
@@ -390,8 +385,8 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 	 * frequency request may be unchanged. */
 	ret = intel_set_rps(dev_priv, val);
 
-	mutex_unlock(&dev_priv->pcu_lock);
-
+unlock:
+	mutex_unlock(&rps->lock);
 	intel_runtime_pm_put(dev_priv);
 
 	return ret ?: count;
@@ -420,17 +415,14 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 		return ret;
 
 	intel_runtime_pm_get(dev_priv);
-
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 
 	val = intel_freq_opcode(dev_priv, val);
-
 	if (val < rps->min_freq ||
 	    val > rps->max_freq ||
 	    val > rps->max_freq_softlimit) {
-		mutex_unlock(&dev_priv->pcu_lock);
-		intel_runtime_pm_put(dev_priv);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto unlock;
 	}
 
 	rps->min_freq_softlimit = val;
@@ -444,8 +436,8 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 	 * frequency request may be unchanged. */
 	ret = intel_set_rps(dev_priv, val);
 
-	mutex_unlock(&dev_priv->pcu_lock);
-
+unlock:
+	mutex_unlock(&rps->lock);
 	intel_runtime_pm_put(dev_priv);
 
 	return ret ?: count;
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index 30341147121f..ad0c14bbb2e5 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -461,7 +461,6 @@ static void vlv_get_cdclk(struct drm_i915_private *dev_priv,
 {
 	u32 val;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_iosf_sb_get(dev_priv,
 			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_PUNIT));
 
@@ -474,7 +473,6 @@ static void vlv_get_cdclk(struct drm_i915_private *dev_priv,
 
 	vlv_iosf_sb_put(dev_priv,
 			BIT(VLV_IOSF_SB_CCK) | BIT(VLV_IOSF_SB_PUNIT));
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	if (IS_VALLEYVIEW(dev_priv))
 		cdclk_state->voltage_level = (val & DSPFREQGUAR_MASK) >>
@@ -551,7 +549,6 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 			BIT(VLV_IOSF_SB_BUNIT) |
 			BIT(VLV_IOSF_SB_PUNIT));
 
-	mutex_lock(&dev_priv->pcu_lock);
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
 	val &= ~DSPFREQGUAR_MASK;
 	val |= (cmd << DSPFREQGUAR_SHIFT);
@@ -561,7 +558,6 @@ static void vlv_set_cdclk(struct drm_i915_private *dev_priv,
 		     50)) {
 		DRM_ERROR("timed out waiting for CDclk change\n");
 	}
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	if (cdclk == 400000) {
 		u32 divider;
@@ -632,7 +628,6 @@ static void chv_set_cdclk(struct drm_i915_private *dev_priv,
 	 */
 	intel_display_power_get(dev_priv, POWER_DOMAIN_PIPE_A);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
@@ -646,7 +641,6 @@ static void chv_set_cdclk(struct drm_i915_private *dev_priv,
 	}
 
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_update_cdclk(dev_priv);
 
@@ -724,10 +718,8 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
 		 "trying to change cdclk frequency with cdclk not enabled\n"))
 		return;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	ret = sandybridge_pcode_write(dev_priv,
 				      BDW_PCODE_DISPLAY_FREQ_CHANGE_REQ, 0x0);
-	mutex_unlock(&dev_priv->pcu_lock);
 	if (ret) {
 		DRM_ERROR("failed to inform pcode about cdclk change\n");
 		return;
@@ -776,10 +768,8 @@ static void bdw_set_cdclk(struct drm_i915_private *dev_priv,
 			LCPLL_CD_SOURCE_FCLK_DONE) == 0, 1))
 		DRM_ERROR("Switching back to LCPLL failed\n");
 
-	mutex_lock(&dev_priv->pcu_lock);
 	sandybridge_pcode_write(dev_priv, HSW_PCODE_DE_WRITE_FREQ_REQ,
 				cdclk_state->voltage_level);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	I915_WRITE(CDCLK_FREQ, DIV_ROUND_CLOSEST(cdclk, 1000) - 1);
 
@@ -1007,12 +997,10 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv,
 	u32 freq_select, cdclk_ctl;
 	int ret;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	ret = skl_pcode_request(dev_priv, SKL_PCODE_CDCLK_CONTROL,
 				SKL_CDCLK_PREPARE_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE, 3);
-	mutex_unlock(&dev_priv->pcu_lock);
 	if (ret) {
 		DRM_ERROR("Failed to inform PCU about cdclk change (%d)\n",
 			  ret);
@@ -1076,10 +1064,8 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv,
 	POSTING_READ(CDCLK_CTL);
 
 	/* inform PCU of the change */
-	mutex_lock(&dev_priv->pcu_lock);
 	sandybridge_pcode_write(dev_priv, SKL_PCODE_CDCLK_CONTROL,
 				cdclk_state->voltage_level);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_update_cdclk(dev_priv);
 }
@@ -1391,12 +1377,9 @@ static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
 	 * requires us to wait up to 150usec, but that leads to timeouts;
 	 * the 2ms used here is based on experiment.
 	 */
-	mutex_lock(&dev_priv->pcu_lock);
 	ret = sandybridge_pcode_write_timeout(dev_priv,
 					      HSW_PCODE_DE_WRITE_FREQ_REQ,
 					      0x80000000, 150, 2);
-	mutex_unlock(&dev_priv->pcu_lock);
-
 	if (ret) {
 		DRM_ERROR("PCode CDCLK freq change notify failed (err %d, freq %d)\n",
 			  ret, cdclk);
@@ -1424,7 +1407,6 @@ static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
 		val |= BXT_CDCLK_SSA_PRECHARGE_ENABLE;
 	I915_WRITE(CDCLK_CTL, val);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	/*
 	 * The timeout isn't specified, the 2ms used here is based on
 	 * experiment.
@@ -1434,8 +1416,6 @@ static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
 	ret = sandybridge_pcode_write_timeout(dev_priv,
 					      HSW_PCODE_DE_WRITE_FREQ_REQ,
 					      cdclk_state->voltage_level, 150, 2);
-	mutex_unlock(&dev_priv->pcu_lock);
-
 	if (ret) {
 		DRM_ERROR("PCode CDCLK freq set failed, (err %d, freq %d)\n",
 			  ret, cdclk);
@@ -1673,12 +1653,10 @@ static void cnl_set_cdclk(struct drm_i915_private *dev_priv,
 	u32 val, divider;
 	int ret;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	ret = skl_pcode_request(dev_priv, SKL_PCODE_CDCLK_CONTROL,
 				SKL_CDCLK_PREPARE_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE, 3);
-	mutex_unlock(&dev_priv->pcu_lock);
 	if (ret) {
 		DRM_ERROR("Failed to inform PCU about cdclk change (%d)\n",
 			  ret);
@@ -1715,10 +1693,8 @@ static void cnl_set_cdclk(struct drm_i915_private *dev_priv,
 	I915_WRITE(CDCLK_CTL, val);
 
 	/* inform PCU of the change */
-	mutex_lock(&dev_priv->pcu_lock);
 	sandybridge_pcode_write(dev_priv, SKL_PCODE_CDCLK_CONTROL,
 				cdclk_state->voltage_level);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_update_cdclk(dev_priv);
 
@@ -1854,12 +1830,10 @@ static void icl_set_cdclk(struct drm_i915_private *dev_priv,
 	unsigned int vco = cdclk_state->vco;
 	int ret;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	ret = skl_pcode_request(dev_priv, SKL_PCODE_CDCLK_CONTROL,
 				SKL_CDCLK_PREPARE_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE,
 				SKL_CDCLK_READY_FOR_CHANGE, 3);
-	mutex_unlock(&dev_priv->pcu_lock);
 	if (ret) {
 		DRM_ERROR("Failed to inform PCU about cdclk change (%d)\n",
 			  ret);
@@ -1876,10 +1850,8 @@ static void icl_set_cdclk(struct drm_i915_private *dev_priv,
 	I915_WRITE(CDCLK_CTL, ICL_CDCLK_CD2X_PIPE_NONE |
 			      skl_cdclk_decimal(cdclk));
 
-	mutex_lock(&dev_priv->pcu_lock);
 	/* TODO: add proper DVFS support. */
 	sandybridge_pcode_write(dev_priv, SKL_PCODE_CDCLK_CONTROL, 2);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	intel_update_cdclk(dev_priv);
 }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 71124cdacd90..1007e589baa0 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -4980,10 +4980,8 @@ void hsw_enable_ips(const struct intel_crtc_state *crtc_state)
 	WARN_ON(!(crtc_state->active_planes & ~BIT(PLANE_CURSOR)));
 
 	if (IS_BROADWELL(dev_priv)) {
-		mutex_lock(&dev_priv->pcu_lock);
 		WARN_ON(sandybridge_pcode_write(dev_priv, DISPLAY_IPS_CONTROL,
 						IPS_ENABLE | IPS_PCODE_CONTROL));
-		mutex_unlock(&dev_priv->pcu_lock);
 		/* Quoting Art Runyan: "its not safe to expect any particular
 		 * value in IPS_CTL bit 31 after enabling IPS through the
 		 * mailbox." Moreover, the mailbox may return a bogus state,
@@ -5013,9 +5011,7 @@ void hsw_disable_ips(const struct intel_crtc_state *crtc_state)
 		return;
 
 	if (IS_BROADWELL(dev_priv)) {
-		mutex_lock(&dev_priv->pcu_lock);
 		WARN_ON(sandybridge_pcode_write(dev_priv, DISPLAY_IPS_CONTROL, 0));
-		mutex_unlock(&dev_priv->pcu_lock);
 		/* wait for pcode to finish disabling IPS, which may take up to 42ms */
 		if (intel_wait_for_register(dev_priv,
 					    IPS_CTL, IPS_ENABLE, 0,
@@ -8901,11 +8897,9 @@ static uint32_t hsw_read_dcomp(struct drm_i915_private *dev_priv)
 static void hsw_write_dcomp(struct drm_i915_private *dev_priv, uint32_t val)
 {
 	if (IS_HASWELL(dev_priv)) {
-		mutex_lock(&dev_priv->pcu_lock);
 		if (sandybridge_pcode_write(dev_priv, GEN6_PCODE_WRITE_D_COMP,
 					    val))
 			DRM_DEBUG_KMS("Failed to write to D_COMP\n");
-		mutex_unlock(&dev_priv->pcu_lock);
 	} else {
 		I915_WRITE(D_COMP_BDW, val);
 		POSTING_READ(D_COMP_BDW);
diff --git a/drivers/gpu/drm/i915/intel_hdcp.c b/drivers/gpu/drm/i915/intel_hdcp.c
index 2db5da550a1c..6fa39d4521ca 100644
--- a/drivers/gpu/drm/i915/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/intel_hdcp.c
@@ -105,10 +105,8 @@ static int intel_hdcp_load_keys(struct drm_i915_private *dev_priv)
 	 * differ in the key load trigger process from other platforms.
 	 */
 	if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv)) {
-		mutex_lock(&dev_priv->pcu_lock);
 		ret = sandybridge_pcode_write(dev_priv,
 					      SKL_PCODE_LOAD_HDCP_KEYS, 1);
-		mutex_unlock(&dev_priv->pcu_lock);
 		if (ret) {
 			DRM_ERROR("Failed to initiate HDCP key load (%d)\n",
 			          ret);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index f3d88aa7330d..dda6fac09952 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -310,7 +310,6 @@ static void chv_set_memory_dvfs(struct drm_i915_private *dev_priv, bool enable)
 {
 	u32 val;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DDR_SETUP2);
@@ -327,14 +326,12 @@ static void chv_set_memory_dvfs(struct drm_i915_private *dev_priv, bool enable)
 		DRM_ERROR("timed out waiting for Punit DDR DVFS request\n");
 
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 }
 
 static void chv_set_memory_pm5(struct drm_i915_private *dev_priv, bool enable)
 {
 	u32 val;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
@@ -345,7 +342,6 @@ static void chv_set_memory_pm5(struct drm_i915_private *dev_priv, bool enable)
 	vlv_punit_write(dev_priv, PUNIT_REG_DSPFREQ, val);
 
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 }
 
 #define FW_WM(value, plane) \
@@ -2810,11 +2806,9 @@ static void intel_read_wm_latency(struct drm_i915_private *dev_priv,
 
 		/* read the first set of memory latencies[0:3] */
 		val = 0; /* data0 to be programmed to 0 for first set */
-		mutex_lock(&dev_priv->pcu_lock);
 		ret = sandybridge_pcode_read(dev_priv,
 					     GEN9_PCODE_READ_MEM_LATENCY,
 					     &val);
-		mutex_unlock(&dev_priv->pcu_lock);
 
 		if (ret) {
 			DRM_ERROR("SKL Mailbox read error = %d\n", ret);
@@ -2831,11 +2825,9 @@ static void intel_read_wm_latency(struct drm_i915_private *dev_priv,
 
 		/* read the second set of memory latencies[4:7] */
 		val = 1; /* data0 to be programmed to 1 for second set */
-		mutex_lock(&dev_priv->pcu_lock);
 		ret = sandybridge_pcode_read(dev_priv,
 					     GEN9_PCODE_READ_MEM_LATENCY,
 					     &val);
-		mutex_unlock(&dev_priv->pcu_lock);
 		if (ret) {
 			DRM_ERROR("SKL Mailbox read error = %d\n", ret);
 			return;
@@ -3639,13 +3631,10 @@ intel_enable_sagv(struct drm_i915_private *dev_priv)
 		return 0;
 
 	DRM_DEBUG_KMS("Enabling the SAGV\n");
-	mutex_lock(&dev_priv->pcu_lock);
-
 	ret = sandybridge_pcode_write(dev_priv, GEN9_PCODE_SAGV_CONTROL,
 				      GEN9_SAGV_ENABLE);
 
 	/* We don't need to wait for the SAGV when enabling */
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	/*
 	 * Some skl systems, pre-release machines in particular,
@@ -3676,15 +3665,11 @@ intel_disable_sagv(struct drm_i915_private *dev_priv)
 		return 0;
 
 	DRM_DEBUG_KMS("Disabling the SAGV\n");
-	mutex_lock(&dev_priv->pcu_lock);
-
 	/* bspec says to keep retrying for at least 1 ms */
 	ret = skl_pcode_request(dev_priv, GEN9_PCODE_SAGV_CONTROL,
 				GEN9_SAGV_DISABLE,
 				GEN9_SAGV_IS_DISABLED, GEN9_SAGV_IS_DISABLED,
 				1);
-	mutex_unlock(&dev_priv->pcu_lock);
-
 	/*
 	 * Some skl systems, pre-release machines in particular,
 	 * don't actually have an SAGV.
@@ -5861,7 +5846,6 @@ void vlv_wm_get_hw_state(struct drm_device *dev)
 	wm->level = VLV_WM_LEVEL_PM2;
 
 	if (IS_CHERRYVIEW(dev_priv)) {
-		mutex_lock(&dev_priv->pcu_lock);
 		vlv_punit_get(dev_priv);
 
 		val = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ);
@@ -5893,7 +5877,6 @@ void vlv_wm_get_hw_state(struct drm_device *dev)
 		}
 
 		vlv_punit_put(dev_priv);
-		mutex_unlock(&dev_priv->pcu_lock);
 	}
 
 	for_each_intel_crtc(dev, crtc) {
@@ -6501,7 +6484,7 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 	if (rps->enabled) {
 		u8 freq;
 
@@ -6524,7 +6507,7 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 					rps->max_freq_softlimit)))
 			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
 	}
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&rps->lock);
 }
 
 void gen6_rps_idle(struct drm_i915_private *dev_priv)
@@ -6538,7 +6521,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 	 */
 	gen6_disable_rps_interrupts(dev_priv);
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 	if (rps->enabled) {
 		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
 			vlv_set_rps_idle(dev_priv);
@@ -6548,7 +6531,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 		I915_WRITE(GEN6_PMINTRMSK,
 			   gen6_sanitize_rps_pm_mask(dev_priv, ~0));
 	}
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&rps->lock);
 }
 
 void gen6_rps_boost(struct i915_request *rq,
@@ -6589,7 +6572,7 @@ int intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 	int err;
 
-	lockdep_assert_held(&dev_priv->pcu_lock);
+	lockdep_assert_held(&rps->lock);
 	GEM_BUG_ON(val > rps->max_freq);
 	GEM_BUG_ON(val < rps->min_freq);
 
@@ -7088,7 +7071,7 @@ static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 	unsigned int max_gpu_freq, min_gpu_freq;
 	struct cpufreq_policy *policy;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
+	lockdep_assert_held(&rps->lock);
 
 	if (rps->max_freq <= rps->min_freq)
 		return;
@@ -8167,7 +8150,7 @@ void intel_init_gt_powersave(struct drm_i915_private *dev_priv)
 		intel_runtime_pm_get(dev_priv);
 	}
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&rps->lock);
 
 	/* Initialize RPS limits (for userspace) */
 	if (IS_CHERRYVIEW(dev_priv))
@@ -8207,7 +8190,7 @@ void intel_init_gt_powersave(struct drm_i915_private *dev_priv)
 	/* Finally allow us to boost to max by default */
 	rps->boost_freq = rps->max_freq;
 
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&rps->lock);
 }
 
 void intel_cleanup_gt_powersave(struct drm_i915_private *dev_priv)
@@ -8249,7 +8232,7 @@ void intel_sanitize_gt_powersave(struct drm_i915_private *dev_priv)
 
 static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
 {
-	lockdep_assert_held(&i915->pcu_lock);
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
 	if (!i915->gt_pm.llc_pstate.enabled)
 		return;
@@ -8261,7 +8244,7 @@ static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
 
 static void intel_disable_rc6(struct drm_i915_private *dev_priv)
 {
-	lockdep_assert_held(&dev_priv->pcu_lock);
+	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
 
 	if (!dev_priv->gt_pm.rc6.enabled)
 		return;
@@ -8280,7 +8263,7 @@ static void intel_disable_rc6(struct drm_i915_private *dev_priv)
 
 static void intel_disable_rps(struct drm_i915_private *dev_priv)
 {
-	lockdep_assert_held(&dev_priv->pcu_lock);
+	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
 
 	if (!dev_priv->gt_pm.rps.enabled)
 		return;
@@ -8301,19 +8284,19 @@ static void intel_disable_rps(struct drm_i915_private *dev_priv)
 
 void intel_disable_gt_powersave(struct drm_i915_private *dev_priv)
 {
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&dev_priv->gt_pm.rps.lock);
 
 	intel_disable_rc6(dev_priv);
 	intel_disable_rps(dev_priv);
 	if (HAS_LLC(dev_priv))
 		intel_disable_llc_pstate(dev_priv);
 
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&dev_priv->gt_pm.rps.lock);
 }
 
 static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
 {
-	lockdep_assert_held(&i915->pcu_lock);
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
 	if (i915->gt_pm.llc_pstate.enabled)
 		return;
@@ -8325,7 +8308,7 @@ static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
 
 static void intel_enable_rc6(struct drm_i915_private *dev_priv)
 {
-	lockdep_assert_held(&dev_priv->pcu_lock);
+	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
 
 	if (dev_priv->gt_pm.rc6.enabled)
 		return;
@@ -8348,7 +8331,7 @@ static void intel_enable_rps(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	lockdep_assert_held(&dev_priv->pcu_lock);
+	lockdep_assert_held(&rps->lock);
 
 	if (rps->enabled)
 		return;
@@ -8383,7 +8366,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
 	if (intel_vgpu_active(dev_priv))
 		return;
 
-	mutex_lock(&dev_priv->pcu_lock);
+	mutex_lock(&dev_priv->gt_pm.rps.lock);
 
 	if (HAS_RC6(dev_priv))
 		intel_enable_rc6(dev_priv);
@@ -8391,7 +8374,7 @@ void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
 	if (HAS_LLC(dev_priv))
 		intel_enable_llc_pstate(dev_priv);
 
-	mutex_unlock(&dev_priv->pcu_lock);
+	mutex_unlock(&dev_priv->gt_pm.rps.lock);
 }
 
 static void ibx_init_clock_gating(struct drm_i915_private *dev_priv)
@@ -9396,22 +9379,19 @@ static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv)
 	}
 }
 
-int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
+static int __sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
 {
 	int status;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
+	lockdep_assert_held(&dev_priv->sb_lock);
 
 	/* GEN6_PCODE_* are outside of the forcewake domain, we can
 	 * use te fw I915_READ variants to reduce the amount of work
 	 * required when reading/writing.
 	 */
 
-	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY) {
-		DRM_DEBUG_DRIVER("warning: pcode (read from mbox %x) mailbox access failed for %ps\n",
-				 mbox, __builtin_return_address(0));
+	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
 		return -EAGAIN;
-	}
 
 	I915_WRITE_FW(GEN6_PCODE_DATA, *val);
 	I915_WRITE_FW(GEN6_PCODE_DATA1, 0);
@@ -9419,11 +9399,8 @@ int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val
 
 	if (__intel_wait_for_register_fw(dev_priv,
 					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
-					 500, 0, NULL)) {
-		DRM_ERROR("timeout waiting for pcode read (from mbox %x) to finish for %ps\n",
-			  mbox, __builtin_return_address(0));
+					 500, 0, NULL))
 		return -ETIMEDOUT;
-	}
 
 	*val = I915_READ_FW(GEN6_PCODE_DATA);
 	I915_WRITE_FW(GEN6_PCODE_DATA, 0);
@@ -9433,33 +9410,39 @@ int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val
 	else
 		status = gen6_check_mailbox_status(dev_priv);
 
+	return status;
+}
+
+int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
+{
+	int status;
+
+	mutex_lock(&dev_priv->sb_lock);
+	status = __sandybridge_pcode_read(dev_priv, mbox, val);
+	mutex_unlock(&dev_priv->sb_lock);
+
 	if (status) {
 		DRM_DEBUG_DRIVER("warning: pcode (read from mbox %x) mailbox access failed for %ps: %d\n",
 				 mbox, __builtin_return_address(0), status);
-		return status;
 	}
 
-	return 0;
+	return status;
 }
 
-int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
-				    u32 mbox, u32 val,
-				    int fast_timeout_us, int slow_timeout_ms)
+static int __sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
+					     u32 mbox, u32 val,
+					     int fast_timeout_us,
+					     int slow_timeout_ms)
 {
 	int status;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
-
 	/* GEN6_PCODE_* are outside of the forcewake domain, we can
 	 * use te fw I915_READ variants to reduce the amount of work
 	 * required when reading/writing.
 	 */
 
-	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY) {
-		DRM_DEBUG_DRIVER("warning: pcode (write of 0x%08x to mbox %x) mailbox access failed for %ps\n",
-				 val, mbox, __builtin_return_address(0));
+	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
 		return -EAGAIN;
-	}
 
 	I915_WRITE_FW(GEN6_PCODE_DATA, val);
 	I915_WRITE_FW(GEN6_PCODE_DATA1, 0);
@@ -9468,11 +9451,8 @@ int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
 	if (__intel_wait_for_register_fw(dev_priv,
 					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
 					 fast_timeout_us, slow_timeout_ms,
-					 NULL)) {
-		DRM_ERROR("timeout waiting for pcode write of 0x%08x to mbox %x to finish for %ps\n",
-			  val, mbox, __builtin_return_address(0));
+					 NULL))
 		return -ETIMEDOUT;
-	}
 
 	I915_WRITE_FW(GEN6_PCODE_DATA, 0);
 
@@ -9481,13 +9461,28 @@ int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
 	else
 		status = gen6_check_mailbox_status(dev_priv);
 
+	return status;
+}
+
+int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
+				    u32 mbox, u32 val,
+				    int fast_timeout_us,
+				    int slow_timeout_ms)
+{
+	int status;
+
+	mutex_lock(&dev_priv->sb_lock);
+	status = __sandybridge_pcode_write_timeout(dev_priv, mbox, val,
+						   fast_timeout_us,
+						   slow_timeout_ms);
+	mutex_unlock(&dev_priv->sb_lock);
+
 	if (status) {
 		DRM_DEBUG_DRIVER("warning: pcode (write of 0x%08x to mbox %x) mailbox access failed for %ps: %d\n",
 				 val, mbox, __builtin_return_address(0), status);
-		return status;
 	}
 
-	return 0;
+	return status;
 }
 
 static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
@@ -9496,7 +9491,7 @@ static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
 {
 	u32 val = request;
 
-	*status = sandybridge_pcode_read(dev_priv, mbox, &val);
+	*status = __sandybridge_pcode_read(dev_priv, mbox, &val);
 
 	return *status || ((val & reply_mask) == reply);
 }
@@ -9526,7 +9521,7 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
 	u32 status;
 	int ret;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->pcu_lock));
+	mutex_lock(&dev_priv->sb_lock);
 
 #define COND skl_pcode_try_request(dev_priv, mbox, request, reply_mask, reply, \
 				   &status)
@@ -9562,6 +9557,7 @@ int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
 	preempt_enable();
 
 out:
+	mutex_unlock(&dev_priv->sb_lock);
 	return ret ? ret : status;
 #undef COND
 }
@@ -9631,8 +9627,7 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
 
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
-	mutex_init(&dev_priv->pcu_lock);
-
+	mutex_init(&dev_priv->gt_pm.rps.lock);
 	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
 
 	dev_priv->runtime_pm.suspended = false;
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index a7efb4113084..8b92be53b5dc 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -807,7 +807,6 @@ static void vlv_set_power_well(struct drm_i915_private *dev_priv,
 	state = enable ? PUNIT_PWRGT_PWR_ON(power_well_id) :
 			 PUNIT_PWRGT_PWR_GATE(power_well_id);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 #define COND \
@@ -830,7 +829,6 @@ static void vlv_set_power_well(struct drm_i915_private *dev_priv,
 
 out:
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 }
 
 static void vlv_power_well_enable(struct drm_i915_private *dev_priv,
@@ -857,7 +855,6 @@ static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
 	mask = PUNIT_PWRGT_MASK(power_well_id);
 	ctrl = PUNIT_PWRGT_PWR_ON(power_well_id);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 	state = vlv_punit_read(dev_priv, PUNIT_REG_PWRGT_STATUS) & mask;
@@ -878,7 +875,6 @@ static bool vlv_power_well_enabled(struct drm_i915_private *dev_priv,
 	WARN_ON(ctrl != state);
 
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	return enabled;
 }
@@ -1390,7 +1386,6 @@ static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
 	bool enabled;
 	u32 state, ctrl;
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 	state = vlv_punit_read(dev_priv, PUNIT_REG_DSPFREQ) & DP_SSS_MASK(pipe);
@@ -1409,7 +1404,6 @@ static bool chv_pipe_power_well_enabled(struct drm_i915_private *dev_priv,
 	WARN_ON(ctrl << 16 != state);
 
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 
 	return enabled;
 }
@@ -1424,7 +1418,6 @@ static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
 
 	state = enable ? DP_SSS_PWR_ON(pipe) : DP_SSS_PWR_GATE(pipe);
 
-	mutex_lock(&dev_priv->pcu_lock);
 	vlv_punit_get(dev_priv);
 
 #define COND \
@@ -1447,7 +1440,6 @@ static void chv_set_pipe_power_well(struct drm_i915_private *dev_priv,
 
 out:
 	vlv_punit_put(dev_priv);
-	mutex_unlock(&dev_priv->pcu_lock);
 }
 
 static void chv_pipe_power_well_enable(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index dc3b491b4d00..2d4e48e9e1d5 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -142,8 +142,6 @@ u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
 {
 	u32 val = 0;
 
-	lockdep_assert_held(&dev_priv->pcu_lock);
-
 	vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			SB_CRRDDA_NP, addr, &val);
 
@@ -152,8 +150,6 @@ u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr)
 
 int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val)
 {
-	lockdep_assert_held(&dev_priv->pcu_lock);
-
 	return vlv_sideband_rw(dev_priv, PCI_DEVFN(0, 0), IOSF_PORT_PUNIT,
 			       SB_CRWRDA_NP, addr, &val);
 }
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 31/71] drm/i915: Separate sideband declarations to intel_sideband.h
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (28 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 30/71] drm/i915: Replace pcu_lock with sb_lock Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 32/71] drm/i915: Merge sbi read/write into a single accessor Chris Wilson
                   ` (21 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Split the sideband declarations out of the ginormous i915_drv.h
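
Users of the sideband routines now pull the declarations in
explicitly, as the diff below does for each affected file:

	#include "intel_sideband.h"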

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  2 +
 drivers/gpu/drm/i915/i915_drv.h         | 62 ---------------------
 drivers/gpu/drm/i915/i915_sysfs.c       |  2 +
 drivers/gpu/drm/i915/intel_cdclk.c      |  1 +
 drivers/gpu/drm/i915/intel_display.c    | 19 ++++---
 drivers/gpu/drm/i915/intel_dp.c         |  6 +-
 drivers/gpu/drm/i915/intel_dpio_phy.c   |  1 +
 drivers/gpu/drm/i915/intel_dsi.c        |  7 ++-
 drivers/gpu/drm/i915/intel_dsi_pll.c    |  4 +-
 drivers/gpu/drm/i915/intel_dsi_vbt.c    | 11 +++-
 drivers/gpu/drm/i915/intel_hdmi.c       |  5 +-
 drivers/gpu/drm/i915/intel_pm.c         |  7 ++-
 drivers/gpu/drm/i915/intel_runtime_pm.c |  1 +
 drivers/gpu/drm/i915/intel_sideband.c   |  2 +
 drivers/gpu/drm/i915/intel_sideband.h   | 73 +++++++++++++++++++++++++
 15 files changed, 123 insertions(+), 80 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_sideband.h

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ef9e7f590f5e..b61de2838bd7 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -29,8 +29,10 @@
 #include <linux/debugfs.h>
 #include <linux/sort.h>
 #include <linux/sched/mm.h>
+
 #include "intel_drv.h"
 #include "intel_guc_submission.h"
+#include "intel_sideband.h"
 
 static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node)
 {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e712cc9c82a5..ff810ee016a8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -635,11 +635,6 @@ enum intel_pch {
 	PCH_NOP,
 };
 
-enum intel_sbi_destination {
-	SBI_ICLK,
-	SBI_MPHY,
-};
-
 #define QUIRK_LVDS_SSC_DISABLE (1<<1)
 #define QUIRK_INVERT_BRIGHTNESS (1<<2)
 #define QUIRK_BACKLIGHT_PRESENT (1<<3)
@@ -3455,63 +3450,6 @@ int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv, u32 mbox,
 int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
 		      u32 reply_mask, u32 reply, int timeout_base_ms);
 
-/* intel_sideband.c */
-
-enum {
-	VLV_IOSF_SB_BUNIT,
-	VLV_IOSF_SB_CCK,
-	VLV_IOSF_SB_CCU,
-	VLV_IOSF_SB_DPIO,
-	VLV_IOSF_SB_FLISDSI,
-	VLV_IOSF_SB_GPIO,
-	VLV_IOSF_SB_NC,
-	VLV_IOSF_SB_PUNIT,
-};
-
-void vlv_iosf_sb_get(struct drm_i915_private *dev_priv, unsigned long ports);
-u32 vlv_iosf_sb_read(struct drm_i915_private *dev_priv, u8 port, u32 reg);
-void vlv_iosf_sb_write(struct drm_i915_private *dev_priv, u8 port, u32 reg, u32 val);
-void vlv_iosf_sb_put(struct drm_i915_private *dev_priv, unsigned long ports);
-
-void vlv_punit_get(struct drm_i915_private *dev_priv);
-u32 vlv_punit_read(struct drm_i915_private *dev_priv, u32 addr);
-int vlv_punit_write(struct drm_i915_private *dev_priv, u32 addr, u32 val);
-void vlv_punit_put(struct drm_i915_private *dev_priv);
-
-void vlv_nc_get(struct drm_i915_private *dev_priv);
-u32 vlv_nc_read(struct drm_i915_private *dev_priv, u8 addr);
-void vlv_nc_put(struct drm_i915_private *dev_priv);
-
-void vlv_cck_get(struct drm_i915_private *dev_priv);
-u32 vlv_cck_read(struct drm_i915_private *dev_priv, u32 reg);
-void vlv_cck_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
-void vlv_cck_put(struct drm_i915_private *dev_priv);
-
-void vlv_ccu_get(struct drm_i915_private *dev_priv);
-u32 vlv_ccu_read(struct drm_i915_private *dev_priv, u32 reg);
-void vlv_ccu_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
-void vlv_ccu_put(struct drm_i915_private *dev_priv);
-
-void vlv_bunit_get(struct drm_i915_private *dev_priv);
-u32 vlv_bunit_read(struct drm_i915_private *dev_priv, u32 reg);
-void vlv_bunit_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
-void vlv_bunit_put(struct drm_i915_private *dev_priv);
-
-void vlv_dpio_get(struct drm_i915_private *dev_priv);
-u32 vlv_dpio_read(struct drm_i915_private *dev_priv, enum pipe pipe, int reg);
-void vlv_dpio_write(struct drm_i915_private *dev_priv, enum pipe pipe, int reg, u32 val);
-void vlv_dpio_put(struct drm_i915_private *dev_priv);
-
-void vlv_flisdsi_get(struct drm_i915_private *dev_priv);
-u32 vlv_flisdsi_read(struct drm_i915_private *dev_priv, u32 reg);
-void vlv_flisdsi_write(struct drm_i915_private *dev_priv, u32 reg, u32 val);
-void vlv_flisdsi_put(struct drm_i915_private *dev_priv);
-
-u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
-		   enum intel_sbi_destination destination);
-void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
-		     enum intel_sbi_destination destination);
-
 /* intel_dpio_phy.c */
 void bxt_port_to_phy_channel(struct drm_i915_private *dev_priv, enum port port,
 			     enum dpio_phy *phy, enum dpio_channel *ch);
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index c98375ba79b4..55554697133b 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -29,7 +29,9 @@
 #include <linux/module.h>
 #include <linux/stat.h>
 #include <linux/sysfs.h>
+
 #include "intel_drv.h"
+#include "intel_sideband.h"
 #include "i915_drv.h"
 
 static inline struct drm_i915_private *kdev_minor_to_i915(struct device *kdev)
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index ad0c14bbb2e5..14f44b30c55d 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -22,6 +22,7 @@
  */
 
 #include "intel_drv.h"
+#include "intel_sideband.h"
 
 /**
  * DOC: CDCLK / RAWCLK
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 1007e589baa0..cd1a848a21ff 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -31,23 +31,26 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/vgaarb.h>
+#include <linux/dma_remapping.h>
+#include <linux/reservation.h>
+
 #include <drm/drm_edid.h>
 #include <drm/drmP.h>
-#include "intel_drv.h"
-#include "intel_frontbuffer.h"
 #include <drm/i915_drm.h>
-#include "i915_drv.h"
-#include "i915_gem_clflush.h"
-#include "intel_dsi.h"
-#include "i915_trace.h"
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_dp_helper.h>
 #include <drm/drm_crtc_helper.h>
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_rect.h>
-#include <linux/dma_remapping.h>
-#include <linux/reservation.h>
+
+#include "i915_drv.h"
+#include "i915_gem_clflush.h"
+#include "i915_trace.h"
+#include "intel_dsi.h"
+#include "intel_drv.h"
+#include "intel_frontbuffer.h"
+#include "intel_sideband.h"
 
 /* Primary plane formats for gen <= 3 */
 static const uint32_t i8xx_primary_formats[] = {
diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 3737b18bb209..02dc25906b0f 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -31,7 +31,9 @@
 #include <linux/types.h>
 #include <linux/notifier.h>
 #include <linux/reboot.h>
+
 #include <asm/byteorder.h>
+
 #include <drm/drmP.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
@@ -39,9 +41,11 @@
 #include <drm/drm_dp_helper.h>
 #include <drm/drm_edid.h>
 #include <drm/drm_hdcp.h>
-#include "intel_drv.h"
 #include <drm/i915_drm.h>
+
 #include "i915_drv.h"
+#include "intel_drv.h"
+#include "intel_sideband.h"
 
 #define DP_DPRX_ESI_LEN 14
 
diff --git a/drivers/gpu/drm/i915/intel_dpio_phy.c b/drivers/gpu/drm/i915/intel_dpio_phy.c
index 79c449aabc7f..f2f2a953a2d3 100644
--- a/drivers/gpu/drm/i915/intel_dpio_phy.c
+++ b/drivers/gpu/drm/i915/intel_dpio_phy.c
@@ -22,6 +22,7 @@
  */
 
 #include "intel_drv.h"
+#include "intel_sideband.h"
 
 /**
  * DOC: DPIO
diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 355aa8717af2..626c6791d018 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -23,17 +23,20 @@
  * Author: Jani Nikula <jani.nikula@intel.com>
  */
 
+#include <linux/slab.h>
+#include <linux/gpio/consumer.h>
+
 #include <drm/drmP.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_edid.h>
 #include <drm/i915_drm.h>
 #include <drm/drm_mipi_dsi.h>
-#include <linux/slab.h>
-#include <linux/gpio/consumer.h>
+
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_dsi.h"
+#include "intel_sideband.h"
 
 /* return pixels in terms of txbyteclkhs */
 static u16 txbyteclkhs(u16 pixels, int bpp, int lane_count,
diff --git a/drivers/gpu/drm/i915/intel_dsi_pll.c b/drivers/gpu/drm/i915/intel_dsi_pll.c
index b73336e7dcd2..ebb3dba75d06 100644
--- a/drivers/gpu/drm/i915/intel_dsi_pll.c
+++ b/drivers/gpu/drm/i915/intel_dsi_pll.c
@@ -26,9 +26,11 @@
  */
 
 #include <linux/kernel.h>
-#include "intel_drv.h"
+
 #include "i915_drv.h"
+#include "intel_drv.h"
 #include "intel_dsi.h"
+#include "intel_sideband.h"
 
 static const u16 lfsr_converts[] = {
 	426, 469, 234, 373, 442, 221, 110, 311, 411,		/* 62 - 70 */
diff --git a/drivers/gpu/drm/i915/intel_dsi_vbt.c b/drivers/gpu/drm/i915/intel_dsi_vbt.c
index 515ab68f319c..a97dcd653696 100644
--- a/drivers/gpu/drm/i915/intel_dsi_vbt.c
+++ b/drivers/gpu/drm/i915/intel_dsi_vbt.c
@@ -24,18 +24,23 @@
  *
  */
 
+#include <linux/gpio/consumer.h>
+#include <linux/slab.h>
+
+#include <asm/intel-mid.h>
+
 #include <drm/drmP.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_edid.h>
 #include <drm/i915_drm.h>
-#include <linux/gpio/consumer.h>
-#include <linux/slab.h>
+
 #include <video/mipi_display.h>
-#include <asm/intel-mid.h>
 #include <video/mipi_display.h>
+
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_dsi.h"
+#include "intel_sideband.h"
 
 #define MIPI_TRANSFER_MODE_SHIFT	0
 #define MIPI_VIRTUAL_CHANNEL_SHIFT	1
diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c
index 4257209c75f3..0a3ecebdbebd 100644
--- a/drivers/gpu/drm/i915/intel_hdmi.c
+++ b/drivers/gpu/drm/i915/intel_hdmi.c
@@ -30,16 +30,19 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <linux/hdmi.h>
+
 #include <drm/drmP.h>
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_edid.h>
 #include <drm/drm_hdcp.h>
 #include <drm/drm_scdc_helper.h>
-#include "intel_drv.h"
 #include <drm/i915_drm.h>
 #include <drm/intel_lpe_audio.h>
+
 #include "i915_drv.h"
+#include "intel_drv.h"
+#include "intel_sideband.h"
 
 static struct drm_device *intel_hdmi_to_dev(struct intel_hdmi *intel_hdmi)
 {
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index dda6fac09952..354c1a9f5dd4 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -26,12 +26,15 @@
  */
 
 #include <linux/cpufreq.h>
+#include <linux/module.h>
+
+#include <drm/drm_atomic_helper.h>
 #include <drm/drm_plane_helper.h>
+
 #include "i915_drv.h"
 #include "intel_drv.h"
+#include "intel_sideband.h"
 #include "../../../platform/x86/intel_ips.h"
-#include <linux/module.h>
-#include <drm/drm_atomic_helper.h>
 
 /**
  * DOC: RC6
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index 8b92be53b5dc..713124b94fb6 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -31,6 +31,7 @@
 
 #include "i915_drv.h"
 #include "intel_drv.h"
+#include "intel_sideband.h"
 
 /**
  * DOC: runtime pm
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index 2d4e48e9e1d5..87e34787939b 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -24,6 +24,8 @@
 
 #include <asm/iosf_mbi.h>
 
+#include "intel_sideband.h"
+
 #include "i915_drv.h"
 #include "intel_drv.h"
 
diff --git a/drivers/gpu/drm/i915/intel_sideband.h b/drivers/gpu/drm/i915/intel_sideband.h
new file mode 100644
index 000000000000..7204648eb9d4
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_sideband.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: MIT */
+
+#ifndef _INTEL_SIDEBAND_H_
+#define _INTEL_SIDEBAND_H_
+
+#include <linux/types.h>
+
+struct drm_i915_private;
+enum pipe;
+
+enum {
+	VLV_IOSF_SB_BUNIT,
+	VLV_IOSF_SB_CCK,
+	VLV_IOSF_SB_CCU,
+	VLV_IOSF_SB_DPIO,
+	VLV_IOSF_SB_FLISDSI,
+	VLV_IOSF_SB_GPIO,
+	VLV_IOSF_SB_NC,
+	VLV_IOSF_SB_PUNIT,
+};
+
+void vlv_iosf_sb_get(struct drm_i915_private *i915, unsigned long ports);
+u32 vlv_iosf_sb_read(struct drm_i915_private *i915, u8 port, u32 reg);
+void vlv_iosf_sb_write(struct drm_i915_private *i915,
+		       u8 port, u32 reg, u32 val);
+void vlv_iosf_sb_put(struct drm_i915_private *i915, unsigned long ports);
+
+void vlv_punit_get(struct drm_i915_private *i915);
+u32 vlv_punit_read(struct drm_i915_private *i915, u32 addr);
+int vlv_punit_write(struct drm_i915_private *i915, u32 addr, u32 val);
+void vlv_punit_put(struct drm_i915_private *i915);
+
+void vlv_nc_get(struct drm_i915_private *i915);
+u32 vlv_nc_read(struct drm_i915_private *i915, u8 addr);
+void vlv_nc_put(struct drm_i915_private *i915);
+
+void vlv_cck_get(struct drm_i915_private *i915);
+u32 vlv_cck_read(struct drm_i915_private *i915, u32 reg);
+void vlv_cck_write(struct drm_i915_private *i915, u32 reg, u32 val);
+void vlv_cck_put(struct drm_i915_private *i915);
+
+void vlv_ccu_get(struct drm_i915_private *i915);
+u32 vlv_ccu_read(struct drm_i915_private *i915, u32 reg);
+void vlv_ccu_write(struct drm_i915_private *i915, u32 reg, u32 val);
+void vlv_ccu_put(struct drm_i915_private *i915);
+
+void vlv_bunit_get(struct drm_i915_private *i915);
+u32 vlv_bunit_read(struct drm_i915_private *i915, u32 reg);
+void vlv_bunit_write(struct drm_i915_private *i915, u32 reg, u32 val);
+void vlv_bunit_put(struct drm_i915_private *i915);
+
+void vlv_dpio_get(struct drm_i915_private *i915);
+u32 vlv_dpio_read(struct drm_i915_private *i915, enum pipe pipe, int reg);
+void vlv_dpio_write(struct drm_i915_private *i915,
+		    enum pipe pipe, int reg, u32 val);
+void vlv_dpio_put(struct drm_i915_private *i915);
+
+void vlv_flisdsi_get(struct drm_i915_private *i915);
+u32 vlv_flisdsi_read(struct drm_i915_private *i915, u32 reg);
+void vlv_flisdsi_write(struct drm_i915_private *i915, u32 reg, u32 val);
+void vlv_flisdsi_put(struct drm_i915_private *i915);
+
+enum intel_sbi_destination {
+	SBI_ICLK,
+	SBI_MPHY,
+};
+
+u32 intel_sbi_read(struct drm_i915_private *i915, u16 reg,
+		   enum intel_sbi_destination destination);
+void intel_sbi_write(struct drm_i915_private *i915, u16 reg, u32 value,
+		     enum intel_sbi_destination destination);
+
+#endif /* _INTEL_SIDEBAND_H */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 32/71] drm/i915: Merge sbi read/write into a single accessor
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (29 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 31/71] drm/i915: Separate sideband declarations to intel_sideband.h Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 33/71] drm/i915: Merge sandybridge_pcode_(read|write) Chris Wilson
                   ` (20 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Since intel_sbi_read() and intel_sbi_write() differ by only a couple of
lines (depending on whether we feed the value in or out), merge the two
into a single common accessor.
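
In the abstract, the merged accessor has this shape (a sketch of the
pattern only; the xxx_*() names and types are placeholders, not code
from this patch):

	static int xxx_rw(struct xxx_device *dev, u16 reg, u32 *val,
			  bool is_read)
	{
		int err;

		/* one copy of the setup/busy-wait/error handling */
		err = xxx_transfer(dev, reg, is_read ? 0 : *val, is_read);
		if (err)
			return err;

		if (is_read)
			*val = xxx_result(dev); /* writes leave *val untouched */

		return 0;
	}

	static u32 xxx_read(struct xxx_device *dev, u16 reg)
	{
		u32 val = 0;

		xxx_rw(dev, reg, &val, true);
		return val;
	}

	static void xxx_write(struct xxx_device *dev, u16 reg, u32 val)
	{
		xxx_rw(dev, reg, &val, false);
	}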

v2: Restore vlv_flisdsi_read() lost during rebasing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_sideband.c | 91 +++++++++++----------------
 1 file changed, 37 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index 87e34787939b..427c8c36f0c5 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -309,90 +309,73 @@ void vlv_dpio_put(struct drm_i915_private *dev_priv)
 }
 
 /* SBI access */
-u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
-		   enum intel_sbi_destination destination)
+static int intel_sbi_rw(struct drm_i915_private *dev_priv, u16 reg,
+			enum intel_sbi_destination destination,
+			u32 *val, bool is_read)
 {
-	u32 value = 0;
+	u32 cmd;
 
 	lockdep_assert_held(&dev_priv->sb_lock);
 
-	if (intel_wait_for_register(dev_priv,
-				    SBI_CTL_STAT, SBI_BUSY, 0,
-				    100)) {
+	if (intel_wait_for_register_fw(dev_priv,
+				       SBI_CTL_STAT, SBI_BUSY, 0,
+				       100)) {
 		DRM_ERROR("timeout waiting for SBI to become ready\n");
-		return 0;
+		return -EBUSY;
 	}
 
-	I915_WRITE(SBI_ADDR, (reg << 16));
-	I915_WRITE(SBI_DATA, 0);
+	I915_WRITE_FW(SBI_ADDR, (u32)reg << 16);
+	I915_WRITE_FW(SBI_DATA, is_read ? 0 : *val);
 
 	if (destination == SBI_ICLK)
-		value = SBI_CTL_DEST_ICLK | SBI_CTL_OP_CRRD;
+		cmd = SBI_CTL_DEST_ICLK | SBI_CTL_OP_CRRD;
 	else
-		value = SBI_CTL_DEST_MPHY | SBI_CTL_OP_IORD;
-	I915_WRITE(SBI_CTL_STAT, value | SBI_BUSY);
+		cmd = SBI_CTL_DEST_MPHY | SBI_CTL_OP_IORD;
+	if (!is_read)
+		cmd |= BIT(8); /* flip the read opcode to its write variant */
+	I915_WRITE_FW(SBI_CTL_STAT, cmd | SBI_BUSY);
 
-	if (intel_wait_for_register(dev_priv,
-				    SBI_CTL_STAT,
-				    SBI_BUSY,
-				    0,
-				    100)) {
+	if (__intel_wait_for_register_fw(dev_priv,
+					 SBI_CTL_STAT, SBI_BUSY, 0,
+					 100, 100, &cmd)) {
 		DRM_ERROR("timeout waiting for SBI to complete read\n");
-		return 0;
+		return -ETIMEDOUT;
 	}
 
-	if (I915_READ(SBI_CTL_STAT) & SBI_RESPONSE_FAIL) {
+	if (cmd & SBI_RESPONSE_FAIL) {
 		DRM_ERROR("error during SBI read of reg %x\n", reg);
-		return 0;
+		return -ENXIO;
 	}
 
-	return I915_READ(SBI_DATA);
+	if (is_read)
+		*val = I915_READ_FW(SBI_DATA);
+
+	return 0;
 }
 
-void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
-		     enum intel_sbi_destination destination)
+u32 intel_sbi_read(struct drm_i915_private *dev_priv, u16 reg,
+		   enum intel_sbi_destination destination)
 {
-	u32 tmp;
-
-	lockdep_assert_held(&dev_priv->sb_lock);
-
-	if (intel_wait_for_register(dev_priv,
-				    SBI_CTL_STAT, SBI_BUSY, 0,
-				    100)) {
-		DRM_ERROR("timeout waiting for SBI to become ready\n");
-		return;
-	}
+	u32 result = 0;
 
-	I915_WRITE(SBI_ADDR, (reg << 16));
-	I915_WRITE(SBI_DATA, value);
+	intel_sbi_rw(dev_priv, reg, destination, &result, true);
 
-	if (destination == SBI_ICLK)
-		tmp = SBI_CTL_DEST_ICLK | SBI_CTL_OP_CRWR;
-	else
-		tmp = SBI_CTL_DEST_MPHY | SBI_CTL_OP_IOWR;
-	I915_WRITE(SBI_CTL_STAT, SBI_BUSY | tmp);
-
-	if (intel_wait_for_register(dev_priv,
-				    SBI_CTL_STAT,
-				    SBI_BUSY,
-				    0,
-				    100)) {
-		DRM_ERROR("timeout waiting for SBI to complete write\n");
-		return;
-	}
+	return result;
+}
 
-	if (I915_READ(SBI_CTL_STAT) & SBI_RESPONSE_FAIL) {
-		DRM_ERROR("error during SBI write of %x to reg %x\n",
-			  value, reg);
-		return;
-	}
+void intel_sbi_write(struct drm_i915_private *dev_priv, u16 reg, u32 value,
+		     enum intel_sbi_destination destination)
+{
+	intel_sbi_rw(dev_priv, reg, destination, &value, false);
 }
 
 u32 vlv_flisdsi_read(struct drm_i915_private *dev_priv, u32 reg)
 {
 	u32 val = 0;
+
 	vlv_sideband_rw(dev_priv, DPIO_DEVFN, IOSF_PORT_FLISDSI, SB_CRRDDA_NP,
 			reg, &val);
+
 	return val;
 }
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 33/71] drm/i915: Merge sandybridge_pcode_(read|write)
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (30 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 32/71] drm/i915: Merge sbi read/write into a single accessor Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 34/71] drm/i915: Move sandybridge pcode access to intel_sideband.c Chris Wilson
                   ` (19 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

These routines are identical except for the nature of the value
parameter: for writes it is a pure in-param, but for a read we need an
out-param. Since they differ in only a single line, merge the two
routines into one.
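
The piece that keeps the merged routine compact is that
__intel_wait_for_register_fw() can return the final register value, so
the mailbox status can be checked without issuing another read.
Condensed from the diff below (reusing mbox as the out-param):

	I915_WRITE_FW(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);

	if (__intel_wait_for_register_fw(dev_priv,
					 GEN6_PCODE_MAILBOX,
					 GEN6_PCODE_READY, 0,
					 fast_timeout_us, slow_timeout_ms,
					 &mbox)) /* final value lands in mbox */
		return -ETIMEDOUT;

	/* the error bits travel back in the mailbox register itself */
	return gen7_check_mailbox_status(dev_priv, mbox);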

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c | 114 +++++++++++---------------------
 1 file changed, 40 insertions(+), 74 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 354c1a9f5dd4..2e9c8da96bf7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9337,12 +9337,10 @@ void intel_init_pm(struct drm_i915_private *dev_priv)
 	}
 }
 
-static inline int gen6_check_mailbox_status(struct drm_i915_private *dev_priv)
+static inline int gen6_check_mailbox_status(struct drm_i915_private *dev_priv,
+					    u32 mbox)
 {
-	uint32_t flags =
-		I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_ERROR_MASK;
-
-	switch (flags) {
+	switch (mbox & GEN6_PCODE_ERROR_MASK) {
 	case GEN6_PCODE_SUCCESS:
 		return 0;
 	case GEN6_PCODE_UNIMPLEMENTED_CMD:
@@ -9355,17 +9353,15 @@ static inline int gen6_check_mailbox_status(struct drm_i915_private *dev_priv)
 	case GEN6_PCODE_TIMEOUT:
 		return -ETIMEDOUT;
 	default:
-		MISSING_CASE(flags);
+		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
 		return 0;
 	}
 }
 
-static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv)
+static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv,
+					    u32 mbox)
 {
-	uint32_t flags =
-		I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_ERROR_MASK;
-
-	switch (flags) {
+	switch (mbox & GEN6_PCODE_ERROR_MASK) {
 	case GEN6_PCODE_SUCCESS:
 		return 0;
 	case GEN6_PCODE_ILLEGAL_CMD:
@@ -9377,18 +9373,21 @@ static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv)
 	case GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
 		return -EOVERFLOW;
 	default:
-		MISSING_CASE(flags);
+		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
 		return 0;
 	}
 }
 
-static int __sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
+static int __sandybridge_pcode_rw(struct drm_i915_private *dev_priv,
+				  u32 mbox, u32 *val,
+				  int fast_timeout_us,
+				  int slow_timeout_ms,
+				  bool is_read)
 {
-	int status;
-
 	lockdep_assert_held(&dev_priv->sb_lock);
 
-	/* GEN6_PCODE_* are outside of the forcewake domain, we can
+	/*
+	 * GEN6_PCODE_* are outside of the forcewake domain, we can
 	 * use te fw I915_READ variants to reduce the amount of work
 	 * required when reading/writing.
 	 */
@@ -9402,69 +9401,36 @@ static int __sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox,
 
 	if (__intel_wait_for_register_fw(dev_priv,
 					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
-					 500, 0, NULL))
+					 fast_timeout_us,
+					 slow_timeout_ms,
+					 &mbox))
 		return -ETIMEDOUT;
 
-	*val = I915_READ_FW(GEN6_PCODE_DATA);
-	I915_WRITE_FW(GEN6_PCODE_DATA, 0);
+	if (is_read)
+		*val = I915_READ_FW(GEN6_PCODE_DATA);
 
 	if (INTEL_GEN(dev_priv) > 6)
-		status = gen7_check_mailbox_status(dev_priv);
+		return gen7_check_mailbox_status(dev_priv, mbox);
 	else
-		status = gen6_check_mailbox_status(dev_priv);
-
-	return status;
+		return gen6_check_mailbox_status(dev_priv, mbox);
 }
 
 int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
 {
-	int status;
+	int err;
 
 	mutex_lock(&dev_priv->sb_lock);
-	status = __sandybridge_pcode_read(dev_priv, mbox, val);
+	err = __sandybridge_pcode_rw(dev_priv, mbox, val,
+				     500, 0,
+				     true);
 	mutex_unlock(&dev_priv->sb_lock);
 
-	if (status) {
+	if (err) {
 		DRM_DEBUG_DRIVER("warning: pcode (read from mbox %x) mailbox access failed for %ps: %d\n",
-				 mbox, __builtin_return_address(0), status);
+				 mbox, __builtin_return_address(0), err);
 	}
 
-	return status;
-}
-
-static int __sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
-					     u32 mbox, u32 val,
-					     int fast_timeout_us,
-					     int slow_timeout_ms)
-{
-	int status;
-
-	/* GEN6_PCODE_* are outside of the forcewake domain, we can
-	 * use te fw I915_READ variants to reduce the amount of work
-	 * required when reading/writing.
-	 */
-
-	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
-		return -EAGAIN;
-
-	I915_WRITE_FW(GEN6_PCODE_DATA, val);
-	I915_WRITE_FW(GEN6_PCODE_DATA1, 0);
-	I915_WRITE_FW(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);
-
-	if (__intel_wait_for_register_fw(dev_priv,
-					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
-					 fast_timeout_us, slow_timeout_ms,
-					 NULL))
-		return -ETIMEDOUT;
-
-	I915_WRITE_FW(GEN6_PCODE_DATA, 0);
-
-	if (INTEL_GEN(dev_priv) > 6)
-		status = gen7_check_mailbox_status(dev_priv);
-	else
-		status = gen6_check_mailbox_status(dev_priv);
-
-	return status;
+	return err;
 }
 
 int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
@@ -9472,31 +9438,31 @@ int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
 				    int fast_timeout_us,
 				    int slow_timeout_ms)
 {
-	int status;
+	int err;
 
 	mutex_lock(&dev_priv->sb_lock);
-	status = __sandybridge_pcode_write_timeout(dev_priv, mbox, val,
-						   fast_timeout_us,
-						   slow_timeout_ms);
+	err = __sandybridge_pcode_rw(dev_priv, mbox, &val,
+				     fast_timeout_us, slow_timeout_ms,
+				     false);
 	mutex_unlock(&dev_priv->sb_lock);
 
-	if (status) {
+	if (err) {
 		DRM_DEBUG_DRIVER("warning: pcode (write of 0x%08x to mbox %x) mailbox access failed for %ps: %d\n",
-				 val, mbox, __builtin_return_address(0), status);
+				 val, mbox, __builtin_return_address(0), err);
 	}
 
-	return status;
+	return err;
 }
 
 static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
 				  u32 request, u32 reply_mask, u32 reply,
 				  u32 *status)
 {
-	u32 val = request;
-
-	*status = __sandybridge_pcode_read(dev_priv, mbox, &val);
+	*status = __sandybridge_pcode_rw(dev_priv, mbox, &request,
+					 500, 0,
+					 true);
 
-	return *status || ((val & reply_mask) == reply);
+	return *status || ((request & reply_mask) == reply);
 }
 
 /**
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 34/71] drm/i915: Move sandybridge pcode access to intel_sideband.c
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (31 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 33/71] drm/i915: Merge sandybridge_pcode_(read|write) Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 35/71] drm/i915: Mark up Ironlake ips with rpm wakerefs Chris Wilson
                   ` (18 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

sandybridge_pcode is another sideband, so move it to the sidebands' new
home in intel_sideband.c.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |  10 --
 drivers/gpu/drm/i915/intel_hdcp.c     |   3 +-
 drivers/gpu/drm/i915/intel_pm.c       | 194 --------------------------
 drivers/gpu/drm/i915/intel_sideband.c | 194 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_sideband.h |  10 ++
 5 files changed, 206 insertions(+), 205 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ff810ee016a8..ac89206e8906 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3440,16 +3440,6 @@ intel_display_capture_error_state(struct drm_i915_private *dev_priv);
 extern void intel_display_print_error_state(struct drm_i915_error_state_buf *e,
 					    struct intel_display_error_state *error);
 
-int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val);
-int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv, u32 mbox,
-				    u32 val, int fast_timeout_us,
-				    int slow_timeout_ms);
-#define sandybridge_pcode_write(dev_priv, mbox, val)	\
-	sandybridge_pcode_write_timeout(dev_priv, mbox, val, 500, 0)
-
-int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
-		      u32 reply_mask, u32 reply, int timeout_base_ms);
-
 /* intel_dpio_phy.c */
 void bxt_port_to_phy_channel(struct drm_i915_private *dev_priv, enum port port,
 			     enum dpio_phy *phy, enum dpio_channel *ch);
diff --git a/drivers/gpu/drm/i915/intel_hdcp.c b/drivers/gpu/drm/i915/intel_hdcp.c
index 6fa39d4521ca..62215adcdb1a 100644
--- a/drivers/gpu/drm/i915/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/intel_hdcp.c
@@ -11,8 +11,9 @@
 #include <linux/i2c.h>
 #include <linux/random.h>
 
-#include "intel_drv.h"
 #include "i915_reg.h"
+#include "intel_drv.h"
+#include "intel_sideband.h"
 
 #define KEY_LOAD_TRIES	5
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 2e9c8da96bf7..d1d93bacc0ea 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9337,200 +9337,6 @@ void intel_init_pm(struct drm_i915_private *dev_priv)
 	}
 }
 
-static inline int gen6_check_mailbox_status(struct drm_i915_private *dev_priv,
-					    u32 mbox)
-{
-	switch (mbox & GEN6_PCODE_ERROR_MASK) {
-	case GEN6_PCODE_SUCCESS:
-		return 0;
-	case GEN6_PCODE_UNIMPLEMENTED_CMD:
-		return -ENODEV;
-	case GEN6_PCODE_ILLEGAL_CMD:
-		return -ENXIO;
-	case GEN6_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
-	case GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
-		return -EOVERFLOW;
-	case GEN6_PCODE_TIMEOUT:
-		return -ETIMEDOUT;
-	default:
-		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
-		return 0;
-	}
-}
-
-static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv,
-					    u32 mbox)
-{
-	switch (mbox & GEN6_PCODE_ERROR_MASK) {
-	case GEN6_PCODE_SUCCESS:
-		return 0;
-	case GEN6_PCODE_ILLEGAL_CMD:
-		return -ENXIO;
-	case GEN7_PCODE_TIMEOUT:
-		return -ETIMEDOUT;
-	case GEN7_PCODE_ILLEGAL_DATA:
-		return -EINVAL;
-	case GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
-		return -EOVERFLOW;
-	default:
-		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
-		return 0;
-	}
-}
-
-static int __sandybridge_pcode_rw(struct drm_i915_private *dev_priv,
-				  u32 mbox, u32 *val,
-				  int fast_timeout_us,
-				  int slow_timeout_ms,
-				  bool is_read)
-{
-	lockdep_assert_held(&dev_priv->sb_lock);
-
-	/*
-	 * GEN6_PCODE_* are outside of the forcewake domain, we can
-	 * use te fw I915_READ variants to reduce the amount of work
-	 * required when reading/writing.
-	 */
-
-	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
-		return -EAGAIN;
-
-	I915_WRITE_FW(GEN6_PCODE_DATA, *val);
-	I915_WRITE_FW(GEN6_PCODE_DATA1, 0);
-	I915_WRITE_FW(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);
-
-	if (__intel_wait_for_register_fw(dev_priv,
-					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
-					 fast_timeout_us,
-					 slow_timeout_ms,
-					 &mbox))
-		return -ETIMEDOUT;
-
-	if (is_read)
-		*val = I915_READ_FW(GEN6_PCODE_DATA);
-
-	if (INTEL_GEN(dev_priv) > 6)
-		return gen7_check_mailbox_status(dev_priv, mbox);
-	else
-		return gen6_check_mailbox_status(dev_priv, mbox);
-}
-
-int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
-{
-	int err;
-
-	mutex_lock(&dev_priv->sb_lock);
-	err = __sandybridge_pcode_rw(dev_priv, mbox, val,
-				     500, 0,
-				     true);
-	mutex_unlock(&dev_priv->sb_lock);
-
-	if (err) {
-		DRM_DEBUG_DRIVER("warning: pcode (read from mbox %x) mailbox access failed for %ps: %d\n",
-				 mbox, __builtin_return_address(0), err);
-	}
-
-	return err;
-}
-
-int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
-				    u32 mbox, u32 val,
-				    int fast_timeout_us,
-				    int slow_timeout_ms)
-{
-	int err;
-
-	mutex_lock(&dev_priv->sb_lock);
-	err = __sandybridge_pcode_rw(dev_priv, mbox, &val,
-				     fast_timeout_us, slow_timeout_ms,
-				     false);
-	mutex_unlock(&dev_priv->sb_lock);
-
-	if (err) {
-		DRM_DEBUG_DRIVER("warning: pcode (write of 0x%08x to mbox %x) mailbox access failed for %ps: %d\n",
-				 val, mbox, __builtin_return_address(0), err);
-	}
-
-	return err;
-}
-
-static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
-				  u32 request, u32 reply_mask, u32 reply,
-				  u32 *status)
-{
-	*status = __sandybridge_pcode_rw(dev_priv, mbox, &request,
-					 500, 0,
-					 true);
-
-	return *status || ((request & reply_mask) == reply);
-}
-
-/**
- * skl_pcode_request - send PCODE request until acknowledgment
- * @dev_priv: device private
- * @mbox: PCODE mailbox ID the request is targeted for
- * @request: request ID
- * @reply_mask: mask used to check for request acknowledgment
- * @reply: value used to check for request acknowledgment
- * @timeout_base_ms: timeout for polling with preemption enabled
- *
- * Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
- * reports an error or an overall timeout of @timeout_base_ms+50 ms expires.
- * The request is acknowledged once the PCODE reply dword equals @reply after
- * applying @reply_mask. Polling is first attempted with preemption enabled
- * for @timeout_base_ms and if this times out for another 50 ms with
- * preemption disabled.
- *
- * Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
- * other error as reported by PCODE.
- */
-int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
-		      u32 reply_mask, u32 reply, int timeout_base_ms)
-{
-	u32 status;
-	int ret;
-
-	mutex_lock(&dev_priv->sb_lock);
-
-#define COND skl_pcode_try_request(dev_priv, mbox, request, reply_mask, reply, \
-				   &status)
-
-	/*
-	 * Prime the PCODE by doing a request first. Normally it guarantees
-	 * that a subsequent request, at most @timeout_base_ms later, succeeds.
-	 * _wait_for() doesn't guarantee when its passed condition is evaluated
-	 * first, so send the first request explicitly.
-	 */
-	if (COND) {
-		ret = 0;
-		goto out;
-	}
-	ret = _wait_for(COND, timeout_base_ms * 1000, 10, 10);
-	if (!ret)
-		goto out;
-
-	/*
-	 * The above can time out if the number of requests was low (2 in the
-	 * worst case) _and_ PCODE was busy for some reason even after a
-	 * (queued) request and @timeout_base_ms delay. As a workaround retry
-	 * the poll with preemption disabled to maximize the number of
-	 * requests. Increase the timeout from @timeout_base_ms to 50ms to
-	 * account for interrupts that could reduce the number of these
-	 * requests, and for any quirks of the PCODE firmware that delays
-	 * the request completion.
-	 */
-	DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
-	WARN_ON_ONCE(timeout_base_ms > 3);
-	preempt_disable();
-	ret = wait_for_atomic(COND, 50);
-	preempt_enable();
-
-out:
-	mutex_unlock(&dev_priv->sb_lock);
-	return ret ? ret : status;
-#undef COND
-}
-
 static int byt_gpu_freq(struct drm_i915_private *dev_priv, int val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
diff --git a/drivers/gpu/drm/i915/intel_sideband.c b/drivers/gpu/drm/i915/intel_sideband.c
index 427c8c36f0c5..eb2850736e39 100644
--- a/drivers/gpu/drm/i915/intel_sideband.c
+++ b/drivers/gpu/drm/i915/intel_sideband.c
@@ -394,3 +394,197 @@ void vlv_flisdsi_put(struct drm_i915_private *dev_priv)
 {
 	vlv_iosf_sb_put(dev_priv, BIT(VLV_IOSF_SB_FLISDSI));
 }
+
+static inline int gen6_check_mailbox_status(struct drm_i915_private *dev_priv,
+					    u32 mbox)
+{
+	switch (mbox & GEN6_PCODE_ERROR_MASK) {
+	case GEN6_PCODE_SUCCESS:
+		return 0;
+	case GEN6_PCODE_UNIMPLEMENTED_CMD:
+		return -ENODEV;
+	case GEN6_PCODE_ILLEGAL_CMD:
+		return -ENXIO;
+	case GEN6_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
+	case GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
+		return -EOVERFLOW;
+	case GEN6_PCODE_TIMEOUT:
+		return -ETIMEDOUT;
+	default:
+		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
+		return 0;
+	}
+}
+
+static inline int gen7_check_mailbox_status(struct drm_i915_private *dev_priv,
+					    u32 mbox)
+{
+	switch (mbox & GEN6_PCODE_ERROR_MASK) {
+	case GEN6_PCODE_SUCCESS:
+		return 0;
+	case GEN6_PCODE_ILLEGAL_CMD:
+		return -ENXIO;
+	case GEN7_PCODE_TIMEOUT:
+		return -ETIMEDOUT;
+	case GEN7_PCODE_ILLEGAL_DATA:
+		return -EINVAL;
+	case GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE:
+		return -EOVERFLOW;
+	default:
+		MISSING_CASE(mbox & GEN6_PCODE_ERROR_MASK);
+		return 0;
+	}
+}
+
+static int __sandybridge_pcode_rw(struct drm_i915_private *dev_priv,
+				  u32 mbox, u32 *val,
+				  int fast_timeout_us,
+				  int slow_timeout_ms,
+				  bool is_read)
+{
+	lockdep_assert_held(&dev_priv->sb_lock);
+
+	/*
+	 * GEN6_PCODE_* are outside of the forcewake domain, we can
+	 * use the fw I915_READ variants to reduce the amount of work
+	 * required when reading/writing.
+	 */
+
+	if (I915_READ_FW(GEN6_PCODE_MAILBOX) & GEN6_PCODE_READY)
+		return -EAGAIN;
+
+	I915_WRITE_FW(GEN6_PCODE_DATA, *val);
+	I915_WRITE_FW(GEN6_PCODE_DATA1, 0);
+	I915_WRITE_FW(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | mbox);
+
+	if (__intel_wait_for_register_fw(dev_priv,
+					 GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
+					 fast_timeout_us,
+					 slow_timeout_ms,
+					 &mbox))
+		return -ETIMEDOUT;
+
+	if (is_read)
+		*val = I915_READ_FW(GEN6_PCODE_DATA);
+
+	if (INTEL_GEN(dev_priv) > 6)
+		return gen7_check_mailbox_status(dev_priv, mbox);
+	else
+		return gen6_check_mailbox_status(dev_priv, mbox);
+}
+
+int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val)
+{
+	int err;
+
+	mutex_lock(&dev_priv->sb_lock);
+	err = __sandybridge_pcode_rw(dev_priv, mbox, val,
+				     500, 0,
+				     true);
+	mutex_unlock(&dev_priv->sb_lock);
+
+	if (err) {
+		DRM_DEBUG_DRIVER("warning: pcode (read from mbox %x) mailbox access failed for %ps: %d\n",
+				 mbox, __builtin_return_address(0), err);
+	}
+
+	return err;
+}
+
+int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
+				    u32 mbox, u32 val,
+				    int fast_timeout_us,
+				    int slow_timeout_ms)
+{
+	int err;
+
+	mutex_lock(&dev_priv->sb_lock);
+	err = __sandybridge_pcode_rw(dev_priv, mbox, &val,
+				     fast_timeout_us, slow_timeout_ms,
+				     false);
+	mutex_unlock(&dev_priv->sb_lock);
+
+	if (err) {
+		DRM_DEBUG_DRIVER("warning: pcode (write of 0x%08x to mbox %x) mailbox access failed for %ps: %d\n",
+				 val, mbox, __builtin_return_address(0), err);
+	}
+
+	return err;
+}
+
+static bool skl_pcode_try_request(struct drm_i915_private *dev_priv, u32 mbox,
+				  u32 request, u32 reply_mask, u32 reply,
+				  u32 *status)
+{
+	*status = __sandybridge_pcode_rw(dev_priv, mbox, &request,
+					 500, 0,
+					 true);
+
+	return *status || ((request & reply_mask) == reply);
+}
+
+/**
+ * skl_pcode_request - send PCODE request until acknowledgment
+ * @dev_priv: device private
+ * @mbox: PCODE mailbox ID the request is targeted for
+ * @request: request ID
+ * @reply_mask: mask used to check for request acknowledgment
+ * @reply: value used to check for request acknowledgment
+ * @timeout_base_ms: timeout for polling with preemption enabled
+ *
+ * Keep resending the @request to @mbox until PCODE acknowledges it, PCODE
+ * reports an error or an overall timeout of @timeout_base_ms+50 ms expires.
+ * The request is acknowledged once the PCODE reply dword equals @reply after
+ * applying @reply_mask. Polling is first attempted with preemption enabled
+ * for @timeout_base_ms and if this times out for another 50 ms with
+ * preemption disabled.
+ *
+ * Returns 0 on success, %-ETIMEDOUT in case of a timeout, <0 in case of some
+ * other error as reported by PCODE.
+ */
+int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
+		      u32 reply_mask, u32 reply, int timeout_base_ms)
+{
+	u32 status;
+	int ret;
+
+	mutex_lock(&dev_priv->sb_lock);
+
+#define COND skl_pcode_try_request(dev_priv, mbox, request, reply_mask, reply, \
+				   &status)
+
+	/*
+	 * Prime the PCODE by doing a request first. Normally it guarantees
+	 * that a subsequent request, at most @timeout_base_ms later, succeeds.
+	 * _wait_for() doesn't guarantee when its passed condition is evaluated
+	 * first, so send the first request explicitly.
+	 */
+	if (COND) {
+		ret = 0;
+		goto out;
+	}
+	ret = _wait_for(COND, timeout_base_ms * 1000, 10, 10);
+	if (!ret)
+		goto out;
+
+	/*
+	 * The above can time out if the number of requests was low (2 in the
+	 * worst case) _and_ PCODE was busy for some reason even after a
+	 * (queued) request and @timeout_base_ms delay. As a workaround retry
+	 * the poll with preemption disabled to maximize the number of
+	 * requests. Increase the timeout from @timeout_base_ms to 50ms to
+	 * account for interrupts that could reduce the number of these
+	 * requests, and for any quirks of the PCODE firmware that delays
+	 * the request completion.
+	 */
+	DRM_DEBUG_KMS("PCODE timeout, retrying with preemption disabled\n");
+	WARN_ON_ONCE(timeout_base_ms > 3);
+	preempt_disable();
+	ret = wait_for_atomic(COND, 50);
+	preempt_enable();
+
+out:
+	mutex_unlock(&dev_priv->sb_lock);
+	return ret ? ret : status;
+#undef COND
+}
diff --git a/drivers/gpu/drm/i915/intel_sideband.h b/drivers/gpu/drm/i915/intel_sideband.h
index 7204648eb9d4..16a708d1c1d8 100644
--- a/drivers/gpu/drm/i915/intel_sideband.h
+++ b/drivers/gpu/drm/i915/intel_sideband.h
@@ -70,4 +70,14 @@ u32 intel_sbi_read(struct drm_i915_private *i915, u16 reg,
 void intel_sbi_write(struct drm_i915_private *i915, u16 reg, u32 value,
 		     enum intel_sbi_destination destination);
 
+int sandybridge_pcode_read(struct drm_i915_private *i915, u32 mbox, u32 *val);
+int sandybridge_pcode_write_timeout(struct drm_i915_private *i915, u32 mbox,
+				    u32 val, int fast_timeout_us,
+				    int slow_timeout_ms);
+#define sandybridge_pcode_write(i915, mbox, val)	\
+	sandybridge_pcode_write_timeout(i915, mbox, val, 500, 0)
+
+int skl_pcode_request(struct drm_i915_private *i915, u32 mbox, u32 request,
+		      u32 reply_mask, u32 reply, int timeout_base_ms);
+
 #endif /* _INTEL_SIDEBAND_H */
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 35/71] drm/i915: Mark up Ironlake ips with rpm wakerefs
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (32 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 34/71] drm/i915: Move sandybridge pcode access to intel_sideband.c Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 36/71] drm/i915: Record logical context support in driver caps Chris Wilson
                   ` (17 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Currently Ironlake operates under the assumption that rpm is always
awake (and so its error checking is disabled). As such, we have missed
a few places where we access registers without taking the rpm wakeref
and thus trigger warnings, intel_ips being one culprit.

As this involved adding a potentially sleeping rpm_get, we have to
rearrange the spinlocks slightly and so switch to acquiring a device-ref
under the RCU read lock rather than holding the spinlock for the whole
operation. To be consistent, we apply the same pattern across the
intel_ips interface even though this adds a few more atomic operations
than strictly necessary in a few cases.
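
In isolation, the device-ref pattern looks like this (a minimal sketch,
assuming a kref-backed drm_device published through an RCU-protected
pointer):

	static struct drm_i915_private *mchdev_get(void)
	{
		struct drm_i915_private *i915;

		rcu_read_lock();
		i915 = rcu_dereference(i915_mch_dev);
		if (i915 && !kref_get_unless_zero(&i915->drm.ref))
			i915 = NULL; /* raced against unload, give up */
		rcu_read_unlock();

		/* caller must balance with drm_dev_put(&i915->drm) */
		return i915;
	}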

v2: Sagar noted that the mb around setting mch_dev was overkill as we
only need ordering there, and that i915_emon_status was still using
struct_mutex for no reason while lacking rpm.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  19 ++--
 drivers/gpu/drm/i915/i915_drv.c     |   3 +
 drivers/gpu/drm/i915/intel_pm.c     | 138 ++++++++++++++--------------
 3 files changed, 81 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b61de2838bd7..b4148723f8f4 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1768,22 +1768,19 @@ static int i915_sr_status(struct seq_file *m, void *unused)
 
 static int i915_emon_status(struct seq_file *m, void *unused)
 {
-	struct drm_i915_private *dev_priv = node_to_i915(m->private);
-	struct drm_device *dev = &dev_priv->drm;
+	struct drm_i915_private *i915 = node_to_i915(m->private);
 	unsigned long temp, chipset, gfx;
-	int ret;
 
-	if (!IS_GEN5(dev_priv))
+	if (!IS_GEN5(i915))
 		return -ENODEV;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
+	intel_runtime_pm_get(i915);
 
-	temp = i915_mch_val(dev_priv);
-	chipset = i915_chipset_val(dev_priv);
-	gfx = i915_gfx_val(dev_priv);
-	mutex_unlock(&dev->struct_mutex);
+	temp = i915_mch_val(i915);
+	chipset = i915_chipset_val(i915);
+	gfx = i915_gfx_val(i915);
+
+	intel_runtime_pm_put(i915);
 
 	seq_printf(m, "GMCH temp: %ld\n", temp);
 	seq_printf(m, "Chipset power: %ld\n", chipset);
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4d6a45f20e42..7c2ef128adcd 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1443,6 +1443,9 @@ void i915_driver_unload(struct drm_device *dev)
 
 	i915_driver_unregister(dev_priv);
 
+	/* Flush any external code that may still be under the RCU lock */
+	synchronize_rcu();
+
 	if (i915_gem_suspend(dev_priv))
 		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d1d93bacc0ea..4f549c7cdd19 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6107,10 +6107,6 @@ void intel_init_ipc(struct drm_i915_private *dev_priv)
  */
 DEFINE_SPINLOCK(mchdev_lock);
 
-/* Global for IPS driver to get at the current i915 device. Protected by
- * mchdev_lock. */
-static struct drm_i915_private *i915_mch_dev;
-
 bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val)
 {
 	u16 rgvswctl;
@@ -7757,11 +7753,13 @@ unsigned long i915_chipset_val(struct drm_i915_private *dev_priv)
 	if (!IS_GEN5(dev_priv))
 		return 0;
 
+	intel_runtime_pm_get(dev_priv);
 	spin_lock_irq(&mchdev_lock);
 
 	val = __i915_chipset_val(dev_priv);
 
 	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(dev_priv);
 
 	return val;
 }
@@ -7841,11 +7839,13 @@ void i915_update_gfx_val(struct drm_i915_private *dev_priv)
 	if (!IS_GEN5(dev_priv))
 		return;
 
+	intel_runtime_pm_get(dev_priv);
 	spin_lock_irq(&mchdev_lock);
 
 	__i915_update_gfx_val(dev_priv);
 
 	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(dev_priv);
 }
 
 static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
@@ -7892,15 +7892,32 @@ unsigned long i915_gfx_val(struct drm_i915_private *dev_priv)
 	if (!IS_GEN5(dev_priv))
 		return 0;
 
+	intel_runtime_pm_get(dev_priv);
 	spin_lock_irq(&mchdev_lock);
 
 	val = __i915_gfx_val(dev_priv);
 
 	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(dev_priv);
 
 	return val;
 }
 
+static struct drm_i915_private __rcu *i915_mch_dev;
+
+static struct drm_i915_private *mchdev_get(void)
+{
+	struct drm_i915_private *i915;
+
+	rcu_read_lock();
+	i915 = rcu_dereference(i915_mch_dev);
+	if (i915 && !kref_get_unless_zero(&i915->drm.ref))
+		i915 = NULL;
+	rcu_read_unlock();
+
+	return i915;
+}
+
 /**
  * i915_read_mch_val - return value for IPS use
  *
@@ -7909,23 +7926,22 @@ unsigned long i915_gfx_val(struct drm_i915_private *dev_priv)
  */
 unsigned long i915_read_mch_val(void)
 {
-	struct drm_i915_private *dev_priv;
-	unsigned long chipset_val, graphics_val, ret = 0;
-
-	spin_lock_irq(&mchdev_lock);
-	if (!i915_mch_dev)
-		goto out_unlock;
-	dev_priv = i915_mch_dev;
-
-	chipset_val = __i915_chipset_val(dev_priv);
-	graphics_val = __i915_gfx_val(dev_priv);
+	struct drm_i915_private *i915;
+	unsigned long chipset_val, graphics_val;
 
-	ret = chipset_val + graphics_val;
+	i915 = mchdev_get();
+	if (!i915)
+		return 0;
 
-out_unlock:
+	intel_runtime_pm_get(i915);
+	spin_lock_irq(&mchdev_lock);
+	chipset_val = __i915_chipset_val(i915);
+	graphics_val = __i915_gfx_val(i915);
 	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(i915);
 
-	return ret;
+	drm_dev_put(&i915->drm);
+	return chipset_val + graphics_val;
 }
 EXPORT_SYMBOL_GPL(i915_read_mch_val);
 
@@ -7936,23 +7952,19 @@ EXPORT_SYMBOL_GPL(i915_read_mch_val);
  */
 bool i915_gpu_raise(void)
 {
-	struct drm_i915_private *dev_priv;
-	bool ret = true;
-
-	spin_lock_irq(&mchdev_lock);
-	if (!i915_mch_dev) {
-		ret = false;
-		goto out_unlock;
-	}
-	dev_priv = i915_mch_dev;
+	struct drm_i915_private *i915;
 
-	if (dev_priv->ips.max_delay > dev_priv->ips.fmax)
-		dev_priv->ips.max_delay--;
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
 
-out_unlock:
+	spin_lock_irq(&mchdev_lock);
+	if (i915->ips.max_delay > i915->ips.fmax)
+		i915->ips.max_delay--;
 	spin_unlock_irq(&mchdev_lock);
 
-	return ret;
+	drm_dev_put(&i915->drm);
+	return true;
 }
 EXPORT_SYMBOL_GPL(i915_gpu_raise);
 
@@ -7964,23 +7976,19 @@ EXPORT_SYMBOL_GPL(i915_gpu_raise);
  */
 bool i915_gpu_lower(void)
 {
-	struct drm_i915_private *dev_priv;
-	bool ret = true;
+	struct drm_i915_private *i915;
 
-	spin_lock_irq(&mchdev_lock);
-	if (!i915_mch_dev) {
-		ret = false;
-		goto out_unlock;
-	}
-	dev_priv = i915_mch_dev;
-
-	if (dev_priv->ips.max_delay < dev_priv->ips.min_delay)
-		dev_priv->ips.max_delay++;
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
 
-out_unlock:
+	spin_lock_irq(&mchdev_lock);
+	if (i915->ips.max_delay < i915->ips.min_delay)
+		i915->ips.max_delay++;
 	spin_unlock_irq(&mchdev_lock);
 
-	return ret;
+	drm_dev_put(&i915->drm);
+	return true;
 }
 EXPORT_SYMBOL_GPL(i915_gpu_lower);
 
@@ -7991,13 +7999,16 @@ EXPORT_SYMBOL_GPL(i915_gpu_lower);
  */
 bool i915_gpu_busy(void)
 {
-	bool ret = false;
+	struct drm_i915_private *i915;
+	bool ret;
 
-	spin_lock_irq(&mchdev_lock);
-	if (i915_mch_dev)
-		ret = i915_mch_dev->gt.awake;
-	spin_unlock_irq(&mchdev_lock);
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
+
+	ret = i915->gt.awake;
 
+	drm_dev_put(&i915->drm);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(i915_gpu_busy);
@@ -8010,24 +8021,19 @@ EXPORT_SYMBOL_GPL(i915_gpu_busy);
  */
 bool i915_gpu_turbo_disable(void)
 {
-	struct drm_i915_private *dev_priv;
-	bool ret = true;
-
-	spin_lock_irq(&mchdev_lock);
-	if (!i915_mch_dev) {
-		ret = false;
-		goto out_unlock;
-	}
-	dev_priv = i915_mch_dev;
-
-	dev_priv->ips.max_delay = dev_priv->ips.fstart;
+	struct drm_i915_private *i915;
+	bool ret;
 
-	if (!ironlake_set_drps(dev_priv, dev_priv->ips.fstart))
-		ret = false;
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
 
-out_unlock:
+	spin_lock_irq(&mchdev_lock);
+	i915->ips.max_delay = i915->ips.fstart;
+	ret = ironlake_set_drps(i915, i915->ips.fstart);
 	spin_unlock_irq(&mchdev_lock);
 
+	drm_dev_put(&i915->drm);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
@@ -8056,18 +8062,14 @@ void intel_gpu_ips_init(struct drm_i915_private *dev_priv)
 {
 	/* We only register the i915 ips part with intel-ips once everything is
 	 * set up, to avoid intel-ips sneaking in and reading bogus values. */
-	spin_lock_irq(&mchdev_lock);
-	i915_mch_dev = dev_priv;
-	spin_unlock_irq(&mchdev_lock);
+	rcu_assign_pointer(i915_mch_dev, dev_priv);
 
 	ips_ping_for_i915_load();
 }
 
 void intel_gpu_ips_teardown(void)
 {
-	spin_lock_irq(&mchdev_lock);
-	i915_mch_dev = NULL;
-	spin_unlock_irq(&mchdev_lock);
+	rcu_assign_pointer(i915_mch_dev, NULL);
 }
 
 static void intel_init_emon(struct drm_i915_private *dev_priv)
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 36/71] drm/i915: Record logical context support in driver caps
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (33 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 35/71] drm/i915: Mark up Ironlake ips with rpm wakerefs Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 37/71] drm/i915: Generalize i915_gem_sanitize() to reset contexts Chris Wilson
                   ` (16 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Avoid looking at the magical engines[RCS] to decide if the HW and driver
support logical contexts, and instead record that knowledge during
initialisation.
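
The result is a single capability bit recorded at engine setup and
consulted at the point of use; both halves condensed from the diff
below:

	/* during intel_engine_setup(): */
	if (engine->context_size)
		DRIVER_CAPS(dev_priv)->has_contexts = true;

	/* later, e.g. in i915_gem_context_create_ioctl(): */
	if (!DRIVER_CAPS(dev_priv)->has_contexts)
		return -ENODEV;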

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h          | 1 +
 drivers/gpu/drm/i915/i915_gem_context.c  | 5 ++---
 drivers/gpu/drm/i915/intel_device_info.c | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 drivers/gpu/drm/i915/intel_engine_cs.c   | 2 ++
 5 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ac89206e8906..41be5a0b6032 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2296,6 +2296,7 @@ intel_info(const struct drm_i915_private *dev_priv)
 }
 
 #define INTEL_INFO(dev_priv)	intel_info((dev_priv))
+#define DRIVER_CAPS(dev_priv)	(&(dev_priv)->caps)
 
 #define INTEL_GEN(dev_priv)	((dev_priv)->info.gen)
 #define INTEL_DEVID(dev_priv)	((dev_priv)->info.device_id)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index dccae45211d1..fe937e3b1d9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -503,8 +503,7 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
 	}
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
-			 dev_priv->engine[RCS]->context_size ? "logical" :
-			 "fake");
+			 DRIVER_CAPS(dev_priv)->has_contexts ? "logical" : "fake");
 	return 0;
 }
 
@@ -657,7 +656,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	struct i915_gem_context *ctx;
 	int ret;
 
-	if (!dev_priv->engine[RCS]->context_size)
+	if (!DRIVER_CAPS(dev_priv)->has_contexts)
 		return -ENODEV;
 
 	if (args->pad != 0)
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 0fd13df424cf..f1395ddcdda0 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -858,6 +858,7 @@ void intel_device_info_runtime_init(struct intel_device_info *info)
 void intel_driver_caps_print(const struct intel_driver_caps *caps,
 			     struct drm_printer *p)
 {
+	drm_printf(p, "Has contexts? %s\n", yesno(caps->has_contexts));
 	drm_printf(p, "scheduler: %x\n", caps->scheduler);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 933e31669557..3f9881c548ef 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -186,6 +186,7 @@ struct intel_device_info {
 
 struct intel_driver_caps {
 	unsigned int scheduler;
+	bool has_contexts:1;
 };
 
 static inline unsigned int sseu_subslice_total(const struct sseu_dev_info *sseu)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 61dcedddb799..2a972efb5dfb 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -302,6 +302,8 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 							   engine->class);
 	if (WARN_ON(engine->context_size > BIT(20)))
 		engine->context_size = 0;
+	if (engine->context_size)
+		DRIVER_CAPS(dev_priv)->has_contexts = true;
 
 	/* Nothing to do here, execute in order of dependencies */
 	engine->schedule = NULL;
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 37/71] drm/i915: Generalize i915_gem_sanitize() to reset contexts
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (34 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 36/71] drm/i915: Record logical context support in driver caps Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5) Chris Wilson
                   ` (15 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

While we believe that we should always reset the GPU to scrub the state
on transition to/from the driver, it becomes essential once we enable
contexts. Generalize the gen test to be on context support instead.

References: d2b4b97933f5 ("drm/i915: Record the default hw state after reset upon load")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 344e3d98acd5..9cc9d7031684 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4926,7 +4926,7 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	 * it may impact the display and we are uncertain about the stability
 	 * of the reset, so this could be applied to even earlier gen.
 	 */
-	if (INTEL_GEN(i915) >= 5 && intel_has_gpu_reset(i915))
+	if (DRIVER_CAPS(i915)->has_contexts && intel_has_gpu_reset(i915))
 		WARN_ON(intel_gpu_reset(i915, ALL_ENGINES));
 }
 
-- 
2.17.0

* [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5)
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (35 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 37/71] drm/i915: Generalize i915_gem_sanitize() to reset contexts Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  8:47   ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 39/71] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson
                   ` (14 subsequent siblings)
  51 siblings, 1 reply; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke

Ironlake does support saving and reloading context specific registers
between contexts, providing isolation of the basic GPU state (as
programmable by userspace). This allows userspace to assume that the GPU
retains its state from one batch to the next, minimising the amount of
state it needs to reload or manually save and restore.

v2: Fix off-by-one in reading CXT_SIZE, and add a comment that the
CXT_SIZE and context-layout do not match in bspec, but the difference is
irrelevant as we overallocate the full page anyway (Ville).

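As a worked example of the new sizing (the raw register value below is
made up; only the arithmetic comes from the patch): a hypothetical
CXT_SIZE read of 0x21 decodes as 0x21 + 1 = 34 cachelines, i.e.
34 * 64 = 2176 bytes, which round_up(2176, PAGE_SIZE) pads out to a
single 4KiB page. A discrepancy of a few cachelines against the
documented layout is therefore absorbed by the allocation either way.
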
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  | 16 ++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 13 +++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2a972efb5dfb..027653785cb6 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -220,6 +220,22 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
 			return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64,
 					PAGE_SIZE);
 		case 5:
+			/*
+			 * There is a discrepancy here between the size reported
+			 * by the register and the size of the context layout
+			 * in the docs. Both are described as authoritative!
+			 *
+			 * The discrepancy is on the order of a few cachelines,
+			 * but the total is under one page (4k), which is our
+			 * minimum allocation anyway so it should all come
+			 * out in the wash.
+			 */
+			cxt_size = I915_READ(CXT_SIZE) + 1;
+			DRM_DEBUG_DRIVER("gen%d CXT_SIZE = %d bytes [0x%08x]\n",
+					 INTEL_GEN(dev_priv),
+					 cxt_size * 64,
+					 cxt_size - 1);
+			return round_up(cxt_size * 64, PAGE_SIZE);
 		case 4:
 		case 3:
 		case 2:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 166901cbfa88..d17dbaacec80 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1467,11 +1467,14 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		/* These flags are for resource streamer on HSW+ */
 		flags |= HSW_MI_RS_SAVE_STATE_EN | HSW_MI_RS_RESTORE_STATE_EN;
 	else
+		/* We need to save the extended state for powersaving modes */
 		flags |= MI_SAVE_EXT_STATE_EN | MI_RESTORE_EXT_STATE_EN;
 
 	len = 4;
 	if (IS_GEN7(i915))
 		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
+	if (IS_GEN5(i915))
+		len += 2;
 
 	cs = intel_ring_begin(rq, len);
 	if (IS_ERR(cs))
@@ -1494,6 +1497,14 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 						GEN6_PSMI_SLEEP_MSG_DISABLE);
 			}
 		}
+	} else if (IS_GEN5(i915)) {
+		/*
+		 * This w/a is only listed for pre-production ilk a/b steppings,
+		 * but is also mentioned for programming the powerctx. To be
+		 * safe, just apply the workaround; we do not use SyncFlush so
+		 * this should never take effect and so be a no-op!
+		 */
+		*cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
 	}
 
 	*cs++ = MI_NOOP;
@@ -1528,6 +1539,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 			*cs++ = MI_NOOP;
 		}
 		*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+	} else if (IS_GEN5(i915)) {
+		*cs++ = MI_SUSPEND_FLUSH;
 	}
 
 	intel_ring_advance(rq, cs);
-- 
2.17.0

* [PATCH 39/71] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga)
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (36 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5) Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 40/71] drm/i915: Split GT powermanagement functions to intel_gt_pm.c Chris Wilson
                   ` (13 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke

Broadwater and the rest of gen4 do support saving and reloading context
specific registers between contexts, providing isolation of the basic GPU
state (as programmable by userspace). This allows userspace to assume that
the GPU retains its state from one batch to the next, minimising the
amount of state it needs to reload and manually save across batches.

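After this change the gen5 and gen4 lookups share a single switch arm;
the combined result (a sketch of the final shape over the gen number,
not part of the diff itself) is roughly:

	switch (INTEL_GEN(dev_priv)) {
	...
	case 5:
	case 4:
		/* CXT_SIZE encodes 64-byte cachelines, biased by one */
		cxt_size = I915_READ(CXT_SIZE) + 1;
		return round_up(cxt_size * 64, PAGE_SIZE);
	case 3:
	case 2:
	...
	}
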
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 027653785cb6..ed7d64f6d73f 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -220,6 +220,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
 			return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64,
 					PAGE_SIZE);
 		case 5:
+		case 4:
 			/*
 			 * There is a discrepancy here between the size reported
 			 * by the register and the size of the context layout
@@ -236,7 +237,6 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
 					 cxt_size * 64,
 					 cxt_size - 1);
 			return round_up(cxt_size * 64, PAGE_SIZE);
-		case 4:
 		case 3:
 		case 2:
 		/* For the special day when i810 gets merged. */
-- 
2.17.0

* [PATCH 40/71] drm/i915: Split GT powermanagement functions to intel_gt_pm.c
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (37 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 39/71] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 41/71] drm/i915: Move rps worker " Chris Wilson
                   ` (12 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

intel_pm.c has grown to several thousand lines of loosely connected code
handling various powermanagement tasks. Split out the GT portion (IPS,
RPS and RC6) into its own file for easier maintenance.

v2: Move struct intel_gt_pm to intel_gt_pm.h

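For orientation, a rough sketch of the interface that moves with this
split (pieced together from the declarations dropped from intel_drv.h
and the definitions below; this is not the verbatim new header):

	/* intel_gt_pm.h (sketch) */
	struct intel_gt_pm {
		struct intel_rps rps;
		struct intel_rc6 rc6;
		struct intel_llc_pstate llc_pstate;
	};

	void intel_init_gt_powersave(struct drm_i915_private *dev_priv);
	void intel_cleanup_gt_powersave(struct drm_i915_private *dev_priv);
	void intel_sanitize_gt_powersave(struct drm_i915_private *dev_priv);
	void intel_enable_gt_powersave(struct drm_i915_private *dev_priv);
	void intel_disable_gt_powersave(struct drm_i915_private *dev_priv);
	void gen6_rps_busy(struct drm_i915_private *dev_priv);
	void gen6_rps_idle(struct drm_i915_private *dev_priv);
	void gen6_rps_boost(struct i915_request *rq,
			    struct intel_rps_client *rps);
	int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
	int intel_gpu_freq(struct drm_i915_private *dev_priv, int val);
	int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
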
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/Makefile        |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c  |    1 +
 drivers/gpu/drm/i915/i915_drv.c      |    5 +
 drivers/gpu/drm/i915/i915_drv.h      |   78 +-
 drivers/gpu/drm/i915/i915_gem.c      |   21 +-
 drivers/gpu/drm/i915/i915_irq.c      |    2 +
 drivers/gpu/drm/i915/i915_pmu.c      |    1 +
 drivers/gpu/drm/i915/i915_request.c  |    1 +
 drivers/gpu/drm/i915/i915_sysfs.c    |    1 +
 drivers/gpu/drm/i915/intel_display.c |    1 +
 drivers/gpu/drm/i915/intel_drv.h     |   12 -
 drivers/gpu/drm/i915/intel_gt_pm.c   | 2424 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_gt_pm.h   |  105 +
 drivers/gpu/drm/i915/intel_pm.c      | 2785 ++------------------------
 drivers/gpu/drm/i915/intel_uncore.c  |    2 -
 15 files changed, 2765 insertions(+), 2675 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_gt_pm.c
 create mode 100644 drivers/gpu/drm/i915/intel_gt_pm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 00c13382b008..21f31bb06168 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -46,6 +46,7 @@ i915-y := i915_drv.o \
 	  i915_sysfs.o \
 	  intel_csr.o \
 	  intel_device_info.o \
+	  intel_gt_pm.o \
 	  intel_pm.o \
 	  intel_runtime_pm.o \
 	  intel_workarounds.o
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index b4148723f8f4..3c3ddf8ff2ae 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -31,6 +31,7 @@
 #include <linux/sched/mm.h>
 
 #include "intel_drv.h"
+#include "intel_gt_pm.h"
 #include "intel_guc_submission.h"
 #include "intel_sideband.h"
 
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 7c2ef128adcd..fef245f01a32 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -52,6 +52,7 @@
 #include "i915_query.h"
 #include "i915_vgpu.h"
 #include "intel_drv.h"
+#include "intel_gt_pm.h"
 #include "intel_uc.h"
 
 static struct drm_driver driver;
@@ -1066,6 +1067,7 @@ static int i915_driver_init_mmio(struct drm_i915_private *dev_priv)
  */
 static void i915_driver_cleanup_mmio(struct drm_i915_private *dev_priv)
 {
+	intel_sanitize_gt_powersave(dev_priv);
 	intel_uncore_fini(dev_priv);
 	i915_mmio_cleanup(dev_priv);
 	pci_dev_put(dev_priv->bridge_dev);
@@ -1173,6 +1175,9 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
 
 	intel_uncore_sanitize(dev_priv);
 
+	/* BIOS often leaves RC6 enabled, but disable it for hw init */
+	intel_sanitize_gt_powersave(dev_priv);
+
 	intel_opregion_setup(dev_priv);
 
 	i915_gem_load_init_fences(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 41be5a0b6032..26e5b9ff91e7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -60,6 +60,7 @@
 #include "intel_device_info.h"
 #include "intel_display.h"
 #include "intel_dpll_mgr.h"
+#include "intel_gt_pm.h"
 #include "intel_lrc.h"
 #include "intel_opregion.h"
 #include "intel_ringbuffer.h"
@@ -729,78 +730,6 @@ struct vlv_s0ix_state {
 	u32 clock_gate_dis2;
 };
 
-struct intel_rps_ei {
-	ktime_t ktime;
-	u32 render_c0;
-	u32 media_c0;
-};
-
-struct intel_rps {
-	struct mutex lock;
-
-	/*
-	 * work, interrupts_enabled and pm_iir are protected by
-	 * dev_priv->irq_lock
-	 */
-	struct work_struct work;
-	bool interrupts_enabled;
-	u32 pm_iir;
-
-	/* PM interrupt bits that should never be masked */
-	u32 pm_intrmsk_mbz;
-
-	/* Frequencies are stored in potentially platform dependent multiples.
-	 * In other words, *_freq needs to be multiplied by X to be interesting.
-	 * Soft limits are those which are used for the dynamic reclocking done
-	 * by the driver (raise frequencies under heavy loads, and lower for
-	 * lighter loads). Hard limits are those imposed by the hardware.
-	 *
-	 * A distinction is made for overclocking, which is never enabled by
-	 * default, and is considered to be above the hard limit if it's
-	 * possible at all.
-	 */
-	u8 cur_freq;		/* Current frequency (cached, may not == HW) */
-	u8 min_freq_softlimit;	/* Minimum frequency permitted by the driver */
-	u8 max_freq_softlimit;	/* Max frequency permitted by the driver */
-	u8 max_freq;		/* Maximum frequency, RP0 if not overclocking */
-	u8 min_freq;		/* AKA RPn. Minimum frequency */
-	u8 boost_freq;		/* Frequency to request when wait boosting */
-	u8 idle_freq;		/* Frequency to request when we are idle */
-	u8 efficient_freq;	/* AKA RPe. Pre-determined balanced frequency */
-	u8 rp1_freq;		/* "less than" RP0 power/freqency */
-	u8 rp0_freq;		/* Non-overclocked max frequency. */
-	u16 gpll_ref_freq;	/* vlv/chv GPLL reference frequency */
-
-	u8 up_threshold; /* Current %busy required to uplock */
-	u8 down_threshold; /* Current %busy required to downclock */
-
-	int last_adj;
-	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
-
-	bool enabled;
-	atomic_t num_waiters;
-	atomic_t boosts;
-
-	/* manual wa residency calculations */
-	struct intel_rps_ei ei;
-};
-
-struct intel_rc6 {
-	bool enabled;
-	u64 prev_hw_residency[4];
-	u64 cur_residency[4];
-};
-
-struct intel_llc_pstate {
-	bool enabled;
-};
-
-struct intel_gen6_power_mgmt {
-	struct intel_rps rps;
-	struct intel_rc6 rc6;
-	struct intel_llc_pstate llc_pstate;
-};
-
 /* defined intel_pm.c */
 extern spinlock_t mchdev_lock;
 
@@ -1791,7 +1720,7 @@ struct drm_i915_private {
 	u32 edram_cap;
 
 	/* gen6+ GT PM state */
-	struct intel_gen6_power_mgmt gt_pm;
+	struct intel_gt_pm gt_pm;
 
 	/* ilk-only ips/rps state. Everything in here is protected by the global
 	 * mchdev_lock in intel_pm.c */
@@ -2300,6 +2229,7 @@ intel_info(const struct drm_i915_private *dev_priv)
 
 #define INTEL_GEN(dev_priv)	((dev_priv)->info.gen)
 #define INTEL_DEVID(dev_priv)	((dev_priv)->info.device_id)
+#define INTEL_SSEU(dev_priv)	(&INTEL_INFO(dev_priv)->sseu)
 
 #define REVID_FOREVER		0xff
 #define INTEL_REVID(dev_priv)	((dev_priv)->drm.pdev->revision)
@@ -3482,8 +3412,6 @@ void vlv_phy_pre_encoder_enable(struct intel_encoder *encoder,
 void vlv_phy_reset_lanes(struct intel_encoder *encoder,
 			 const struct intel_crtc_state *old_crtc_state);
 
-int intel_gpu_freq(struct drm_i915_private *dev_priv, int val);
-int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
 u64 intel_rc6_residency_ns(struct drm_i915_private *dev_priv,
 			   const i915_reg_t reg);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9cc9d7031684..7393034fb806 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -28,15 +28,7 @@
 #include <drm/drmP.h>
 #include <drm/drm_vma_manager.h>
 #include <drm/i915_drm.h>
-#include "i915_drv.h"
-#include "i915_gem_clflush.h"
-#include "i915_vgpu.h"
-#include "i915_trace.h"
-#include "intel_drv.h"
-#include "intel_frontbuffer.h"
-#include "intel_mocs.h"
-#include "intel_workarounds.h"
-#include "i915_gemfs.h"
+
 #include <linux/dma-fence-array.h>
 #include <linux/kthread.h>
 #include <linux/reservation.h>
@@ -47,6 +39,17 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+#include "i915_drv.h"
+#include "i915_gemfs.h"
+#include "i915_gem_clflush.h"
+#include "i915_vgpu.h"
+#include "i915_trace.h"
+#include "intel_drv.h"
+#include "intel_frontbuffer.h"
+#include "intel_gt_pm.h"
+#include "intel_mocs.h"
+#include "intel_workarounds.h"
+
 static void i915_gem_flush_free_objects(struct drm_i915_private *i915);
 
 static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 34bbf0dd00ed..043dbca25b2f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -33,9 +33,11 @@
 #include <linux/circ_buf.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
+
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "intel_gt_pm.h"
 
 /**
  * DOC: interrupt handling
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index dc87797db500..f374af971395 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -5,6 +5,7 @@
  */
 
 #include "i915_pmu.h"
+#include "intel_gt_pm.h"
 #include "intel_ringbuffer.h"
 #include "i915_drv.h"
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 17842549177a..b9c3242967df 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -29,6 +29,7 @@
 #include <linux/sched/signal.h>
 
 #include "i915_drv.h"
+#include "intel_gt_pm.h"
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
 {
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 55554697133b..fde5f0139ca1 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -31,6 +31,7 @@
 #include <linux/sysfs.h>
 
 #include "intel_drv.h"
+#include "intel_gt_pm.h"
 #include "intel_sideband.h"
 #include "i915_drv.h"
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index cd1a848a21ff..444d09539f70 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -50,6 +50,7 @@
 #include "intel_dsi.h"
 #include "intel_drv.h"
 #include "intel_frontbuffer.h"
+#include "intel_gt_pm.h"
 #include "intel_sideband.h"
 
 /* Primary plane formats for gen <= 3 */
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 11a1932cde6e..d5c680094979 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -2015,18 +2015,6 @@ void intel_update_watermarks(struct intel_crtc *crtc);
 void intel_init_pm(struct drm_i915_private *dev_priv);
 void intel_init_clock_gating_hooks(struct drm_i915_private *dev_priv);
 void intel_pm_setup(struct drm_i915_private *dev_priv);
-void intel_gpu_ips_init(struct drm_i915_private *dev_priv);
-void intel_gpu_ips_teardown(void);
-void intel_init_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_cleanup_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_sanitize_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_enable_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_disable_gt_powersave(struct drm_i915_private *dev_priv);
-void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv);
-void gen6_rps_busy(struct drm_i915_private *dev_priv);
-void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
-void gen6_rps_idle(struct drm_i915_private *dev_priv);
-void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
 void g4x_wm_get_hw_state(struct drm_device *dev);
 void vlv_wm_get_hw_state(struct drm_device *dev);
 void ilk_wm_get_hw_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
new file mode 100644
index 000000000000..733d346601ca
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -0,0 +1,2424 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2012-2018 Intel Corporation
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/module.h>
+
+#include "../../../platform/x86/intel_ips.h"
+
+#include "intel_gt_pm.h"
+
+#include "i915_drv.h"
+#include "intel_drv.h"
+#include "intel_sideband.h"
+
+/**
+ * DOC: RC6
+ *
+ * RC6 is a special power stage which allows the GPU to enter a very
+ * low-voltage mode when idle, using down to 0V while at this stage.  This
+ * stage is entered automatically when the GPU is idle and RC6 support is
+ * enabled, and the GPU wakes up automatically as soon as a new workload
+ * arrives.
+ *
+ * There are different RC6 modes available on Intel GPUs, which differ in
+ * the latency required to enter and leave RC6 and in the voltage consumed
+ * by the GPU in the different states.
+ *
+ * The combination of the following flags defines which states the GPU is
+ * allowed to enter, while RC6 is the normal RC6 state, RC6p is the deep
+ * RC6, and RC6pp is the deepest RC6. Their support by hardware varies
+ * according to the GPU, BIOS, chipset and platform. RC6 is usually the
+ * safest one and the one which brings the most power savings; deeper states
+ * save more power, but require higher latency to switch to and wake up.
+ */
+
+/*
+ * Lock protecting IPS related data structures
+ */
+DEFINE_SPINLOCK(mchdev_lock);
+
+bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val)
+{
+	u16 rgvswctl;
+
+	lockdep_assert_held(&mchdev_lock);
+
+	rgvswctl = I915_READ16(MEMSWCTL);
+	if (rgvswctl & MEMCTL_CMD_STS) {
+		DRM_DEBUG("gpu busy, RCS change rejected\n");
+		return false; /* still busy with another command */
+	}
+
+	rgvswctl = (MEMCTL_CMD_CHFREQ << MEMCTL_CMD_SHIFT) |
+		(val << MEMCTL_FREQ_SHIFT) | MEMCTL_SFCAVM;
+	I915_WRITE16(MEMSWCTL, rgvswctl);
+	POSTING_READ16(MEMSWCTL);
+
+	rgvswctl |= MEMCTL_CMD_STS;
+	I915_WRITE16(MEMSWCTL, rgvswctl);
+
+	return true;
+}
+
+static void ironlake_enable_drps(struct drm_i915_private *dev_priv)
+{
+	u32 rgvmodectl;
+	u8 fmax, fmin, fstart, vstart;
+
+	spin_lock_irq(&mchdev_lock);
+
+	rgvmodectl = I915_READ(MEMMODECTL);
+
+	/* Enable temp reporting */
+	I915_WRITE16(PMMISC, I915_READ(PMMISC) | MCPPCE_EN);
+	I915_WRITE16(TSC1, I915_READ(TSC1) | TSE);
+
+	/* 100ms RC evaluation intervals */
+	I915_WRITE(RCUPEI, 100000);
+	I915_WRITE(RCDNEI, 100000);
+
+	/* Set max/min thresholds to 90ms and 80ms respectively */
+	I915_WRITE(RCBMAXAVG, 90000);
+	I915_WRITE(RCBMINAVG, 80000);
+
+	I915_WRITE(MEMIHYST, 1);
+
+	/* Set up min, max, and cur for interrupt handling */
+	fmax = (rgvmodectl & MEMMODE_FMAX_MASK) >> MEMMODE_FMAX_SHIFT;
+	fmin = (rgvmodectl & MEMMODE_FMIN_MASK);
+	fstart = (rgvmodectl & MEMMODE_FSTART_MASK) >>
+		MEMMODE_FSTART_SHIFT;
+
+	vstart = (I915_READ(PXVFREQ(fstart)) & PXVFREQ_PX_MASK) >>
+		PXVFREQ_PX_SHIFT;
+
+	dev_priv->ips.fmax = fmax; /* IPS callback will increase this */
+	dev_priv->ips.fstart = fstart;
+
+	dev_priv->ips.max_delay = fstart;
+	dev_priv->ips.min_delay = fmin;
+	dev_priv->ips.cur_delay = fstart;
+
+	DRM_DEBUG_DRIVER("fmax: %d, fmin: %d, fstart: %d\n",
+			 fmax, fmin, fstart);
+
+	I915_WRITE(MEMINTREN, MEMINT_CX_SUPR_EN | MEMINT_EVAL_CHG_EN);
+
+	/*
+	 * Interrupts will be enabled in ironlake_irq_postinstall
+	 */
+
+	I915_WRITE(VIDSTART, vstart);
+	POSTING_READ(VIDSTART);
+
+	rgvmodectl |= MEMMODE_SWMODE_EN;
+	I915_WRITE(MEMMODECTL, rgvmodectl);
+
+	if (wait_for_atomic((I915_READ(MEMSWCTL) & MEMCTL_CMD_STS) == 0, 10))
+		DRM_ERROR("stuck trying to change perf mode\n");
+	mdelay(1);
+
+	ironlake_set_drps(dev_priv, fstart);
+
+	dev_priv->ips.last_count1 = I915_READ(DMIEC) +
+		I915_READ(DDREC) + I915_READ(CSIEC);
+	dev_priv->ips.last_time1 = jiffies_to_msecs(jiffies);
+	dev_priv->ips.last_count2 = I915_READ(GFXEC);
+	dev_priv->ips.last_time2 = ktime_get_raw_ns();
+
+	spin_unlock_irq(&mchdev_lock);
+}
+
+static void ironlake_disable_drps(struct drm_i915_private *dev_priv)
+{
+	u16 rgvswctl;
+
+	spin_lock_irq(&mchdev_lock);
+
+	rgvswctl = I915_READ16(MEMSWCTL);
+
+	/* Ack interrupts, disable EFC interrupt */
+	I915_WRITE(MEMINTREN, I915_READ(MEMINTREN) & ~MEMINT_EVAL_CHG_EN);
+	I915_WRITE(MEMINTRSTS, MEMINT_EVAL_CHG);
+	I915_WRITE(DEIER, I915_READ(DEIER) & ~DE_PCU_EVENT);
+	I915_WRITE(DEIIR, DE_PCU_EVENT);
+	I915_WRITE(DEIMR, I915_READ(DEIMR) | DE_PCU_EVENT);
+
+	/* Go back to the starting frequency */
+	ironlake_set_drps(dev_priv, dev_priv->ips.fstart);
+	mdelay(1);
+	rgvswctl |= MEMCTL_CMD_STS;
+	I915_WRITE(MEMSWCTL, rgvswctl);
+	mdelay(1);
+
+	spin_unlock_irq(&mchdev_lock);
+}
+
+/*
+ * There's a funny hw issue where the hw returns all 0 when reading from
+ * GEN6_RP_INTERRUPT_LIMITS. Hence we always need to compute the desired value
+ * ourselves, instead of doing a rmw cycle (which might result in us clearing
+ * all limits and the gpu stuck at whatever frequency it is at atm).
+ */
+static u32 intel_rps_limits(struct drm_i915_private *i915, u8 val)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u32 limits;
+
+	/*
+	 * Only set the down limit when we've reached the lowest level to avoid
+	 * getting more interrupts, otherwise leave this clear. This prevents a
+	 * race in the hw when coming out of rc6: There's a tiny window where
+	 * the hw runs at the minimal clock before selecting the desired
+	 * frequency, if the down threshold expires in that window we will not
+	 * receive a down interrupt.
+	 */
+	if (INTEL_GEN(i915) >= 9) {
+		limits = (rps->max_freq_softlimit) << 23;
+		if (val <= rps->min_freq_softlimit)
+			limits |= (rps->min_freq_softlimit) << 14;
+	} else {
+		limits = rps->max_freq_softlimit << 24;
+		if (val <= rps->min_freq_softlimit)
+			limits |= rps->min_freq_softlimit << 16;
+	}
+
+	return limits;
+}
+
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	int new_power;
+	u32 threshold_up = 0, threshold_down = 0; /* in % */
+	u32 ei_up = 0, ei_down = 0;
+
+	new_power = rps->power;
+	switch (rps->power) {
+	case LOW_POWER:
+		if (val > rps->efficient_freq + 1 &&
+		    val > rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= rps->efficient_freq &&
+		    val < rps->cur_freq)
+			new_power = LOW_POWER;
+		else if (val >= rps->rp0_freq &&
+			 val > rps->cur_freq)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
+		    val < rps->cur_freq)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val <= rps->min_freq_softlimit)
+		new_power = LOW_POWER;
+	if (val >= rps->max_freq_softlimit)
+		new_power = HIGH_POWER;
+	if (new_power == rps->power)
+		return;
+
+	/* Note the units here are not exactly 1us, but 1280ns. */
+	switch (new_power) {
+	case LOW_POWER:
+		/* Upclock if more than 95% busy over 16ms */
+		ei_up = 16000;
+		threshold_up = 95;
+
+		/* Downclock if less than 85% busy over 32ms */
+		ei_down = 32000;
+		threshold_down = 85;
+		break;
+
+	case BETWEEN:
+		/* Upclock if more than 90% busy over 13ms */
+		ei_up = 13000;
+		threshold_up = 90;
+
+		/* Downclock if less than 75% busy over 32ms */
+		ei_down = 32000;
+		threshold_down = 75;
+		break;
+
+	case HIGH_POWER:
+		/* Upclock if more than 85% busy over 10ms */
+		ei_up = 10000;
+		threshold_up = 85;
+
+		/* Downclock if less than 60% busy over 32ms */
+		ei_down = 32000;
+		threshold_down = 60;
+		break;
+	}
+
+	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
+		/*
+		 * Baytrail and Braswell control the gpu frequency via the
+		 * punit, which is very slow and expensive to communicate with,
+		 * as we synchronously force the package to C0. If we try and
+		 * update the gpufreq too often we cause measurable system
+		 * load for little benefit (effectively stealing CPU time for
+		 * the GPU, negatively impacting overall throughput).
+		 */
+		ei_up <<= 2;
+		ei_down <<= 2;
+	}
+
+	I915_WRITE(GEN6_RP_UP_EI,
+		   GT_INTERVAL_FROM_US(dev_priv, ei_up));
+	I915_WRITE(GEN6_RP_UP_THRESHOLD,
+		   GT_INTERVAL_FROM_US(dev_priv,
+				       ei_up * threshold_up / 100));
+
+	I915_WRITE(GEN6_RP_DOWN_EI,
+		   GT_INTERVAL_FROM_US(dev_priv, ei_down));
+	I915_WRITE(GEN6_RP_DOWN_THRESHOLD,
+		   GT_INTERVAL_FROM_US(dev_priv,
+				       ei_down * threshold_down / 100));
+
+	I915_WRITE(GEN6_RP_CONTROL,
+		   GEN6_RP_MEDIA_TURBO |
+		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+		   GEN6_RP_MEDIA_IS_GFX |
+		   GEN6_RP_ENABLE |
+		   GEN6_RP_UP_BUSY_AVG |
+		   GEN6_RP_DOWN_IDLE_AVG);
+
+	rps->power = new_power;
+	rps->up_threshold = threshold_up;
+	rps->down_threshold = threshold_down;
+	rps->last_adj = 0;
+}
+
+static u32 gen6_rps_pm_mask(struct drm_i915_private *i915, u8 val)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u32 mask = 0;
+
+	/* We use UP_EI_EXPIRED interrupts for both up/down in manual mode */
+	if (val > rps->min_freq_softlimit)
+		mask |= (GEN6_PM_RP_UP_EI_EXPIRED |
+			 GEN6_PM_RP_DOWN_THRESHOLD |
+			 GEN6_PM_RP_DOWN_TIMEOUT);
+
+	if (val < rps->max_freq_softlimit)
+		mask |= (GEN6_PM_RP_UP_EI_EXPIRED |
+			 GEN6_PM_RP_UP_THRESHOLD);
+
+	mask &= i915->pm_rps_events;
+
+	return gen6_sanitize_rps_pm_mask(i915, ~mask);
+}
+
+/*
+ * gen6_set_rps is called to update the frequency request, but should also be
+ * called when the range (min_delay and max_delay) is modified so that we can
+ * update the GEN6_RP_INTERRUPT_LIMITS register accordingly.
+ */
+static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	/*
+	 * min/max delay may still have been modified so be sure to
+	 * write the limits value.
+	 */
+	if (val != rps->cur_freq) {
+		gen6_set_rps_thresholds(dev_priv, val);
+
+		if (INTEL_GEN(dev_priv) >= 9)
+			I915_WRITE(GEN6_RPNSWREQ,
+				   GEN9_FREQUENCY(val));
+		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
+			I915_WRITE(GEN6_RPNSWREQ,
+				   HSW_FREQUENCY(val));
+		else
+			I915_WRITE(GEN6_RPNSWREQ,
+				   GEN6_FREQUENCY(val) |
+				   GEN6_OFFSET(0) |
+				   GEN6_AGGRESSIVE_TURBO);
+	}
+
+	/*
+	 * Make sure we continue to get interrupts
+	 * until we hit the minimum or maximum frequencies.
+	 */
+	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, intel_rps_limits(dev_priv, val));
+	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
+
+	rps->cur_freq = val;
+	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
+
+	return 0;
+}
+
+static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
+{
+	int err;
+
+	if (WARN_ONCE(IS_CHERRYVIEW(dev_priv) && (val & 1),
+		      "Odd GPU freq value\n"))
+		val &= ~1;
+
+	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
+
+	if (val != dev_priv->gt_pm.rps.cur_freq) {
+		vlv_punit_get(dev_priv);
+		err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
+		vlv_punit_put(dev_priv);
+		if (err)
+			return err;
+
+		gen6_set_rps_thresholds(dev_priv, val);
+	}
+
+	dev_priv->gt_pm.rps.cur_freq = val;
+	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
+
+	return 0;
+}
+
+/*
+ * vlv_set_rps_idle: Set the frequency to idle, if Gfx clocks are down
+ *
+ * If Gfx is Idle, then
+ * 1. Forcewake Media well.
+ * 2. Request idle freq.
+ * 3. Release Forcewake of Media well.
+ */
+static void vlv_set_rps_idle(struct drm_i915_private *i915)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u32 val = rps->idle_freq;
+	int err;
+
+	if (rps->cur_freq <= val)
+		return;
+
+	/*
+	 * The punit delays the write of the frequency and voltage until it
+	 * determines the GPU is awake. During normal usage we don't want to
+	 * waste power changing the frequency if the GPU is sleeping (rc6).
+	 * However, the GPU and driver are now idle and we do not want to delay
+	 * switching to minimum voltage (reducing power whilst idle) as we do
+	 * not expect to be woken in the near future and so must flush the
+	 * change by waking the device.
+	 *
+	 * We choose to take the media powerwell (either would do to trick the
+	 * punit into committing the voltage change) as that takes a lot less
+	 * power than the render powerwell.
+	 */
+	intel_uncore_forcewake_get(i915, FORCEWAKE_MEDIA);
+	err = valleyview_set_rps(i915, val);
+	intel_uncore_forcewake_put(i915, FORCEWAKE_MEDIA);
+
+	if (err)
+		DRM_ERROR("Failed to set RPS for idle\n");
+}
+
+void gen6_rps_busy(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	mutex_lock(&rps->lock);
+	if (rps->enabled) {
+		u8 freq;
+
+		if (dev_priv->pm_rps_events & GEN6_PM_RP_UP_EI_EXPIRED)
+			gen6_rps_reset_ei(dev_priv);
+		I915_WRITE(GEN6_PMINTRMSK,
+			   gen6_rps_pm_mask(dev_priv, rps->cur_freq));
+
+		gen6_enable_rps_interrupts(dev_priv);
+
+		/*
+		 * Use the user's desired frequency as a guide, but for better
+		 * performance, jump directly to RPe as our starting frequency.
+		 */
+		freq = max(rps->cur_freq,
+			   rps->efficient_freq);
+
+		if (intel_set_rps(dev_priv,
+				  clamp(freq,
+					rps->min_freq_softlimit,
+					rps->max_freq_softlimit)))
+			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
+	}
+	mutex_unlock(&rps->lock);
+}
+
+void gen6_rps_idle(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	/*
+	 * Flush our bottom-half so that it does not race with us
+	 * setting the idle frequency and so that it is bounded by
+	 * our rpm wakeref. And then disable the interrupts to stop any
+	 * further RPS reclocking whilst we are asleep.
+	 */
+	gen6_disable_rps_interrupts(dev_priv);
+
+	mutex_lock(&rps->lock);
+	if (rps->enabled) {
+		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
+			vlv_set_rps_idle(dev_priv);
+		else
+			gen6_set_rps(dev_priv, rps->idle_freq);
+		rps->last_adj = 0;
+		I915_WRITE(GEN6_PMINTRMSK,
+			   gen6_sanitize_rps_pm_mask(dev_priv, ~0));
+	}
+	mutex_unlock(&rps->lock);
+}
+
+void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
+{
+	struct intel_rps *rps = &rq->i915->gt_pm.rps;
+	unsigned long flags;
+	bool boost;
+
+	/*
+	 * This is intentionally racy! We peek at the state here, then
+	 * validate inside the RPS worker.
+	 */
+	if (!rps->enabled)
+		return;
+
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
+		return;
+
+	/* Serializes with i915_request_retire() */
+	boost = false;
+	spin_lock_irqsave(&rq->lock, flags);
+	if (!rq->waitboost && !dma_fence_is_signaled_locked(&rq->fence)) {
+		boost = !atomic_fetch_inc(&rps->num_waiters);
+		rq->waitboost = true;
+	}
+	spin_unlock_irqrestore(&rq->lock, flags);
+	if (!boost)
+		return;
+
+	if (READ_ONCE(rps->cur_freq) < rps->boost_freq)
+		schedule_work(&rps->work);
+
+	atomic_inc(client ? &client->boosts : &rps->boosts);
+}
+
+int intel_set_rps(struct drm_i915_private *i915, u8 val)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	int err;
+
+	lockdep_assert_held(&rps->lock);
+	GEM_BUG_ON(val > rps->max_freq);
+	GEM_BUG_ON(val < rps->min_freq);
+
+	if (!rps->enabled) {
+		rps->cur_freq = val;
+		return 0;
+	}
+
+	if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
+		err = valleyview_set_rps(i915, val);
+	else
+		err = gen6_set_rps(i915, val);
+
+	return err;
+}
+
+static void gen9_disable_rc6(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+	I915_WRITE(GEN9_PG_ENABLE, 0);
+}
+
+static void gen9_disable_rps(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RP_CONTROL, 0);
+}
+
+static void gen6_disable_rc6(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+}
+
+static void gen6_disable_rps(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
+	I915_WRITE(GEN6_RP_CONTROL, 0);
+}
+
+static void cherryview_disable_rc6(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+}
+
+static void cherryview_disable_rps(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RP_CONTROL, 0);
+}
+
+static void valleyview_disable_rc6(struct drm_i915_private *dev_priv)
+{
+	/*
+	 * We're doing forcewake before disabling RC6; this is what the
+	 * BIOS expects when going into suspend.
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void valleyview_disable_rps(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_RP_CONTROL, 0);
+}
+
+static bool bxt_check_bios_rc6_setup(struct drm_i915_private *dev_priv)
+{
+	bool enable_rc6 = true;
+	unsigned long rc6_ctx_base;
+	u32 rc_ctl;
+	int rc_sw_target;
+
+	rc_ctl = I915_READ(GEN6_RC_CONTROL);
+	rc_sw_target = (I915_READ(GEN6_RC_STATE) & RC_SW_TARGET_STATE_MASK) >>
+		       RC_SW_TARGET_STATE_SHIFT;
+	DRM_DEBUG_DRIVER("BIOS enabled RC states: "
+			 "HW_CTRL %s HW_RC6 %s SW_TARGET_STATE %x\n",
+			 onoff(rc_ctl & GEN6_RC_CTL_HW_ENABLE),
+			 onoff(rc_ctl & GEN6_RC_CTL_RC6_ENABLE),
+			 rc_sw_target);
+
+	if (!(I915_READ(RC6_LOCATION) & RC6_CTX_IN_DRAM)) {
+		DRM_DEBUG_DRIVER("RC6 Base location not set properly.\n");
+		enable_rc6 = false;
+	}
+
+	/*
+	 * The exact context size is not known for BXT, so assume a page size
+	 * for this check.
+	 */
+	rc6_ctx_base = I915_READ(RC6_CTX_BASE) & RC6_CTX_BASE_MASK;
+	if (!(rc6_ctx_base >= dev_priv->dsm_reserved.start &&
+	      rc6_ctx_base + PAGE_SIZE <= dev_priv->dsm_reserved.end + 1)) {
+		DRM_DEBUG_DRIVER("RC6 Base address not as expected.\n");
+		enable_rc6 = false;
+	}
+
+	if (!(((I915_READ(PWRCTX_MAXCNT_RCSUNIT) & IDLE_TIME_MASK) > 1) &&
+	      ((I915_READ(PWRCTX_MAXCNT_VCSUNIT0) & IDLE_TIME_MASK) > 1) &&
+	      ((I915_READ(PWRCTX_MAXCNT_BCSUNIT) & IDLE_TIME_MASK) > 1) &&
+	      ((I915_READ(PWRCTX_MAXCNT_VECSUNIT) & IDLE_TIME_MASK) > 1))) {
+		DRM_DEBUG_DRIVER("Engine Idle wait time not set properly.\n");
+		enable_rc6 = false;
+	}
+
+	if (!I915_READ(GEN8_PUSHBUS_CONTROL) ||
+	    !I915_READ(GEN8_PUSHBUS_ENABLE) ||
+	    !I915_READ(GEN8_PUSHBUS_SHIFT)) {
+		DRM_DEBUG_DRIVER("Pushbus not setup properly.\n");
+		enable_rc6 = false;
+	}
+
+	if (!I915_READ(GEN6_GFXPAUSE)) {
+		DRM_DEBUG_DRIVER("GFX pause not setup properly.\n");
+		enable_rc6 = false;
+	}
+
+	if (!I915_READ(GEN8_MISC_CTRL0)) {
+		DRM_DEBUG_DRIVER("GPM control not setup properly.\n");
+		enable_rc6 = false;
+	}
+
+	return enable_rc6;
+}
+
+static bool sanitize_rc6(struct drm_i915_private *i915)
+{
+	struct intel_device_info *info = mkwrite_device_info(i915);
+
+	/* Powersaving is controlled by the host when inside a VM */
+	if (intel_vgpu_active(i915))
+		info->has_rc6 = 0;
+
+	if (info->has_rc6 &&
+	    IS_GEN9_LP(i915) && !bxt_check_bios_rc6_setup(i915)) {
+		DRM_INFO("RC6 disabled by BIOS\n");
+		info->has_rc6 = 0;
+	}
+
+	/*
+	 * We assume that we do not have any deep rc6 levels if we don't
+	 * have the previous rc6 level supported, i.e. we use HAS_RC6()
+	 * as the initial coarse check for rc6 in general, moving on to
+	 * progressively finer/deeper levels.
+	 */
+	if (!info->has_rc6 && info->has_rc6p)
+		info->has_rc6p = 0;
+
+	return info->has_rc6;
+}
+
+static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	/* All of these values are in units of 50MHz */
+
+	/* static values from HW: RP0 > RP1 > RPn (min_freq) */
+	if (IS_GEN9_LP(dev_priv)) {
+		u32 rp_state_cap = I915_READ(BXT_RP_STATE_CAP);
+
+		rps->rp0_freq = (rp_state_cap >> 16) & 0xff;
+		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
+		rps->min_freq = (rp_state_cap >>  0) & 0xff;
+	} else {
+		u32 rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
+
+		rps->rp0_freq = (rp_state_cap >>  0) & 0xff;
+		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
+		rps->min_freq = (rp_state_cap >> 16) & 0xff;
+	}
+	/* hw_max = RP0 until we check for overclocking */
+	rps->max_freq = rps->rp0_freq;
+
+	rps->efficient_freq = rps->rp1_freq;
+	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv) ||
+	    IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
+		u32 ddcc_status = 0;
+
+		if (sandybridge_pcode_read(dev_priv,
+					   HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL,
+					   &ddcc_status) == 0)
+			rps->efficient_freq =
+				clamp_t(u8,
+					((ddcc_status >> 8) & 0xff),
+					rps->min_freq,
+					rps->max_freq);
+	}
+
+	if (IS_GEN9_BC(dev_priv) || IS_CANNONLAKE(dev_priv)) {
+		/*
+		 * Store the frequency values in 16.66 MHz units, which is
+		 * the natural hardware unit for SKL
+		 */
+		rps->rp0_freq *= GEN9_FREQ_SCALER;
+		rps->rp1_freq *= GEN9_FREQ_SCALER;
+		rps->min_freq *= GEN9_FREQ_SCALER;
+		rps->max_freq *= GEN9_FREQ_SCALER;
+		rps->efficient_freq *= GEN9_FREQ_SCALER;
+	}
+}
+
+static void reset_rps(struct drm_i915_private *i915,
+		      int (*set)(struct drm_i915_private *, u8))
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u8 freq = rps->cur_freq;
+
+	/* force a reset */
+	rps->power = -1;
+	rps->cur_freq = -1;
+
+	if (set(i915, freq))
+		DRM_ERROR("Failed to reset RPS to initial values\n");
+}
+
+/* See the Gen9_GT_PM_Programming_Guide doc for the below */
+static void gen9_enable_rps(struct drm_i915_private *dev_priv)
+{
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* Program defaults and thresholds for RPS */
+	if (IS_GEN9(dev_priv))
+		I915_WRITE(GEN6_RC_VIDEO_FREQ,
+			   GEN9_FREQUENCY(dev_priv->gt_pm.rps.rp1_freq));
+
+	/* 1 second timeout*/
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT,
+		   GT_INTERVAL_FROM_US(dev_priv, 1000000));
+
+	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 0xa);
+
+	/*
+	 * Leaning on the below call to gen6_set_rps to program/setup the
+	 * Up/Down EI & threshold registers, as well as the RP_CONTROL,
+	 * RP_INTERRUPT_LIMITS & RPNSWREQ registers.
+	 */
+	reset_rps(dev_priv, gen6_set_rps);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen9_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 rc6_mode;
+
+	/* 1a: Software RC state - RC0 */
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/*
+	 * 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have.
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 2a: Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2b: Program RC6 thresholds.*/
+	if (INTEL_GEN(dev_priv) >= 10) {
+		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
+		I915_WRITE(GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
+	} else if (IS_SKYLAKE(dev_priv)) {
+		/*
+		 * WaRsDoubleRc6WrlWithCoarsePowerGating:skl Doubling WRL only
+		 * when CPG is enabled
+		 */
+		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 108 << 16);
+	} else {
+		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16);
+	}
+
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+
+	if (HAS_GUC(dev_priv))
+		I915_WRITE(GUC_MAX_IDLE_COUNT, 0xA);
+
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+
+	/*
+	 * 2c: Program Coarse Power Gating Policies.
+	 *
+	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
+	 * use instead is a more conservative estimate for the maximum time
+	 * it takes us to service a CS interrupt and submit a new ELSP - that
+	 * is the time which the GPU is idle waiting for the CPU to select the
+	 * next request to execute. If the idle hysteresis is less than that
+	 * interrupt service latency, the hardware will automatically gate
+	 * the power well and we will then incur the wake up cost on top of
+	 * the service latency. A similar guide from intel_pstate is that we
+	 * do not want the enable hysteresis to be less than the wakeup latency.
+	 *
+	 * igt/gem_exec_nop/sequential provides a rough estimate for the
+	 * service latency, and puts it around 10us for Broadwell (and other
+	 * big core) and around 40us for Broxton (and other low power cores).
+	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
+	 * However, the wakeup latency on Broxton is closer to 100us. To be
+	 * conservative, we have to factor in a context switch on top (due
+	 * to ksoftirqd).
+	 */
+	I915_WRITE(GEN9_MEDIA_PG_IDLE_HYSTERESIS, 250);
+	I915_WRITE(GEN9_RENDER_PG_IDLE_HYSTERESIS, 250);
+
+	/* 3a: Enable RC6 */
+	I915_WRITE(GEN6_RC6_THRESHOLD, 37500); /* 37.5/125ms per EI */
+
+	/* WaRsUseTimeoutMode:cnl (pre-prod) */
+	if (IS_CNL_REVID(dev_priv, CNL_REVID_A0, CNL_REVID_C0))
+		rc6_mode = GEN7_RC_CTL_TO_MODE;
+	else
+		rc6_mode = GEN6_RC_CTL_EI_MODE(1);
+
+	I915_WRITE(GEN6_RC_CONTROL,
+		   GEN6_RC_CTL_HW_ENABLE |
+		   GEN6_RC_CTL_RC6_ENABLE |
+		   rc6_mode);
+
+	/*
+	 * 3b: Enable Coarse Power Gating only when RC6 is enabled.
+	 * WaRsDisableCoarsePowerGating:skl,cnl
+	 *  - Render/Media PG need to be disabled with RC6.
+	 */
+	if (NEEDS_WaRsDisableCoarsePowerGating(dev_priv))
+		I915_WRITE(GEN9_PG_ENABLE, 0);
+	else
+		I915_WRITE(GEN9_PG_ENABLE,
+			   GEN9_RENDER_PG_ENABLE | GEN9_MEDIA_PG_ENABLE);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen8_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* 1a: Software RC state - RC0 */
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/*
+	 * 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have.
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 2a: Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2b: Program RC6 thresholds.*/
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+	I915_WRITE(GEN6_RC6_THRESHOLD, 625); /* 800us/1.28 for TO */
+
+	/* 3: Enable RC6 */
+
+	I915_WRITE(GEN6_RC_CONTROL,
+		   GEN6_RC_CTL_HW_ENABLE |
+		   GEN7_RC_CTL_TO_MODE |
+		   GEN6_RC_CTL_RC6_ENABLE);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen8_enable_rps(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 1 Program defaults and thresholds for RPS*/
+	I915_WRITE(GEN6_RPNSWREQ,
+		   HSW_FREQUENCY(rps->rp1_freq));
+	I915_WRITE(GEN6_RC_VIDEO_FREQ,
+		   HSW_FREQUENCY(rps->rp1_freq));
+	/* NB: Docs say 1s, and 1000000 - which aren't equivalent */
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 100000000 / 128); /* 1s timeout */
+
+	/* Docs recommend 900MHz, and 300 MHz respectively */
+	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
+		   rps->max_freq_softlimit << 24 |
+		   rps->min_freq_softlimit << 16);
+
+	I915_WRITE(GEN6_RP_UP_THRESHOLD, 7600000 / 128); /* 76ms busyness per EI, 90% */
+	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 31300000 / 128); /* 313ms busyness per EI, 70%*/
+	I915_WRITE(GEN6_RP_UP_EI, 66000); /* 84.48ms, XXX: random? */
+	I915_WRITE(GEN6_RP_DOWN_EI, 350000); /* 448ms, XXX: random? */
+
+	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+
+	/* 2: Enable RPS */
+	I915_WRITE(GEN6_RP_CONTROL,
+		   GEN6_RP_MEDIA_TURBO |
+		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+		   GEN6_RP_MEDIA_IS_GFX |
+		   GEN6_RP_ENABLE |
+		   GEN6_RP_UP_BUSY_AVG |
+		   GEN6_RP_DOWN_IDLE_AVG);
+
+	reset_rps(dev_priv, gen6_set_rps);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen6_fix_rc6_voltage(struct drm_i915_private *i915)
+{
+	u32 rc6vids = 0;
+
+	if (sandybridge_pcode_read(i915,
+				   GEN6_PCODE_READ_RC6VIDS,
+				   &rc6vids)) {
+		DRM_DEBUG_DRIVER("Couldn't check for BIOS rc6 w/a\n");
+		return;
+	}
+
+	if (GEN6_DECODE_RC6_VID(rc6vids & 0xff) < 450) {
+		DRM_DEBUG_DRIVER("You should update your BIOS. Correcting minimum rc6 voltage (%dmV->%dmV)\n",
+				 GEN6_DECODE_RC6_VID(rc6vids & 0xff),
+				 450);
+
+		rc6vids &= 0xffff00;
+		rc6vids |= GEN6_ENCODE_RC6_VID(450);
+		if (sandybridge_pcode_write(i915,
+					    GEN6_PCODE_WRITE_RC6VIDS,
+					    rc6vids))
+			DRM_ERROR("Unable to correct rc6 voltage\n");
+	}
+}
+
+static void gen6_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 gtfifodbg;
+	u32 rc6_mask;
+
+	I915_WRITE(GEN6_RC_STATE, 0);
+
+	/* Clear the DBG now so we don't confuse earlier errors */
+	gtfifodbg = I915_READ(GTFIFODBG);
+	if (gtfifodbg) {
+		DRM_ERROR("GT fifo had a previous error %x\n", gtfifodbg);
+		I915_WRITE(GTFIFODBG, gtfifodbg);
+	}
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* disable the counters and set deterministic thresholds */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	I915_WRITE(GEN6_RC1_WAKE_RATE_LIMIT, 1000 << 16);
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16 | 30);
+	I915_WRITE(GEN6_RC6pp_WAKE_RATE_LIMIT, 30);
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
+
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+	I915_WRITE(GEN6_RC1e_THRESHOLD, 1000);
+	if (IS_IVYBRIDGE(dev_priv))
+		I915_WRITE(GEN6_RC6_THRESHOLD, 125000);
+	else
+		I915_WRITE(GEN6_RC6_THRESHOLD, 50000);
+	I915_WRITE(GEN6_RC6p_THRESHOLD, 150000);
+	I915_WRITE(GEN6_RC6pp_THRESHOLD, 64000); /* unused */
+
+	/* We don't use those on Haswell */
+	rc6_mask = GEN6_RC_CTL_RC6_ENABLE;
+	if (HAS_RC6p(dev_priv))
+		rc6_mask |= GEN6_RC_CTL_RC6p_ENABLE;
+	if (HAS_RC6pp(dev_priv))
+		rc6_mask |= GEN6_RC_CTL_RC6pp_ENABLE;
+	I915_WRITE(GEN6_RC_CONTROL,
+		   rc6_mask |
+		   GEN6_RC_CTL_EI_MODE(1) |
+		   GEN6_RC_CTL_HW_ENABLE);
+
+	if (IS_GEN6(dev_priv))
+		gen6_fix_rc6_voltage(dev_priv);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen6_enable_rps(struct drm_i915_private *dev_priv)
+{
+	/*
+	 * Here begins a magic sequence of register writes to enable
+	 * auto-downclocking.
+	 *
+	 * Perhaps there might be some value in exposing these to
+	 * userspace...
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* Power down if completely idle for over 50ms */
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 50000);
+	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+
+	reset_rps(dev_priv, gen6_set_rps);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	struct cpufreq_policy *policy;
+	const unsigned int scaling_factor = 180 / 2;
+	unsigned int max_ia_freq, min_ring_freq;
+	unsigned int max_gpu_freq, min_gpu_freq;
+	unsigned int gpu_freq;
+	int min_freq = 15;
+
+	lockdep_assert_held(&rps->lock);
+
+	if (rps->max_freq <= rps->min_freq)
+		return;
+
+	policy = cpufreq_cpu_get(0);
+	if (policy) {
+		max_ia_freq = policy->cpuinfo.max_freq;
+		cpufreq_cpu_put(policy);
+	} else {
+		/*
+		 * Default to measured freq if none found, PCU will ensure we
+		 * don't go over
+		 */
+		max_ia_freq = tsc_khz;
+	}
+
+	/* Convert from kHz to MHz */
+	max_ia_freq /= 1000;
+
+	min_ring_freq = I915_READ(DCLK) & 0xf;
+	/* convert DDR frequency from units of 266.6MHz to bandwidth */
+	min_ring_freq = mult_frac(min_ring_freq, 8, 3);
+
+	min_gpu_freq = rps->min_freq;
+	max_gpu_freq = rps->max_freq;
+	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
+		/* Convert GT frequency to 50 MHz units */
+		min_gpu_freq /= GEN9_FREQ_SCALER;
+		max_gpu_freq /= GEN9_FREQ_SCALER;
+	}
+
+	/*
+	 * For each potential GPU frequency, load a ring frequency we'd like
+	 * to use for memory access.  We do this by specifying the IA frequency
+	 * the PCU should use as a reference to determine the ring frequency.
+	 */
+	for (gpu_freq = max_gpu_freq; gpu_freq >= min_gpu_freq; gpu_freq--) {
+		int diff = max_gpu_freq - gpu_freq;
+		unsigned int ia_freq = 0, ring_freq = 0;
+
+		if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
+			/*
+			 * ring_freq = 2 * GT. ring_freq is in 100MHz units
+			 * No floor required for ring frequency on SKL.
+			 */
+			ring_freq = gpu_freq;
+		} else if (INTEL_GEN(dev_priv) >= 8) {
+			/* max(2 * GT, DDR). NB: GT is 50MHz units */
+			ring_freq = max(min_ring_freq, gpu_freq);
+		} else if (IS_HASWELL(dev_priv)) {
+			ring_freq = mult_frac(gpu_freq, 5, 4);
+			ring_freq = max(min_ring_freq, ring_freq);
+			/* leave ia_freq as the default, chosen by cpufreq */
+		} else {
+			/* On older processors, there is no separate ring
+			 * clock domain, so in order to boost the bandwidth
+			 * of the ring, we need to upclock the CPU (ia_freq).
+			 *
+			 * For GPU frequencies less than 750MHz,
+			 * just use the lowest ring freq.
+			 */
+			if (gpu_freq < min_freq)
+				ia_freq = 800;
+			else
+				ia_freq = max_ia_freq - diff * scaling_factor;
+			ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
+		}
+
+		sandybridge_pcode_write(dev_priv,
+					GEN6_PCODE_WRITE_MIN_FREQ_TABLE,
+					ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT |
+					ring_freq << GEN6_PCODE_FREQ_RING_RATIO_SHIFT |
+					gpu_freq);
+	}
+}
+
+static int cherryview_rps_max_freq(struct drm_i915_private *i915)
+{
+	u32 val, rp0;
+
+	val = vlv_punit_read(i915, FB_GFX_FMAX_AT_VMAX_FUSE);
+
+	switch (INTEL_SSEU(i915)->eu_total) {
+	case 8:
+		/* (2 * 4) config */
+		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS4EU_FUSE_SHIFT);
+		break;
+	case 12:
+		/* (2 * 6) config */
+		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS6EU_FUSE_SHIFT);
+		break;
+	case 16:
+		/* (2 * 8) config */
+	default:
+		/* Setting (2 * 8) Min RP0 for any other combination */
+		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS8EU_FUSE_SHIFT);
+		break;
+	}
+
+	rp0 = (rp0 & FB_GFX_FREQ_FUSE_MASK);
+
+	return rp0;
+}
+
+static int cherryview_rps_rpe_freq(struct drm_i915_private *i915)
+{
+	u32 val, rpe;
+
+	val = vlv_punit_read(i915, PUNIT_GPU_DUTYCYCLE_REG);
+	rpe = (val >> PUNIT_GPU_DUTYCYCLE_RPE_FREQ_SHIFT) & PUNIT_GPU_DUTYCYCLE_RPE_FREQ_MASK;
+
+	return rpe;
+}
+
+static int cherryview_rps_guar_freq(struct drm_i915_private *i915)
+{
+	u32 val, rp1;
+
+	val = vlv_punit_read(i915, FB_GFX_FMAX_AT_VMAX_FUSE);
+	rp1 = (val & FB_GFX_FREQ_FUSE_MASK);
+
+	return rp1;
+}
+
+static u32 cherryview_rps_min_freq(struct drm_i915_private *i915)
+{
+	u32 val, rpn;
+
+	val = vlv_punit_read(i915, FB_GFX_FMIN_AT_VMIN_FUSE);
+	rpn = ((val >> FB_GFX_FMIN_AT_VMIN_FUSE_SHIFT) &
+		       FB_GFX_FREQ_FUSE_MASK);
+
+	return rpn;
+}
+
+static int valleyview_rps_guar_freq(struct drm_i915_private *i915)
+{
+	u32 val, rp1;
+
+	val = vlv_nc_read(i915, IOSF_NC_FB_GFX_FREQ_FUSE);
+
+	rp1 = (val & FB_GFX_FGUARANTEED_FREQ_FUSE_MASK) >> FB_GFX_FGUARANTEED_FREQ_FUSE_SHIFT;
+
+	return rp1;
+}
+
+static int valleyview_rps_max_freq(struct drm_i915_private *i915)
+{
+	u32 val, rp0;
+
+	val = vlv_nc_read(i915, IOSF_NC_FB_GFX_FREQ_FUSE);
+
+	rp0 = (val & FB_GFX_MAX_FREQ_FUSE_MASK) >> FB_GFX_MAX_FREQ_FUSE_SHIFT;
+	/* Clamp to max */
+	rp0 = min_t(u32, rp0, 0xea);
+
+	return rp0;
+}
+
+static int valleyview_rps_rpe_freq(struct drm_i915_private *i915)
+{
+	u32 val, rpe;
+
+	val = vlv_nc_read(i915, IOSF_NC_FB_GFX_FMAX_FUSE_LO);
+	rpe = (val & FB_FMAX_VMIN_FREQ_LO_MASK) >> FB_FMAX_VMIN_FREQ_LO_SHIFT;
+	val = vlv_nc_read(i915, IOSF_NC_FB_GFX_FMAX_FUSE_HI);
+	rpe |= (val & FB_FMAX_VMIN_FREQ_HI_MASK) << 5;
+
+	return rpe;
+}
+
+static int valleyview_rps_min_freq(struct drm_i915_private *i915)
+{
+	u32 val;
+
+	val = vlv_punit_read(i915, PUNIT_REG_GPU_LFM) & 0xff;
+	/*
+	 * According to the BYT Punit GPU turbo HAS 1.1.6.3 the minimum value
+	 * for the minimum frequency in GPLL mode is 0xc1. Contrary to this on
+	 * a BYT-M B0 the above register contains 0xbf. Moreover when setting
+	 * a frequency Punit will not allow values below 0xc0. Clamp it to 0xc0
+	 * to make sure it matches what Punit accepts.
+	 */
+	return max_t(u32, val, 0xc0);
+}
+
+/* Check that the pctx buffer wasn't moved under us. */
+static void valleyview_check_pctx(struct drm_i915_private *dev_priv)
+{
+	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
+
+	WARN_ON(pctx_addr != dev_priv->dsm.start +
+			     dev_priv->vlv_pctx->stolen->start);
+}
+
+/* Check that the pcbr address is not empty. */
+static void cherryview_check_pctx(struct drm_i915_private *dev_priv)
+{
+	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
+
+	WARN_ON((pctx_addr >> VLV_PCBR_ADDR_SHIFT) == 0);
+}
+
+static void cherryview_setup_pctx(struct drm_i915_private *dev_priv)
+{
+	resource_size_t pctx_paddr, paddr;
+	resource_size_t pctx_size = 32 * 1024;
+	u32 pcbr;
+
+	pcbr = I915_READ(VLV_PCBR);
+	if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
+		DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
+		paddr = dev_priv->dsm.end - pctx_size + 1;
+		GEM_BUG_ON(paddr > U32_MAX);
+
+		pctx_paddr = (paddr & (~4095));
+		I915_WRITE(VLV_PCBR, pctx_paddr);
+	}
+
+	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
+}
+
+static void valleyview_setup_pctx(struct drm_i915_private *dev_priv)
+{
+	struct drm_i915_gem_object *pctx;
+	resource_size_t pctx_paddr;
+	resource_size_t pctx_size = 24 * 1024;
+	u32 pcbr;
+
+	pcbr = I915_READ(VLV_PCBR);
+	if (pcbr) {
+		/* BIOS set it up already, grab the pre-alloc'd space */
+		resource_size_t pcbr_offset;
+
+		pcbr_offset = round_down(pcbr, 4096) - dev_priv->dsm.start;
+		pctx = i915_gem_object_create_stolen_for_preallocated(dev_priv,
+								      pcbr_offset,
+								      I915_GTT_OFFSET_NONE,
+								      pctx_size);
+		goto out;
+	}
+
+	DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
+
+	/*
+	 * From the Gunit register HAS:
+	 * The Gfx driver is expected to program this register and ensure
+	 * proper allocation within Gfx stolen memory.  For example, this
+	 * register should be programmed such that the PCBR range does not
+	 * overlap with other ranges, such as the frame buffer, protected
+	 * memory, or any other relevant ranges.
+	 */
+	pctx = i915_gem_object_create_stolen(dev_priv, pctx_size);
+	if (!pctx) {
+		DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
+		goto out;
+	}
+
+	GEM_BUG_ON(range_overflows_t(u64,
+				     dev_priv->dsm.start,
+				     pctx->stolen->start,
+				     U32_MAX));
+	pctx_paddr = dev_priv->dsm.start + pctx->stolen->start;
+	I915_WRITE(VLV_PCBR, pctx_paddr);
+
+out:
+	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
+	dev_priv->vlv_pctx = pctx;
+}
+
+static void valleyview_cleanup_pctx(struct drm_i915_private *i915)
+{
+	if (WARN_ON(!i915->vlv_pctx))
+		return;
+
+	i915_gem_object_put(i915->vlv_pctx);
+	i915->vlv_pctx = NULL;
+}
+
+static void vlv_init_gpll_ref_freq(struct drm_i915_private *i915)
+{
+	i915->gt_pm.rps.gpll_ref_freq =
+		vlv_get_cck_clock(i915, "GPLL ref",
+				  CCK_GPLL_CLOCK_CONTROL,
+				  i915->czclk_freq);
+
+	DRM_DEBUG_DRIVER("GPLL reference freq: %d kHz\n",
+			 i915->gt_pm.rps.gpll_ref_freq);
+}
+
+static void valleyview_init_gt_powersave(struct drm_i915_private *i915)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u32 val;
+
+	valleyview_setup_pctx(i915);
+
+	vlv_iosf_sb_get(i915,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
+	vlv_init_gpll_ref_freq(i915);
+
+	val = vlv_punit_read(i915, PUNIT_REG_GPU_FREQ_STS);
+	switch ((val >> 6) & 3) {
+	case 0:
+	case 1:
+		i915->mem_freq = 800;
+		break;
+	case 2:
+		i915->mem_freq = 1066;
+		break;
+	case 3:
+		i915->mem_freq = 1333;
+		break;
+	}
+	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", i915->mem_freq);
+
+	rps->max_freq = valleyview_rps_max_freq(i915);
+	rps->rp0_freq = rps->max_freq;
+	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->max_freq),
+			 rps->max_freq);
+
+	rps->efficient_freq = valleyview_rps_rpe_freq(i915);
+	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->efficient_freq),
+			 rps->efficient_freq);
+
+	rps->rp1_freq = valleyview_rps_guar_freq(i915);
+	DRM_DEBUG_DRIVER("RP1(Guar Freq) GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->rp1_freq),
+			 rps->rp1_freq);
+
+	rps->min_freq = valleyview_rps_min_freq(i915);
+	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->min_freq),
+			 rps->min_freq);
+
+	vlv_iosf_sb_put(i915,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+}
+
+static void cherryview_init_gt_powersave(struct drm_i915_private *i915)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	u32 val;
+
+	cherryview_setup_pctx(i915);
+
+	vlv_iosf_sb_get(i915,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
+	vlv_init_gpll_ref_freq(i915);
+
+	val = vlv_cck_read(i915, CCK_FUSE_REG);
+
+	switch ((val >> 2) & 0x7) {
+	case 3:
+		i915->mem_freq = 2000;
+		break;
+	default:
+		i915->mem_freq = 1600;
+		break;
+	}
+	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", i915->mem_freq);
+
+	rps->max_freq = cherryview_rps_max_freq(i915);
+	rps->rp0_freq = rps->max_freq;
+	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->max_freq),
+			 rps->max_freq);
+
+	rps->efficient_freq = cherryview_rps_rpe_freq(i915);
+	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->efficient_freq),
+			 rps->efficient_freq);
+
+	rps->rp1_freq = cherryview_rps_guar_freq(i915);
+	DRM_DEBUG_DRIVER("RP1(Guar) GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->rp1_freq),
+			 rps->rp1_freq);
+
+	rps->min_freq = cherryview_rps_min_freq(i915);
+	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
+			 intel_gpu_freq(i915, rps->min_freq),
+			 rps->min_freq);
+
+	vlv_iosf_sb_put(i915,
+			BIT(VLV_IOSF_SB_PUNIT) |
+			BIT(VLV_IOSF_SB_NC) |
+			BIT(VLV_IOSF_SB_CCK));
+
+	WARN_ONCE((rps->max_freq | rps->efficient_freq | rps->rp1_freq |
+		   rps->min_freq) & 1,
+		  "Odd GPU freq values\n");
+}
+
+static void valleyview_cleanup_gt_powersave(struct drm_i915_private *i915)
+{
+	valleyview_cleanup_pctx(i915);
+}
+
+static void cherryview_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 gtfifodbg, rc6_mode, pcbr;
+
+	gtfifodbg = I915_READ(GTFIFODBG) & ~(GT_FIFO_SBDEDICATE_FREE_ENTRY_CHV |
+					     GT_FIFO_FREE_ENTRIES_CHV);
+	if (gtfifodbg) {
+		DRM_DEBUG_DRIVER("GT fifo had a previous error %x\n",
+				 gtfifodbg);
+		I915_WRITE(GTFIFODBG, gtfifodbg);
+	}
+
+	cherryview_check_pctx(dev_priv);
+
+	/*
+	 * 1a & 1b: Get forcewake during program sequence. Although the driver
+	 * hasn't enabled a state yet where we need forcewake, BIOS may have.
+	 */
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/*  Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	/* 2a: Program RC6 thresholds.*/
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
+
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+	I915_WRITE(GEN6_RC_SLEEP, 0);
+
+	/* TO threshold set to 500 us (0x186 * 1.28 us) */
+	I915_WRITE(GEN6_RC6_THRESHOLD, 0x186);
+
+	/* Allows RC6 residency counter to work */
+	I915_WRITE(VLV_COUNTER_CONTROL,
+		   _MASKED_BIT_ENABLE(VLV_COUNT_RANGE_HIGH |
+				      VLV_MEDIA_RC6_COUNT_EN |
+				      VLV_RENDER_RC6_COUNT_EN));
+
+	/* For now we assume BIOS is allocating and populating the PCBR */
+	pcbr = I915_READ(VLV_PCBR);
+
+	/* 3: Enable RC6 */
+	rc6_mode = 0;
+	if (pcbr >> VLV_PCBR_ADDR_SHIFT)
+		rc6_mode = GEN7_RC_CTL_TO_MODE;
+	I915_WRITE(GEN6_RC_CONTROL, rc6_mode);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void cherryview_enable_rps(struct drm_i915_private *dev_priv)
+{
+	u32 val;
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 1: Program defaults and thresholds for RPS*/
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
+	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
+	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
+	I915_WRITE(GEN6_RP_UP_EI, 66000);
+	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
+
+	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+
+	/* 2: Enable RPS */
+	I915_WRITE(GEN6_RP_CONTROL,
+		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+		   GEN6_RP_MEDIA_IS_GFX |
+		   GEN6_RP_ENABLE |
+		   GEN6_RP_UP_BUSY_AVG |
+		   GEN6_RP_DOWN_IDLE_AVG);
+
+	/* Setting Fixed Bias */
+	vlv_punit_get(dev_priv);
+
+	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | CHV_BIAS_CPU_50_SOC_50;
+	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
+
+	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
+
+	vlv_punit_put(dev_priv);
+
+	/* RPS code assumes GPLL is used */
+	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
+
+	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
+	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
+
+	reset_rps(dev_priv, valleyview_set_rps);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void valleyview_enable_rc6(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 gtfifodbg;
+
+	valleyview_check_pctx(dev_priv);
+
+	gtfifodbg = I915_READ(GTFIFODBG);
+	if (gtfifodbg) {
+		DRM_DEBUG_DRIVER("GT fifo had a previous error %x\n",
+				 gtfifodbg);
+		I915_WRITE(GTFIFODBG, gtfifodbg);
+	}
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/*  Disable RC states. */
+	I915_WRITE(GEN6_RC_CONTROL, 0);
+
+	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 0x00280000);
+	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
+	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
+
+	for_each_engine(engine, dev_priv, id)
+		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
+
+	I915_WRITE(GEN6_RC6_THRESHOLD, 0x557);
+
+	/* Allows RC6 residency counter to work */
+	I915_WRITE(VLV_COUNTER_CONTROL,
+		   _MASKED_BIT_ENABLE(VLV_COUNT_RANGE_HIGH |
+				      VLV_MEDIA_RC0_COUNT_EN |
+				      VLV_RENDER_RC0_COUNT_EN |
+				      VLV_MEDIA_RC6_COUNT_EN |
+				      VLV_RENDER_RC6_COUNT_EN));
+
+	I915_WRITE(GEN6_RC_CONTROL,
+		   GEN7_RC_CTL_TO_MODE | VLV_RC_CTL_CTX_RST_PARALLEL);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void valleyview_enable_rps(struct drm_i915_private *dev_priv)
+{
+	u32 val;
+
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
+	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
+	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
+	I915_WRITE(GEN6_RP_UP_EI, 66000);
+	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
+
+	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+
+	I915_WRITE(GEN6_RP_CONTROL,
+		   GEN6_RP_MEDIA_TURBO |
+		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+		   GEN6_RP_MEDIA_IS_GFX |
+		   GEN6_RP_ENABLE |
+		   GEN6_RP_UP_BUSY_AVG |
+		   GEN6_RP_DOWN_IDLE_CONT);
+
+	vlv_punit_get(dev_priv);
+
+	/* Setting Fixed Bias */
+	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | VLV_BIAS_CPU_125_SOC_875;
+	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
+
+	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
+
+	vlv_punit_put(dev_priv);
+
+	/* RPS code assumes GPLL is used */
+	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
+
+	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
+	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
+
+	reset_rps(dev_priv, valleyview_set_rps);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static unsigned int intel_pxfreq(u32 vidfreq)
+{
+	unsigned int div = (vidfreq & 0x3f0000) >> 16;
+	unsigned int post = (vidfreq & 0x3000) >> 12;
+	unsigned int pre = (vidfreq & 0x7);
+
+	if (!pre)
+		return 0;
+
+	return (div * 133333) / (pre << post);
+}
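+
+/*
+ * Worked example (illustrative register value): vidfreq = 0x00101002
+ * decodes to div = 16, post = 1, pre = 2, so the Px state frequency is
+ * (16 * 133333) / (2 << 1) = 2133328 / 4 = 533332, i.e. roughly 533 MHz
+ * in kHz units.
+ */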
+
+static const struct cparams {
+	u16 i;
+	u16 t;
+	u16 m;
+	u16 c;
+} cparams[] = {
+	{ 1, 1333, 301, 28664 },
+	{ 1, 1066, 294, 24460 },
+	{ 1, 800, 294, 25192 },
+	{ 0, 1333, 276, 27605 },
+	{ 0, 1066, 276, 27605 },
+	{ 0, 800, 231, 23784 },
+};
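+
+/*
+ * Each entry pairs a memory configuration (i matches ips.c_m, t matches
+ * ips.r_t, apparently the memory frequency in MHz) with the slope m and
+ * intercept c of the empirically derived power line that
+ * __i915_chipset_val() evaluates below.
+ */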
+
+static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
+{
+	u64 total_count, diff, ret;
+	u32 count1, count2, count3, m = 0, c = 0;
+	unsigned long now = jiffies_to_msecs(jiffies), diff1;
+	int i;
+
+	lockdep_assert_held(&mchdev_lock);
+
+	diff1 = now - dev_priv->ips.last_time1;
+
+	/*
+	 * Prevent division-by-zero if we are asking too fast.
+	 * Also, we don't get interesting results if we are polling
+	 * faster than once in 10ms, so just return the saved value
+	 * in such cases.
+	 */
+	if (diff1 <= 10)
+		return dev_priv->ips.chipset_power;
+
+	count1 = I915_READ(DMIEC);
+	count2 = I915_READ(DDREC);
+	count3 = I915_READ(CSIEC);
+
+	total_count = count1 + count2 + count3;
+
+	/* FIXME: handle per-counter overflow */
+	if (total_count < dev_priv->ips.last_count1) {
+		diff = ~0UL - dev_priv->ips.last_count1;
+		diff += total_count;
+	} else {
+		diff = total_count - dev_priv->ips.last_count1;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(cparams); i++) {
+		if (cparams[i].i == dev_priv->ips.c_m &&
+		    cparams[i].t == dev_priv->ips.r_t) {
+			m = cparams[i].m;
+			c = cparams[i].c;
+			break;
+		}
+	}
+
+	diff = div_u64(diff, diff1);
+	ret = ((m * diff) + c);
+	ret = div_u64(ret, 10);
+
+	dev_priv->ips.last_count1 = total_count;
+	dev_priv->ips.last_time1 = now;
+
+	dev_priv->ips.chipset_power = ret;
+
+	return ret;
+}
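+
+/*
+ * Worked example (hypothetical counters): with m = 294, c = 24460 and a
+ * counter delta of 100000 over diff1 = 100 ms, diff = 1000 and
+ * ret = (294 * 1000 + 24460) / 10 = 31846 in the driver's power units.
+ */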
+
+unsigned long i915_chipset_val(struct drm_i915_private *i915)
+{
+	unsigned long val;
+
+	if (!IS_GEN5(i915))
+		return 0;
+
+	intel_runtime_pm_get(i915);
+	spin_lock_irq(&mchdev_lock);
+
+	val = __i915_chipset_val(i915);
+
+	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(i915);
+
+	return val;
+}
+
+unsigned long i915_mch_val(struct drm_i915_private *dev_priv)
+{
+	unsigned long m, x, b;
+	u32 tsfs;
+
+	tsfs = I915_READ(TSFS);
+
+	m = ((tsfs & TSFS_SLOPE_MASK) >> TSFS_SLOPE_SHIFT);
+	x = I915_READ8(TR1);
+
+	b = tsfs & TSFS_INTR_MASK;
+
+	return ((m * x) / 127) - b;
+}
+
+static int _pxvid_to_vd(u8 pxvid)
+{
+	if (pxvid == 0)
+		return 0;
+
+	if (pxvid >= 8 && pxvid < 31)
+		pxvid = 31;
+
+	return (pxvid + 2) * 125;
+}
+
+static u32 pvid_to_extvid(struct drm_i915_private *i915, u8 pxvid)
+{
+	const int vd = _pxvid_to_vd(pxvid);
+	const int vm = vd - 1125;
+
+	if (IS_MOBILE(i915))
+		return vm > 0 ? vm : 0;
+
+	return vd;
+}
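+
+/*
+ * Worked example: pxvid = 46 gives vd = (46 + 2) * 125 = 6000, reported
+ * as 6000 - 1125 = 4875 on mobile parts; pxvid = 16 falls in the 8..30
+ * hole and is clamped to 31, i.e. (31 + 2) * 125 = 4125.
+ */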
+
+static void __i915_update_gfx_val(struct drm_i915_private *dev_priv)
+{
+	u64 now, diff, diffms;
+	u32 count;
+
+	lockdep_assert_held(&mchdev_lock);
+
+	now = ktime_get_raw_ns();
+	diffms = now - dev_priv->ips.last_time2;
+	do_div(diffms, NSEC_PER_MSEC);
+
+	/* Don't divide by 0 */
+	if (!diffms)
+		return;
+
+	count = I915_READ(GFXEC);
+
+	if (count < dev_priv->ips.last_count2) {
+		diff = ~0UL - dev_priv->ips.last_count2;
+		diff += count;
+	} else {
+		diff = count - dev_priv->ips.last_count2;
+	}
+
+	dev_priv->ips.last_count2 = count;
+	dev_priv->ips.last_time2 = now;
+
+	/* More magic constants... */
+	diff = diff * 1181;
+	diff = div_u64(diff, diffms * 10);
+	dev_priv->ips.gfx_power = diff;
+}
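+
+/*
+ * Worked example (hypothetical counters): a GFXEC delta of 50000 over
+ * 250 ms yields gfx_power = 50000 * 1181 / (250 * 10) = 23620.
+ */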
+
+void i915_update_gfx_val(struct drm_i915_private *i915)
+{
+	if (!IS_GEN5(i915))
+		return;
+
+	intel_runtime_pm_get(i915);
+	spin_lock_irq(&mchdev_lock);
+
+	__i915_update_gfx_val(i915);
+
+	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(i915);
+}
+
+static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
+{
+	unsigned long t, corr, state1, corr2, state2;
+	u32 pxvid, ext_v;
+
+	lockdep_assert_held(&mchdev_lock);
+
+	pxvid = I915_READ(PXVFREQ(dev_priv->gt_pm.rps.cur_freq));
+	pxvid = (pxvid >> 24) & 0x7f;
+	ext_v = pvid_to_extvid(dev_priv, pxvid);
+
+	state1 = ext_v;
+
+	t = i915_mch_val(dev_priv);
+
+	/* Revel in the empirically derived constants */
+
+	/* Correction factor in 1/100000 units */
+	if (t > 80)
+		corr = ((t * 2349) + 135940);
+	else if (t >= 50)
+		corr = ((t * 964) + 29317);
+	else /* < 50 */
+		corr = ((t * 301) + 1004);
+
+	corr = corr * ((150142 * state1) / 10000 - 78642);
+	corr /= 100000;
+	corr2 = (corr * dev_priv->ips.corr);
+
+	state2 = (corr2 * state1) / 10000;
+	state2 /= 100; /* convert to mW */
+
+	__i915_update_gfx_val(dev_priv);
+
+	return dev_priv->ips.gfx_power + state2;
+}
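+
+/*
+ * Worked examples for the correction factor (in 1/100000 units):
+ * t = 40 gives 40 * 301 + 1004 = 13044, t = 60 gives
+ * 60 * 964 + 29317 = 87157, and t = 90 gives
+ * 90 * 2349 + 135940 = 347350.
+ */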
+
+unsigned long i915_gfx_val(struct drm_i915_private *i915)
+{
+	unsigned long val;
+
+	if (!IS_GEN5(i915))
+		return 0;
+
+	intel_runtime_pm_get(i915);
+	spin_lock_irq(&mchdev_lock);
+
+	val = __i915_gfx_val(i915);
+
+	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(i915);
+
+	return val;
+}
+
+static struct drm_i915_private *i915_mch_dev;
+
+static struct drm_i915_private *mchdev_get(void)
+{
+	struct drm_i915_private *i915;
+
+	rcu_read_lock();
+	i915 = i915_mch_dev;
+	if (!i915 || !kref_get_unless_zero(&i915->drm.ref))
+		i915 = NULL;
+	rcu_read_unlock();
+
+	return i915;
+}
+
+/**
+ * i915_read_mch_val - return value for IPS use
+ *
+ * Calculate and return a value for the IPS driver to use when deciding whether
+ * we have thermal and power headroom to increase CPU or GPU power budget.
+ */
+unsigned long i915_read_mch_val(void)
+{
+	struct drm_i915_private *i915;
+	unsigned long chipset_val, graphics_val;
+
+	i915 = mchdev_get();
+	if (!i915)
+		return 0;
+
+	intel_runtime_pm_get(i915);
+	spin_lock_irq(&mchdev_lock);
+	chipset_val = __i915_chipset_val(i915);
+	graphics_val = __i915_gfx_val(i915);
+	spin_unlock_irq(&mchdev_lock);
+	intel_runtime_pm_put(i915);
+
+	drm_dev_put(&i915->drm);
+	return chipset_val + graphics_val;
+}
+EXPORT_SYMBOL_GPL(i915_read_mch_val);
+
+/**
+ * i915_gpu_raise - raise GPU frequency limit
+ *
+ * Raise the limit; IPS indicates we have thermal headroom.
+ */
+bool i915_gpu_raise(void)
+{
+	struct drm_i915_private *i915;
+
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
+
+	spin_lock_irq(&mchdev_lock);
+	if (i915->ips.max_delay > i915->ips.fmax)
+		i915->ips.max_delay--;
+	spin_unlock_irq(&mchdev_lock);
+
+	drm_dev_put(&i915->drm);
+	return true;
+}
+EXPORT_SYMBOL_GPL(i915_gpu_raise);
+
+/**
+ * i915_gpu_lower - lower GPU frequency limit
+ *
+ * IPS indicates we're close to a thermal limit, so throttle back the GPU
+ * frequency maximum.
+ */
+bool i915_gpu_lower(void)
+{
+	struct drm_i915_private *i915;
+
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
+
+	spin_lock_irq(&mchdev_lock);
+	if (i915->ips.max_delay < i915->ips.min_delay)
+		i915->ips.max_delay++;
+	spin_unlock_irq(&mchdev_lock);
+
+	drm_dev_put(&i915->drm);
+	return true;
+}
+EXPORT_SYMBOL_GPL(i915_gpu_lower);
+
+/**
+ * i915_gpu_busy - indicate GPU business to IPS
+ *
+ * Tell the IPS driver whether or not the GPU is busy.
+ */
+bool i915_gpu_busy(void)
+{
+	struct drm_i915_private *i915;
+	bool ret;
+
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
+
+	ret = i915->gt.awake;
+
+	drm_dev_put(&i915->drm);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(i915_gpu_busy);
+
+/**
+ * i915_gpu_turbo_disable - disable graphics turbo
+ *
+ * Disable graphics turbo by resetting the max frequency and setting the
+ * current frequency to the default.
+ */
+bool i915_gpu_turbo_disable(void)
+{
+	struct drm_i915_private *i915;
+	bool ret;
+
+	i915 = mchdev_get();
+	if (!i915)
+		return false;
+
+	spin_lock_irq(&mchdev_lock);
+	i915->ips.max_delay = i915->ips.fstart;
+	ret = ironlake_set_drps(i915, i915->ips.fstart);
+	spin_unlock_irq(&mchdev_lock);
+
+	drm_dev_put(&i915->drm);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
+
+/**
+ * ips_ping_for_i915_load - tell intel_ips that the i915 driver has loaded
+ *
+ * Tells the intel_ips driver that the i915 driver is now loaded, if
+ * IPS got loaded first.
+ *
+ * This awkward dance is so that neither module has to depend on the
+ * other in order for IPS to do the appropriate communication of
+ * GPU turbo limits to i915.
+ */
+static void
+ips_ping_for_i915_load(void)
+{
+	void (*link)(void);
+
+	link = symbol_get(ips_link_to_i915_driver);
+	if (link) {
+		link();
+		symbol_put(ips_link_to_i915_driver);
+	}
+}
+
+void intel_gpu_ips_init(struct drm_i915_private *i915)
+{
+	/*
+	 * We only register the i915 ips part with intel-ips once everything is
+	 * set up, to avoid intel-ips sneaking in and reading bogus values.
+	 */
+	rcu_assign_pointer(i915_mch_dev, i915);
+
+	ips_ping_for_i915_load();
+}
+
+void intel_gpu_ips_teardown(void)
+{
+	rcu_assign_pointer(i915_mch_dev, NULL);
+}
+
+static void intel_init_emon(struct drm_i915_private *dev_priv)
+{
+	u32 lcfuse;
+	u8 pxw[16];
+	int i;
+
+	/* Disable PMON while we program the energy weights */
+	I915_WRITE(ECR, 0);
+	POSTING_READ(ECR);
+
+	/* Program energy weights for various events */
+	I915_WRITE(SDEW, 0x15040d00);
+	I915_WRITE(CSIEW0, 0x007f0000);
+	I915_WRITE(CSIEW1, 0x1e220004);
+	I915_WRITE(CSIEW2, 0x04000004);
+
+	for (i = 0; i < 5; i++)
+		I915_WRITE(PEW(i), 0);
+	for (i = 0; i < 3; i++)
+		I915_WRITE(DEW(i), 0);
+
+	/* Program P-state weights to account for frequency power adjustment */
+	for (i = 0; i < 16; i++) {
+		u32 pxvidfreq = I915_READ(PXVFREQ(i));
+		unsigned long freq = intel_pxfreq(pxvidfreq);
+		unsigned long vid = (pxvidfreq & PXVFREQ_PX_MASK) >>
+			PXVFREQ_PX_SHIFT;
+		unsigned long val;
+
+		val = vid * vid;
+		val *= freq / 1000;
+		val *= 255;
+		val /= 127 * 127 * 900;
+		if (val > 0xff)
+			DRM_ERROR("bad pxval: %ld\n", val);
+		pxw[i] = val;
+	}
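+
+	/*
+	 * Worked example (illustrative numbers): vid = 45 and
+	 * freq = 400000 kHz give val = 45 * 45 * (400000 / 1000) * 255 /
+	 * (127 * 127 * 900) = 206550000 / 14516100 = 14, comfortably
+	 * within a byte.
+	 */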
+	/* Render standby states get 0 weight */
+	pxw[14] = 0;
+	pxw[15] = 0;
+
+	for (i = 0; i < 4; i++)
+		I915_WRITE(PXW(i),
+			   pxw[4 * i + 0] << 24 |
+			   pxw[4 * i + 1] << 16 |
+			   pxw[4 * i + 2] << 8 |
+			   pxw[4 * i + 3] << 0);
+
+	/* Adjust magic regs to magic values (more experimental results) */
+	I915_WRITE(OGW0, 0);
+	I915_WRITE(OGW1, 0);
+	I915_WRITE(EG0, 0x00007f00);
+	I915_WRITE(EG1, 0x0000000e);
+	I915_WRITE(EG2, 0x000e0000);
+	I915_WRITE(EG3, 0x68000300);
+	I915_WRITE(EG4, 0x42000000);
+	I915_WRITE(EG5, 0x00140031);
+	I915_WRITE(EG6, 0);
+	I915_WRITE(EG7, 0);
+
+	for (i = 0; i < 8; i++)
+		I915_WRITE(PXWL(i), 0);
+
+	/* Enable PMON + select events */
+	I915_WRITE(ECR, 0x80000019);
+
+	lcfuse = I915_READ(LCFUSE02);
+
+	dev_priv->ips.corr = (lcfuse & LCFUSE_HIV_MASK);
+}
+
+void intel_init_gt_powersave(struct drm_i915_private *i915)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+
+	mutex_init(&rps->lock);
+
+	/*
+	 * RPM depends on RC6 to save/restore the GT HW context, so make RC6 a
+	 * requirement.
+	 */
+	if (!sanitize_rc6(i915)) {
+		DRM_INFO("RC6 disabled, disabling runtime PM support\n");
+		intel_runtime_pm_get(i915);
+	}
+
+	mutex_lock(&rps->lock);
+
+	/* Initialize RPS limits (for userspace) */
+	if (IS_CHERRYVIEW(i915))
+		cherryview_init_gt_powersave(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_init_gt_powersave(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_init_rps_frequencies(i915);
+
+	/* Derive initial user preferences/limits from the hardware limits */
+	rps->idle_freq = rps->min_freq;
+	rps->cur_freq = rps->idle_freq;
+
+	rps->max_freq_softlimit = rps->max_freq;
+	rps->min_freq_softlimit = rps->min_freq;
+
+	if (IS_HASWELL(i915) || IS_BROADWELL(i915))
+		rps->min_freq_softlimit =
+			max_t(int,
+			      rps->efficient_freq,
+			      intel_freq_opcode(i915, 450));
+
+	/* After setting max-softlimit, find the overclock max freq */
+	if (IS_GEN6(i915) || IS_IVYBRIDGE(i915) || IS_HASWELL(i915)) {
+		u32 params = 0;
+
+		sandybridge_pcode_read(i915, GEN6_READ_OC_PARAMS, &params);
+		if (params & BIT(31)) { /* OC supported */
+			DRM_DEBUG_DRIVER("Overclocking supported, max: %dMHz, overclock: %dMHz\n",
+					 (rps->max_freq & 0xff) * 50,
+					 (params & 0xff) * 50);
+			rps->max_freq = params & 0xff;
+		}
+	}
+
+	/* Finally allow us to boost to max by default */
+	rps->boost_freq = rps->max_freq;
+
+	mutex_unlock(&rps->lock);
+}
+
+void intel_cleanup_gt_powersave(struct drm_i915_private *i915)
+{
+	if (IS_VALLEYVIEW(i915))
+		valleyview_cleanup_gt_powersave(i915);
+
+	if (!HAS_RC6(i915))
+		intel_runtime_pm_put(i915);
+}
+
+/**
+ * intel_suspend_gt_powersave - suspend PM work and helper threads
+ * @i915: i915 device
+ *
+ * We don't want to disable RC6 or other features here, we just want
+ * to make sure any work we've queued has finished and won't bother
+ * us while we're suspended.
+ */
+void intel_suspend_gt_powersave(struct drm_i915_private *i915)
+{
+	if (INTEL_GEN(i915) < 6)
+		return;
+
+	/* gen6_rps_idle() will be called later to disable interrupts */
+}
+
+void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
+{
+	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
+	i915->gt_pm.rc6.enabled = true; /* force RC6 disabling */
+	intel_disable_gt_powersave(i915);
+
+	if (INTEL_GEN(i915) >= 11)
+		gen11_reset_rps_interrupts(i915);
+	else
+		gen6_reset_rps_interrupts(i915);
+}
+
+static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.llc_pstate.enabled)
+		return;
+
+	/* Currently there is no HW configuration to be done to disable. */
+
+	i915->gt_pm.llc_pstate.enabled = false;
+}
+
+static void intel_disable_rc6(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.rc6.enabled)
+		return;
+
+	if (INTEL_GEN(i915) >= 9)
+		gen9_disable_rc6(i915);
+	else if (IS_CHERRYVIEW(i915))
+		cherryview_disable_rc6(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_disable_rc6(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_disable_rc6(i915);
+
+	i915->gt_pm.rc6.enabled = false;
+}
+
+static void intel_disable_rps(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.rps.enabled)
+		return;
+
+	if (INTEL_GEN(i915) >= 9)
+		gen9_disable_rps(i915);
+	else if (IS_CHERRYVIEW(i915))
+		cherryview_disable_rps(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_disable_rps(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_disable_rps(i915);
+	else if (IS_IRONLAKE_M(i915))
+		ironlake_disable_drps(i915);
+
+	i915->gt_pm.rps.enabled = false;
+}
+
+void intel_disable_gt_powersave(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->gt_pm.rps.lock);
+
+	intel_disable_rc6(i915);
+	intel_disable_rps(i915);
+	if (HAS_LLC(i915))
+		intel_disable_llc_pstate(i915);
+
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
+
+static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (i915->gt_pm.llc_pstate.enabled)
+		return;
+
+	gen6_update_ring_freq(i915);
+
+	i915->gt_pm.llc_pstate.enabled = true;
+}
+
+static void intel_enable_rc6(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (i915->gt_pm.rc6.enabled)
+		return;
+
+	if (IS_CHERRYVIEW(i915))
+		cherryview_enable_rc6(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_enable_rc6(i915);
+	else if (INTEL_GEN(i915) >= 9)
+		gen9_enable_rc6(i915);
+	else if (IS_BROADWELL(i915))
+		gen8_enable_rc6(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_enable_rc6(i915);
+
+	i915->gt_pm.rc6.enabled = true;
+}
+
+static void intel_enable_rps(struct drm_i915_private *i915)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+
+	lockdep_assert_held(&rps->lock);
+
+	if (rps->enabled)
+		return;
+
+	if (IS_CHERRYVIEW(i915)) {
+		cherryview_enable_rps(i915);
+	} else if (IS_VALLEYVIEW(i915)) {
+		valleyview_enable_rps(i915);
+	} else if (INTEL_GEN(i915) >= 9) {
+		gen9_enable_rps(i915);
+	} else if (IS_BROADWELL(i915)) {
+		gen8_enable_rps(i915);
+	} else if (INTEL_GEN(i915) >= 6) {
+		gen6_enable_rps(i915);
+	} else if (IS_IRONLAKE_M(i915)) {
+		ironlake_enable_drps(i915);
+		intel_init_emon(i915);
+	}
+
+	WARN_ON(rps->max_freq < rps->min_freq);
+	WARN_ON(rps->idle_freq > rps->max_freq);
+
+	WARN_ON(rps->efficient_freq < rps->min_freq);
+	WARN_ON(rps->efficient_freq > rps->max_freq);
+
+	rps->enabled = true;
+}
+
+void intel_enable_gt_powersave(struct drm_i915_private *i915)
+{
+	/* Powersaving is controlled by the host when inside a VM */
+	if (intel_vgpu_active(i915))
+		return;
+
+	mutex_lock(&i915->gt_pm.rps.lock);
+
+	if (HAS_RC6(i915))
+		intel_enable_rc6(i915);
+	intel_enable_rps(i915);
+	if (HAS_LLC(i915))
+		intel_enable_llc_pstate(i915);
+
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
+
+static int byt_gpu_freq(const struct drm_i915_private *i915, int val)
+{
+	const struct intel_rps *rps = &i915->gt_pm.rps;
+
+	/*
+	 * N = val - 0xb7
+	 * Slow = Fast = GPLL ref * N
+	 */
+	return DIV_ROUND_CLOSEST(rps->gpll_ref_freq * (val - 0xb7), 1000);
+}
+
+static int byt_freq_opcode(const struct drm_i915_private *i915, int val)
+{
+	const struct intel_rps *rps = &i915->gt_pm.rps;
+
+	return DIV_ROUND_CLOSEST(1000 * val, rps->gpll_ref_freq) + 0xb7;
+}
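+
+/*
+ * Worked example with a hypothetical gpll_ref_freq of 20000 kHz:
+ * byt_gpu_freq(i915, 0xc0) gives N = 9 and
+ * DIV_ROUND_CLOSEST(20000 * 9, 1000) = 180 MHz, while
+ * byt_freq_opcode(i915, 180) maps back to
+ * DIV_ROUND_CLOSEST(1000 * 180, 20000) + 0xb7 = 0xc0.
+ */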
+
+static int chv_gpu_freq(const struct drm_i915_private *i915, int val)
+{
+	const struct intel_rps *rps = &i915->gt_pm.rps;
+
+	/*
+	 * N = val / 2
+	 * CU (slow) = CU2x (fast) / 2 = GPLL ref * N / 2
+	 */
+	return DIV_ROUND_CLOSEST(rps->gpll_ref_freq * val, 2 * 2 * 1000);
+}
+
+static int chv_freq_opcode(const struct drm_i915_private *i915, int val)
+{
+	const struct intel_rps *rps = &i915->gt_pm.rps;
+
+	/* CHV needs even values */
+	return DIV_ROUND_CLOSEST(2 * 1000 * val, rps->gpll_ref_freq) * 2;
+}
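+
+/*
+ * Worked example with a hypothetical gpll_ref_freq of 20000 kHz:
+ * chv_gpu_freq(i915, 128) = DIV_ROUND_CLOSEST(20000 * 128, 4000) =
+ * 640 MHz, and chv_freq_opcode(i915, 640) =
+ * DIV_ROUND_CLOSEST(2 * 1000 * 640, 20000) * 2 = 128, preserving the
+ * even opcode CHV requires.
+ */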
+
+int intel_gpu_freq(const struct drm_i915_private *i915, int val)
+{
+	if (INTEL_GEN(i915) >= 9)
+		return DIV_ROUND_CLOSEST(val * GT_FREQUENCY_MULTIPLIER,
+					 GEN9_FREQ_SCALER);
+	else if (IS_CHERRYVIEW(i915))
+		return chv_gpu_freq(i915, val);
+	else if (IS_VALLEYVIEW(i915))
+		return byt_gpu_freq(i915, val);
+	else
+		return val * GT_FREQUENCY_MULTIPLIER;
+}
+
+int intel_freq_opcode(const struct drm_i915_private *i915, int val)
+{
+	if (INTEL_GEN(i915) >= 9)
+		return DIV_ROUND_CLOSEST(val * GEN9_FREQ_SCALER,
+					 GT_FREQUENCY_MULTIPLIER);
+	else if (IS_CHERRYVIEW(i915))
+		return chv_freq_opcode(i915, val);
+	else if (IS_VALLEYVIEW(i915))
+		return byt_freq_opcode(i915, val);
+	else
+		return DIV_ROUND_CLOSEST(val, GT_FREQUENCY_MULTIPLIER);
+}
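+
+/*
+ * Round-trip example (assuming the usual GT_FREQUENCY_MULTIPLIER == 50
+ * and GEN9_FREQ_SCALER == 3 definitions): on gen9+,
+ * intel_freq_opcode(i915, 450) = DIV_ROUND_CLOSEST(450 * 3, 50) = 27,
+ * and intel_gpu_freq(i915, 27) = DIV_ROUND_CLOSEST(27 * 50, 3) = 450 MHz
+ * again.
+ */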
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
new file mode 100644
index 000000000000..20e937d6c7e0
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -0,0 +1,105 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2012-2018 Intel Corporation
+ */
+
+#ifndef __INTEL_GT_PM_H__
+#define __INTEL_GT_PM_H__
+
+struct drm_i915_private;
+struct i915_request;
+struct intel_rps_client;
+
+struct intel_rps_ei {
+	ktime_t ktime;
+	u32 render_c0;
+	u32 media_c0;
+};
+
+struct intel_rps {
+	struct mutex lock;
+
+	/*
+	 * work, interrupts_enabled and pm_iir are protected by
+	 * i915->irq_lock
+	 */
+	struct work_struct work;
+	bool interrupts_enabled;
+	u32 pm_iir;
+
+	/* PM interrupt bits that should never be masked */
+	u32 pm_intrmsk_mbz;
+
+	/*
+	 * Frequencies are stored in potentially platform-dependent multiples.
+	 * In other words, *_freq must be scaled by a platform-specific factor
+	 * (see intel_gpu_freq()) to yield MHz.
+	 * Soft limits are those which are used for the dynamic reclocking done
+	 * by the driver (raise frequencies under heavy loads, and lower for
+	 * lighter loads). Hard limits are those imposed by the hardware.
+	 *
+	 * A distinction is made for overclocking, which is never enabled by
+	 * default, and is considered to be above the hard limit if it's
+	 * possible at all.
+	 */
+	u8 cur_freq;		/* Current frequency (cached, may not == HW) */
+	u8 min_freq_softlimit;	/* Minimum frequency permitted by the driver */
+	u8 max_freq_softlimit;	/* Max frequency permitted by the driver */
+	u8 max_freq;		/* Maximum frequency, RP0 if not overclocking */
+	u8 min_freq;		/* AKA RPn. Minimum frequency */
+	u8 boost_freq;		/* Frequency to request when wait boosting */
+	u8 idle_freq;		/* Frequency to request when we are idle */
+	u8 efficient_freq;	/* AKA RPe. Pre-determined balanced frequency */
+	u8 rp1_freq;		/* "less than" RP0 power/frequency */
+	u8 rp0_freq;		/* Non-overclocked max frequency. */
+	u16 gpll_ref_freq;	/* vlv/chv GPLL reference frequency */
+
+	u8 up_threshold; /* Current %busy required to upclock */
+	u8 down_threshold; /* Current %busy required to downclock */
+
+	int last_adj;
+	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+
+	bool enabled;
+	atomic_t num_waiters;
+	atomic_t boosts;
+
+	/* manual wa residency calculations */
+	struct intel_rps_ei ei;
+};
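+
+/*
+ * Example (assuming the usual multiplier definitions): on SNB..BDW a
+ * cur_freq of 9 means 9 * 50 = 450 MHz, while on gen9+ the same field is
+ * in 50/3 MHz units; convert with intel_gpu_freq()/intel_freq_opcode()
+ * rather than scaling by hand.
+ */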
+
+struct intel_rc6 {
+	bool enabled;
+	u64 prev_hw_residency[4];
+	u64 cur_residency[4];
+};
+
+struct intel_llc_pstate {
+	bool enabled;
+};
+
+struct intel_gt_pm {
+	struct intel_rps rps;
+	struct intel_rc6 rc6;
+	struct intel_llc_pstate llc_pstate;
+};
+
+void intel_gpu_ips_init(struct drm_i915_private *i915);
+void intel_gpu_ips_teardown(void);
+
+void intel_init_gt_powersave(struct drm_i915_private *i915);
+void intel_cleanup_gt_powersave(struct drm_i915_private *i915);
+void intel_sanitize_gt_powersave(struct drm_i915_private *i915);
+void intel_enable_gt_powersave(struct drm_i915_private *i915);
+void intel_disable_gt_powersave(struct drm_i915_private *i915);
+void intel_suspend_gt_powersave(struct drm_i915_private *i915);
+
+void gen6_rps_busy(struct drm_i915_private *i915);
+void gen6_rps_reset_ei(struct drm_i915_private *i915);
+void gen6_rps_idle(struct drm_i915_private *i915);
+void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
+
+int intel_gpu_freq(const struct drm_i915_private *i915, int val);
+int intel_freq_opcode(const struct drm_i915_private *i915, int val);
+
+#endif /* __INTEL_GT_PM_H__ */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 4f549c7cdd19..b1f33c9c5f57 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -34,27 +34,6 @@
 #include "i915_drv.h"
 #include "intel_drv.h"
 #include "intel_sideband.h"
-#include "../../../platform/x86/intel_ips.h"
-
-/**
- * DOC: RC6
- *
- * RC6 is a special power stage which allows the GPU to enter an very
- * low-voltage mode when idle, using down to 0V while at this stage.  This
- * stage is entered automatically when the GPU is idle when RC6 support is
- * enabled, and as soon as new workload arises GPU wakes up automatically as well.
- *
- * There are different RC6 modes available in Intel GPU, which differentiate
- * among each other with the latency required to enter and leave RC6 and
- * voltage consumed by the GPU in different states.
- *
- * The combination of the following flags define which states GPU is allowed
- * to enter, while RC6 is the normal RC6 state, RC6p is the deep RC6, and
- * RC6pp is deepest RC6. Their support by hardware varies according to the
- * GPU, BIOS, chipset and platform. RC6 is usually the safest one and the one
- * which brings the most power savings; deeper states save more power, but
- * require higher latency to switch to and wake up.
- */
 
 static void gen9_init_clock_gating(struct drm_i915_private *dev_priv)
 {
@@ -6102,2549 +6081,267 @@ void intel_init_ipc(struct drm_i915_private *dev_priv)
 	intel_enable_ipc(dev_priv);
 }
 
-/*
- * Lock protecting IPS related data structures
- */
-DEFINE_SPINLOCK(mchdev_lock);
+static void ibx_init_clock_gating(struct drm_i915_private *dev_priv)
+{
+	/*
+	 * On Ibex Peak and Cougar Point, we need to disable clock
+	 * gating for the panel power sequencer or it will fail to
+	 * start up when no ports are active.
+	 */
+	I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE);
+}
 
-bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val)
+static void g4x_disable_trickle_feed(struct drm_i915_private *dev_priv)
 {
-	u16 rgvswctl;
+	enum pipe pipe;
 
-	lockdep_assert_held(&mchdev_lock);
+	for_each_pipe(dev_priv, pipe) {
+		I915_WRITE(DSPCNTR(pipe),
+			   I915_READ(DSPCNTR(pipe)) |
+			   DISPPLANE_TRICKLE_FEED_DISABLE);
 
-	rgvswctl = I915_READ16(MEMSWCTL);
-	if (rgvswctl & MEMCTL_CMD_STS) {
-		DRM_DEBUG("gpu busy, RCS change rejected\n");
-		return false; /* still busy with another command */
+		I915_WRITE(DSPSURF(pipe), I915_READ(DSPSURF(pipe)));
+		POSTING_READ(DSPSURF(pipe));
 	}
-
-	rgvswctl = (MEMCTL_CMD_CHFREQ << MEMCTL_CMD_SHIFT) |
-		(val << MEMCTL_FREQ_SHIFT) | MEMCTL_SFCAVM;
-	I915_WRITE16(MEMSWCTL, rgvswctl);
-	POSTING_READ16(MEMSWCTL);
-
-	rgvswctl |= MEMCTL_CMD_STS;
-	I915_WRITE16(MEMSWCTL, rgvswctl);
-
-	return true;
 }
 
-static void ironlake_enable_drps(struct drm_i915_private *dev_priv)
+static void ilk_init_clock_gating(struct drm_i915_private *dev_priv)
 {
-	u32 rgvmodectl;
-	u8 fmax, fmin, fstart, vstart;
-
-	spin_lock_irq(&mchdev_lock);
-
-	rgvmodectl = I915_READ(MEMMODECTL);
+	u32 dspclk_gate = ILK_VRHUNIT_CLOCK_GATE_DISABLE;
 
-	/* Enable temp reporting */
-	I915_WRITE16(PMMISC, I915_READ(PMMISC) | MCPPCE_EN);
-	I915_WRITE16(TSC1, I915_READ(TSC1) | TSE);
-
-	/* 100ms RC evaluation intervals */
-	I915_WRITE(RCUPEI, 100000);
-	I915_WRITE(RCDNEI, 100000);
-
-	/* Set max/min thresholds to 90ms and 80ms respectively */
-	I915_WRITE(RCBMAXAVG, 90000);
-	I915_WRITE(RCBMINAVG, 80000);
-
-	I915_WRITE(MEMIHYST, 1);
-
-	/* Set up min, max, and cur for interrupt handling */
-	fmax = (rgvmodectl & MEMMODE_FMAX_MASK) >> MEMMODE_FMAX_SHIFT;
-	fmin = (rgvmodectl & MEMMODE_FMIN_MASK);
-	fstart = (rgvmodectl & MEMMODE_FSTART_MASK) >>
-		MEMMODE_FSTART_SHIFT;
-
-	vstart = (I915_READ(PXVFREQ(fstart)) & PXVFREQ_PX_MASK) >>
-		PXVFREQ_PX_SHIFT;
-
-	dev_priv->ips.fmax = fmax; /* IPS callback will increase this */
-	dev_priv->ips.fstart = fstart;
-
-	dev_priv->ips.max_delay = fstart;
-	dev_priv->ips.min_delay = fmin;
-	dev_priv->ips.cur_delay = fstart;
+	/*
+	 * Required for FBC
+	 * WaFbcDisableDpfcClockGating:ilk
+	 */
+	dspclk_gate |= ILK_DPFCRUNIT_CLOCK_GATE_DISABLE |
+		   ILK_DPFCUNIT_CLOCK_GATE_DISABLE |
+		   ILK_DPFDUNIT_CLOCK_GATE_ENABLE;
 
-	DRM_DEBUG_DRIVER("fmax: %d, fmin: %d, fstart: %d\n",
-			 fmax, fmin, fstart);
+	I915_WRITE(PCH_3DCGDIS0,
+		   MARIUNIT_CLOCK_GATE_DISABLE |
+		   SVSMUNIT_CLOCK_GATE_DISABLE);
+	I915_WRITE(PCH_3DCGDIS1,
+		   VFMUNIT_CLOCK_GATE_DISABLE);
 
-	I915_WRITE(MEMINTREN, MEMINT_CX_SUPR_EN | MEMINT_EVAL_CHG_EN);
+	/*
+	 * According to the spec the following bits should be set in
+	 * order to enable memory self-refresh:
+	 * The bit 22/21 of 0x42004
+	 * The bit 5 of 0x42020
+	 * The bit 15 of 0x45000
+	 */
+	I915_WRITE(ILK_DISPLAY_CHICKEN2,
+		   (I915_READ(ILK_DISPLAY_CHICKEN2) |
+		    ILK_DPARB_GATE | ILK_VSDPFD_FULL));
+	dspclk_gate |= ILK_DPARBUNIT_CLOCK_GATE_ENABLE;
+	I915_WRITE(DISP_ARB_CTL,
+		   (I915_READ(DISP_ARB_CTL) |
+		    DISP_FBC_WM_DIS));
 
 	/*
-	 * Interrupts will be enabled in ironlake_irq_postinstall
+	 * Based on the documentation from the hardware team, the following bits
+	 * should be set unconditionally in order to enable FBC.
+	 * The bit 22 of 0x42000
+	 * The bit 22 of 0x42004
+	 * The bit 7,8,9 of 0x42020.
 	 */
+	if (IS_IRONLAKE_M(dev_priv)) {
+		/* WaFbcAsynchFlipDisableFbcQueue:ilk */
+		I915_WRITE(ILK_DISPLAY_CHICKEN1,
+			   I915_READ(ILK_DISPLAY_CHICKEN1) |
+			   ILK_FBCQ_DIS);
+		I915_WRITE(ILK_DISPLAY_CHICKEN2,
+			   I915_READ(ILK_DISPLAY_CHICKEN2) |
+			   ILK_DPARB_GATE);
+	}
 
-	I915_WRITE(VIDSTART, vstart);
-	POSTING_READ(VIDSTART);
+	I915_WRITE(ILK_DSPCLK_GATE_D, dspclk_gate);
 
-	rgvmodectl |= MEMMODE_SWMODE_EN;
-	I915_WRITE(MEMMODECTL, rgvmodectl);
+	I915_WRITE(ILK_DISPLAY_CHICKEN2,
+		   I915_READ(ILK_DISPLAY_CHICKEN2) |
+		   ILK_ELPIN_409_SELECT);
+	I915_WRITE(_3D_CHICKEN2,
+		   _3D_CHICKEN2_WM_READ_PIPELINED << 16 |
+		   _3D_CHICKEN2_WM_READ_PIPELINED);
 
-	if (wait_for_atomic((I915_READ(MEMSWCTL) & MEMCTL_CMD_STS) == 0, 10))
-		DRM_ERROR("stuck trying to change perf mode\n");
-	mdelay(1);
+	/* WaDisableRenderCachePipelinedFlush:ilk */
+	I915_WRITE(CACHE_MODE_0,
+		   _MASKED_BIT_ENABLE(CM0_PIPELINED_RENDER_FLUSH_DISABLE));
 
-	ironlake_set_drps(dev_priv, fstart);
+	/* WaDisable_RenderCache_OperationalFlush:ilk */
+	I915_WRITE(CACHE_MODE_0, _MASKED_BIT_DISABLE(RC_OP_FLUSH_ENABLE));
 
-	dev_priv->ips.last_count1 = I915_READ(DMIEC) +
-		I915_READ(DDREC) + I915_READ(CSIEC);
-	dev_priv->ips.last_time1 = jiffies_to_msecs(jiffies);
-	dev_priv->ips.last_count2 = I915_READ(GFXEC);
-	dev_priv->ips.last_time2 = ktime_get_raw_ns();
+	g4x_disable_trickle_feed(dev_priv);
 
-	spin_unlock_irq(&mchdev_lock);
+	ibx_init_clock_gating(dev_priv);
 }
 
-static void ironlake_disable_drps(struct drm_i915_private *dev_priv)
+static void cpt_init_clock_gating(struct drm_i915_private *dev_priv)
 {
-	u16 rgvswctl;
-
-	spin_lock_irq(&mchdev_lock);
-
-	rgvswctl = I915_READ16(MEMSWCTL);
-
-	/* Ack interrupts, disable EFC interrupt */
-	I915_WRITE(MEMINTREN, I915_READ(MEMINTREN) & ~MEMINT_EVAL_CHG_EN);
-	I915_WRITE(MEMINTRSTS, MEMINT_EVAL_CHG);
-	I915_WRITE(DEIER, I915_READ(DEIER) & ~DE_PCU_EVENT);
-	I915_WRITE(DEIIR, DE_PCU_EVENT);
-	I915_WRITE(DEIMR, I915_READ(DEIMR) | DE_PCU_EVENT);
-
-	/* Go back to the starting frequency */
-	ironlake_set_drps(dev_priv, dev_priv->ips.fstart);
-	mdelay(1);
-	rgvswctl |= MEMCTL_CMD_STS;
-	I915_WRITE(MEMSWCTL, rgvswctl);
-	mdelay(1);
+	int pipe;
+	u32 val;
 
-	spin_unlock_irq(&mchdev_lock);
+	/*
+	 * On Ibex Peak and Cougar Point, we need to disable clock
+	 * gating for the panel power sequencer or it will fail to
+	 * start up when no ports are active.
+	 */
+	I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE |
+		   PCH_DPLUNIT_CLOCK_GATE_DISABLE |
+		   PCH_CPUNIT_CLOCK_GATE_DISABLE);
+	I915_WRITE(SOUTH_CHICKEN2, I915_READ(SOUTH_CHICKEN2) |
+		   DPLS_EDP_PPS_FIX_DIS);
+	/*
+	 * The below fixes the weird display corruption, a few pixels
+	 * shifted downward, on (only) LVDS of some HP laptops with IVY.
+	 */
+	for_each_pipe(dev_priv, pipe) {
+		val = I915_READ(TRANS_CHICKEN2(pipe));
+		val |= TRANS_CHICKEN2_TIMING_OVERRIDE;
+		val &= ~TRANS_CHICKEN2_FDI_POLARITY_REVERSED;
+		if (dev_priv->vbt.fdi_rx_polarity_inverted)
+			val |= TRANS_CHICKEN2_FDI_POLARITY_REVERSED;
+		val &= ~TRANS_CHICKEN2_FRAME_START_DELAY_MASK;
+		val &= ~TRANS_CHICKEN2_DISABLE_DEEP_COLOR_COUNTER;
+		val &= ~TRANS_CHICKEN2_DISABLE_DEEP_COLOR_MODESWITCH;
+		I915_WRITE(TRANS_CHICKEN2(pipe), val);
+	}
+	/* WADP0ClockGatingDisable */
+	for_each_pipe(dev_priv, pipe) {
+		I915_WRITE(TRANS_CHICKEN1(pipe),
+			   TRANS_CHICKEN1_DP0UNIT_GC_DISABLE);
+	}
 }
 
-/* There's a funny hw issue where the hw returns all 0 when reading from
- * GEN6_RP_INTERRUPT_LIMITS. Hence we always need to compute the desired value
- * ourselves, instead of doing a rmw cycle (which might result in us clearing
- * all limits and the gpu stuck at whatever frequency it is at atm).
- */
-static u32 intel_rps_limits(struct drm_i915_private *dev_priv, u8 val)
+static void gen6_check_mch_setup(struct drm_i915_private *dev_priv)
 {
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 limits;
-
-	/* Only set the down limit when we've reached the lowest level to avoid
-	 * getting more interrupts, otherwise leave this clear. This prevents a
-	 * race in the hw when coming out of rc6: There's a tiny window where
-	 * the hw runs at the minimal clock before selecting the desired
-	 * frequency, if the down threshold expires in that window we will not
-	 * receive a down interrupt. */
-	if (INTEL_GEN(dev_priv) >= 9) {
-		limits = (rps->max_freq_softlimit) << 23;
-		if (val <= rps->min_freq_softlimit)
-			limits |= (rps->min_freq_softlimit) << 14;
-	} else {
-		limits = rps->max_freq_softlimit << 24;
-		if (val <= rps->min_freq_softlimit)
-			limits |= rps->min_freq_softlimit << 16;
-	}
+	u32 tmp;
 
-	return limits;
+	tmp = I915_READ(MCH_SSKPD);
+	if ((tmp & MCH_SSKPD_WM0_MASK) != MCH_SSKPD_WM0_VAL)
+		DRM_DEBUG_KMS("Wrong MCH_SSKPD value: 0x%08x; this can cause underruns.\n",
+			      tmp);
 }
 
-static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+static void gen6_init_clock_gating(struct drm_i915_private *dev_priv)
 {
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int new_power;
-	u32 threshold_up = 0, threshold_down = 0; /* in % */
-	u32 ei_up = 0, ei_down = 0;
-
-	new_power = rps->power;
-	switch (rps->power) {
-	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
-			new_power = BETWEEN;
-		break;
+	I915_WRITE(ILK_DSPCLK_GATE_D, ILK_VRHUNIT_CLOCK_GATE_DISABLE);
 
-	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
-			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
-			new_power = HIGH_POWER;
-		break;
+	I915_WRITE(ILK_DISPLAY_CHICKEN2,
+		   I915_READ(ILK_DISPLAY_CHICKEN2) |
+		   ILK_ELPIN_409_SELECT);
 
-	case HIGH_POWER:
-		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
-			new_power = BETWEEN;
-		break;
-	}
-	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
-		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
-		new_power = HIGH_POWER;
-	if (new_power == rps->power)
-		return;
+	/* WaDisableHiZPlanesWhenMSAAEnabled:snb */
+	I915_WRITE(_3D_CHICKEN,
+		   _MASKED_BIT_ENABLE(_3D_CHICKEN_HIZ_PLANE_DISABLE_MSAA_4X_SNB));
 
-	/* Note the units here are not exactly 1us, but 1280ns. */
-	switch (new_power) {
-	case LOW_POWER:
-		/* Upclock if more than 95% busy over 16ms */
-		ei_up = 16000;
-		threshold_up = 95;
+	/* WaDisable_RenderCache_OperationalFlush:snb */
+	I915_WRITE(CACHE_MODE_0, _MASKED_BIT_DISABLE(RC_OP_FLUSH_ENABLE));
 
-		/* Downclock if less than 85% busy over 32ms */
-		ei_down = 32000;
-		threshold_down = 85;
-		break;
+	/*
+	 * BSpec recommends 8x4 when MSAA is used,
+	 * however in practice 16x4 seems fastest.
+	 *
+	 * Note that PS/WM thread counts depend on the WIZ hashing
+	 * disable bit, which we don't touch here, but it's good
+	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
+	 */
+	I915_WRITE(GEN6_GT_MODE,
+		   _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
 
-	case BETWEEN:
-		/* Upclock if more than 90% busy over 13ms */
-		ei_up = 13000;
-		threshold_up = 90;
+	I915_WRITE(CACHE_MODE_0,
+		   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
 
-		/* Downclock if less than 75% busy over 32ms */
-		ei_down = 32000;
-		threshold_down = 75;
-		break;
+	I915_WRITE(GEN6_UCGCTL1,
+		   I915_READ(GEN6_UCGCTL1) |
+		   GEN6_BLBUNIT_CLOCK_GATE_DISABLE |
+		   GEN6_CSUNIT_CLOCK_GATE_DISABLE);
 
-	case HIGH_POWER:
-		/* Upclock if more than 85% busy over 10ms */
-		ei_up = 10000;
-		threshold_up = 85;
+	/* According to the BSpec vol1g, bit 12 (RCPBUNIT) clock
+	 * gating disable must be set.  Failure to set it results in
+	 * flickering pixels due to Z write ordering failures after
+	 * some amount of runtime in the Mesa "fire" demo, and Unigine
+	 * Sanctuary and Tropics, and apparently anything else with
+	 * alpha test or pixel discard.
+	 *
+	 * According to the spec, bit 11 (RCCUNIT) must also be set,
+	 * but we didn't debug actual testcases to find it out.
+	 *
+	 * WaDisableRCCUnitClockGating:snb
+	 * WaDisableRCPBUnitClockGating:snb
+	 */
+	I915_WRITE(GEN6_UCGCTL2,
+		   GEN6_RCPBUNIT_CLOCK_GATE_DISABLE |
+		   GEN6_RCCUNIT_CLOCK_GATE_DISABLE);
 
-		/* Downclock if less than 60% busy over 32ms */
-		ei_down = 32000;
-		threshold_down = 60;
-		break;
-	}
+	/* WaStripsFansDisableFastClipPerformanceFix:snb */
+	I915_WRITE(_3D_CHICKEN3,
+		   _MASKED_BIT_ENABLE(_3D_CHICKEN3_SF_DISABLE_FASTCLIP_CULL));
 
-	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
-		/*
-		 * Baytrail and Braswell control the gpu frequency via the
-		 * punit, which is very slow and expensive to communicate with,
-		 * as we synchronously force the package to C0. If we try and
-		 * update the gpufreq too often we cause measurable system
-		 * load for little benefit (effectively stealing CPU time for
-		 * the GPU, negatively impacting overall throughput).
-		 */
-		ei_up <<= 2;
-		ei_down <<= 2;
-	}
+	/*
+	 * Bspec says:
+	 * "This bit must be set if 3DSTATE_CLIP clip mode is set to normal and
+	 * 3DSTATE_SF number of SF output attributes is more than 16."
+	 */
+	I915_WRITE(_3D_CHICKEN3,
+		   _MASKED_BIT_ENABLE(_3D_CHICKEN3_SF_DISABLE_PIPELINED_ATTR_FETCH));
 
-	I915_WRITE(GEN6_RP_UP_EI,
-		   GT_INTERVAL_FROM_US(dev_priv, ei_up));
-	I915_WRITE(GEN6_RP_UP_THRESHOLD,
-		   GT_INTERVAL_FROM_US(dev_priv,
-				       ei_up * threshold_up / 100));
+	/*
+	 * According to the spec the following bits should be
+	 * set in order to enable memory self-refresh and fbc:
+	 * The bit21 and bit22 of 0x42000
+	 * The bit21 and bit22 of 0x42004
+	 * The bit5 and bit7 of 0x42020
+	 * The bit14 of 0x70180
+	 * The bit14 of 0x71180
+	 *
+	 * WaFbcAsynchFlipDisableFbcQueue:snb
+	 */
+	I915_WRITE(ILK_DISPLAY_CHICKEN1,
+		   I915_READ(ILK_DISPLAY_CHICKEN1) |
+		   ILK_FBCQ_DIS | ILK_PABSTRETCH_DIS);
+	I915_WRITE(ILK_DISPLAY_CHICKEN2,
+		   I915_READ(ILK_DISPLAY_CHICKEN2) |
+		   ILK_DPARB_GATE | ILK_VSDPFD_FULL);
+	I915_WRITE(ILK_DSPCLK_GATE_D,
+		   I915_READ(ILK_DSPCLK_GATE_D) |
+		   ILK_DPARBUNIT_CLOCK_GATE_ENABLE  |
+		   ILK_DPFDUNIT_CLOCK_GATE_ENABLE);
 
-	I915_WRITE(GEN6_RP_DOWN_EI,
-		   GT_INTERVAL_FROM_US(dev_priv, ei_down));
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD,
-		   GT_INTERVAL_FROM_US(dev_priv,
-				       ei_down * threshold_down / 100));
+	g4x_disable_trickle_feed(dev_priv);
 
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_AVG);
+	cpt_init_clock_gating(dev_priv);
 
-	rps->power = new_power;
-	rps->up_threshold = threshold_up;
-	rps->down_threshold = threshold_down;
-	rps->last_adj = 0;
+	gen6_check_mch_setup(dev_priv);
 }
 
-static u32 gen6_rps_pm_mask(struct drm_i915_private *dev_priv, u8 val)
+static void gen7_setup_fixed_func_scheduler(struct drm_i915_private *dev_priv)
 {
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 mask = 0;
-
-	/* We use UP_EI_EXPIRED interupts for both up/down in manual mode */
-	if (val > rps->min_freq_softlimit)
-		mask |= GEN6_PM_RP_UP_EI_EXPIRED | GEN6_PM_RP_DOWN_THRESHOLD | GEN6_PM_RP_DOWN_TIMEOUT;
-	if (val < rps->max_freq_softlimit)
-		mask |= GEN6_PM_RP_UP_EI_EXPIRED | GEN6_PM_RP_UP_THRESHOLD;
+	u32 reg = I915_READ(GEN7_FF_THREAD_MODE);
 
-	mask &= dev_priv->pm_rps_events;
+	/*
+	 * WaVSThreadDispatchOverride:ivb,vlv
+	 *
+	 * This actually overrides the dispatch
+	 * mode for all thread types.
+	 */
+	reg &= ~GEN7_FF_SCHED_MASK;
+	reg |= GEN7_FF_TS_SCHED_HW;
+	reg |= GEN7_FF_VS_SCHED_HW;
+	reg |= GEN7_FF_DS_SCHED_HW;
 
-	return gen6_sanitize_rps_pm_mask(dev_priv, ~mask);
+	I915_WRITE(GEN7_FF_THREAD_MODE, reg);
 }
 
-/* gen6_set_rps is called to update the frequency request, but should also be
- * called when the range (min_delay and max_delay) is modified so that we can
- * update the GEN6_RP_INTERRUPT_LIMITS register accordingly. */
-static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
+static void lpt_init_clock_gating(struct drm_i915_private *dev_priv)
 {
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/* min/max delay may still have been modified so be sure to
-	 * write the limits value.
+	/*
+	 * TODO: this bit should only be enabled when really needed, then
+	 * disabled when not needed anymore in order to save power.
 	 */
-	if (val != rps->cur_freq) {
-		gen6_set_rps_thresholds(dev_priv, val);
-
-		if (INTEL_GEN(dev_priv) >= 9)
-			I915_WRITE(GEN6_RPNSWREQ,
-				   GEN9_FREQUENCY(val));
-		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-			I915_WRITE(GEN6_RPNSWREQ,
-				   HSW_FREQUENCY(val));
-		else
-			I915_WRITE(GEN6_RPNSWREQ,
-				   GEN6_FREQUENCY(val) |
-				   GEN6_OFFSET(0) |
-				   GEN6_AGGRESSIVE_TURBO);
-	}
-
-	/* Make sure we continue to get interrupts
-	 * until we hit the minimum or maximum frequencies.
-	 */
-	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, intel_rps_limits(dev_priv, val));
-	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
-
-	rps->cur_freq = val;
-	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
-
-	return 0;
-}
-
-static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
-{
-	int err;
-
-	if (WARN_ONCE(IS_CHERRYVIEW(dev_priv) && (val & 1),
-		      "Odd GPU freq value\n"))
-		val &= ~1;
-
-	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
-
-	if (val != dev_priv->gt_pm.rps.cur_freq) {
-		vlv_punit_get(dev_priv);
-		err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
-		vlv_punit_put(dev_priv);
-		if (err)
-			return err;
-
-		gen6_set_rps_thresholds(dev_priv, val);
-	}
-
-	dev_priv->gt_pm.rps.cur_freq = val;
-	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
-
-	return 0;
-}
-
-/* vlv_set_rps_idle: Set the frequency to idle, if Gfx clocks are down
- *
- * * If Gfx is Idle, then
- * 1. Forcewake Media well.
- * 2. Request idle freq.
- * 3. Release Forcewake of Media well.
-*/
-static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 val = rps->idle_freq;
-	int err;
-
-	if (rps->cur_freq <= val)
-		return;
-
-	/* The punit delays the write of the frequency and voltage until it
-	 * determines the GPU is awake. During normal usage we don't want to
-	 * waste power changing the frequency if the GPU is sleeping (rc6).
-	 * However, the GPU and driver is now idle and we do not want to delay
-	 * switching to minimum voltage (reducing power whilst idle) as we do
-	 * not expect to be woken in the near future and so must flush the
-	 * change by waking the device.
-	 *
-	 * We choose to take the media powerwell (either would do to trick the
-	 * punit into committing the voltage change) as that takes a lot less
-	 * power than the render powerwell.
-	 */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_MEDIA);
-	err = valleyview_set_rps(dev_priv, val);
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_MEDIA);
-
-	if (err)
-		DRM_ERROR("Failed to set RPS for idle\n");
-}
-
-void gen6_rps_busy(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	mutex_lock(&rps->lock);
-	if (rps->enabled) {
-		u8 freq;
-
-		if (dev_priv->pm_rps_events & GEN6_PM_RP_UP_EI_EXPIRED)
-			gen6_rps_reset_ei(dev_priv);
-		I915_WRITE(GEN6_PMINTRMSK,
-			   gen6_rps_pm_mask(dev_priv, rps->cur_freq));
-
-		gen6_enable_rps_interrupts(dev_priv);
-
-		/* Use the user's desired frequency as a guide, but for better
-		 * performance, jump directly to RPe as our starting frequency.
-		 */
-		freq = max(rps->cur_freq,
-			   rps->efficient_freq);
-
-		if (intel_set_rps(dev_priv,
-				  clamp(freq,
-					rps->min_freq_softlimit,
-					rps->max_freq_softlimit)))
-			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
-	}
-	mutex_unlock(&rps->lock);
-}
-
-void gen6_rps_idle(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/* Flush our bottom-half so that it does not race with us
-	 * setting the idle frequency and so that it is bounded by
-	 * our rpm wakeref. And then disable the interrupts to stop any
-	 * futher RPS reclocking whilst we are asleep.
-	 */
-	gen6_disable_rps_interrupts(dev_priv);
-
-	mutex_lock(&rps->lock);
-	if (rps->enabled) {
-		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
-			vlv_set_rps_idle(dev_priv);
-		else
-			gen6_set_rps(dev_priv, rps->idle_freq);
-		rps->last_adj = 0;
-		I915_WRITE(GEN6_PMINTRMSK,
-			   gen6_sanitize_rps_pm_mask(dev_priv, ~0));
-	}
-	mutex_unlock(&rps->lock);
-}
-
-void gen6_rps_boost(struct i915_request *rq,
-		    struct intel_rps_client *rps_client)
-{
-	struct intel_rps *rps = &rq->i915->gt_pm.rps;
-	unsigned long flags;
-	bool boost;
-
-	/* This is intentionally racy! We peek at the state here, then
-	 * validate inside the RPS worker.
-	 */
-	if (!rps->enabled)
-		return;
-
-	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
-		return;
-
-	/* Serializes with i915_request_retire() */
-	boost = false;
-	spin_lock_irqsave(&rq->lock, flags);
-	if (!rq->waitboost && !dma_fence_is_signaled_locked(&rq->fence)) {
-		boost = !atomic_fetch_inc(&rps->num_waiters);
-		rq->waitboost = true;
-	}
-	spin_unlock_irqrestore(&rq->lock, flags);
-	if (!boost)
-		return;
-
-	if (READ_ONCE(rps->cur_freq) < rps->boost_freq)
-		schedule_work(&rps->work);
-
-	atomic_inc(rps_client ? &rps_client->boosts : &rps->boosts);
-}
-
-int intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	int err;
-
-	lockdep_assert_held(&rps->lock);
-	GEM_BUG_ON(val > rps->max_freq);
-	GEM_BUG_ON(val < rps->min_freq);
-
-	if (!rps->enabled) {
-		rps->cur_freq = val;
-		return 0;
-	}
-
-	if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
-		err = valleyview_set_rps(dev_priv, val);
-	else
-		err = gen6_set_rps(dev_priv, val);
-
-	return err;
-}
-
-static void gen9_disable_rc6(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-	I915_WRITE(GEN9_PG_ENABLE, 0);
-}
-
-static void gen9_disable_rps(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RP_CONTROL, 0);
-}
-
-static void gen6_disable_rc6(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-}
-
-static void gen6_disable_rps(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RPNSWREQ, 1 << 31);
-	I915_WRITE(GEN6_RP_CONTROL, 0);
-}
-
-static void cherryview_disable_rc6(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-}
-
-static void cherryview_disable_rps(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RP_CONTROL, 0);
-}
-
-static void valleyview_disable_rc6(struct drm_i915_private *dev_priv)
-{
-	/* We're doing forcewake before Disabling RC6,
-	 * This what the BIOS expects when going into suspend */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void valleyview_disable_rps(struct drm_i915_private *dev_priv)
-{
-	I915_WRITE(GEN6_RP_CONTROL, 0);
-}
-
-static bool bxt_check_bios_rc6_setup(struct drm_i915_private *dev_priv)
-{
-	bool enable_rc6 = true;
-	unsigned long rc6_ctx_base;
-	u32 rc_ctl;
-	int rc_sw_target;
-
-	rc_ctl = I915_READ(GEN6_RC_CONTROL);
-	rc_sw_target = (I915_READ(GEN6_RC_STATE) & RC_SW_TARGET_STATE_MASK) >>
-		       RC_SW_TARGET_STATE_SHIFT;
-	DRM_DEBUG_DRIVER("BIOS enabled RC states: "
-			 "HW_CTRL %s HW_RC6 %s SW_TARGET_STATE %x\n",
-			 onoff(rc_ctl & GEN6_RC_CTL_HW_ENABLE),
-			 onoff(rc_ctl & GEN6_RC_CTL_RC6_ENABLE),
-			 rc_sw_target);
-
-	if (!(I915_READ(RC6_LOCATION) & RC6_CTX_IN_DRAM)) {
-		DRM_DEBUG_DRIVER("RC6 Base location not set properly.\n");
-		enable_rc6 = false;
-	}
-
-	/*
-	 * The exact context size is not known for BXT, so assume a page size
-	 * for this check.
-	 */
-	rc6_ctx_base = I915_READ(RC6_CTX_BASE) & RC6_CTX_BASE_MASK;
-	if (!((rc6_ctx_base >= dev_priv->dsm_reserved.start) &&
-	      (rc6_ctx_base + PAGE_SIZE < dev_priv->dsm_reserved.end))) {
-		DRM_DEBUG_DRIVER("RC6 Base address not as expected.\n");
-		enable_rc6 = false;
-	}
-
-	if (!(((I915_READ(PWRCTX_MAXCNT_RCSUNIT) & IDLE_TIME_MASK) > 1) &&
-	      ((I915_READ(PWRCTX_MAXCNT_VCSUNIT0) & IDLE_TIME_MASK) > 1) &&
-	      ((I915_READ(PWRCTX_MAXCNT_BCSUNIT) & IDLE_TIME_MASK) > 1) &&
-	      ((I915_READ(PWRCTX_MAXCNT_VECSUNIT) & IDLE_TIME_MASK) > 1))) {
-		DRM_DEBUG_DRIVER("Engine Idle wait time not set properly.\n");
-		enable_rc6 = false;
-	}
-
-	if (!I915_READ(GEN8_PUSHBUS_CONTROL) ||
-	    !I915_READ(GEN8_PUSHBUS_ENABLE) ||
-	    !I915_READ(GEN8_PUSHBUS_SHIFT)) {
-		DRM_DEBUG_DRIVER("Pushbus not setup properly.\n");
-		enable_rc6 = false;
-	}
-
-	if (!I915_READ(GEN6_GFXPAUSE)) {
-		DRM_DEBUG_DRIVER("GFX pause not setup properly.\n");
-		enable_rc6 = false;
-	}
-
-	if (!I915_READ(GEN8_MISC_CTRL0)) {
-		DRM_DEBUG_DRIVER("GPM control not setup properly.\n");
-		enable_rc6 = false;
-	}
-
-	return enable_rc6;
-}
-
-static bool sanitize_rc6(struct drm_i915_private *i915)
-{
-	struct intel_device_info *info = mkwrite_device_info(i915);
-
-	/* Powersaving is controlled by the host when inside a VM */
-	if (intel_vgpu_active(i915))
-		info->has_rc6 = 0;
-
-	if (info->has_rc6 &&
-	    IS_GEN9_LP(i915) && !bxt_check_bios_rc6_setup(i915)) {
-		DRM_INFO("RC6 disabled by BIOS\n");
-		info->has_rc6 = 0;
-	}
-
-	/*
-	 * We assume that we do not have any deep rc6 levels if we don't
-	 * have the previous rc6 level supported, i.e. we use HAS_RC6()
-	 * as the initial coarse check for rc6 in general, moving on to
-	 * progressively finer/deeper levels.
-	 */
-	if (!info->has_rc6 && info->has_rc6p)
-		info->has_rc6p = 0;
-
-	return info->has_rc6;
-}
-
-static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/* All of these values are in units of 50MHz */
-
-	/* static values from HW: RP0 > RP1 > RPn (min_freq) */
-	if (IS_GEN9_LP(dev_priv)) {
-		u32 rp_state_cap = I915_READ(BXT_RP_STATE_CAP);
-		rps->rp0_freq = (rp_state_cap >> 16) & 0xff;
-		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
-		rps->min_freq = (rp_state_cap >>  0) & 0xff;
-	} else {
-		u32 rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
-		rps->rp0_freq = (rp_state_cap >>  0) & 0xff;
-		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
-		rps->min_freq = (rp_state_cap >> 16) & 0xff;
-	}
-	/* hw_max = RP0 until we check for overclocking */
-	rps->max_freq = rps->rp0_freq;
-
-	rps->efficient_freq = rps->rp1_freq;
-	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv) ||
-	    IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
-		u32 ddcc_status = 0;
-
-		if (sandybridge_pcode_read(dev_priv,
-					   HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL,
-					   &ddcc_status) == 0)
-			rps->efficient_freq =
-				clamp_t(u8,
-					((ddcc_status >> 8) & 0xff),
-					rps->min_freq,
-					rps->max_freq);
-	}
-
-	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
-		/* Store the frequency values in 16.66 MHz units, which is
-		 * the natural hardware unit for SKL
-		 */
-		rps->rp0_freq *= GEN9_FREQ_SCALER;
-		rps->rp1_freq *= GEN9_FREQ_SCALER;
-		rps->min_freq *= GEN9_FREQ_SCALER;
-		rps->max_freq *= GEN9_FREQ_SCALER;
-		rps->efficient_freq *= GEN9_FREQ_SCALER;
-	}
-}
-
-static void reset_rps(struct drm_i915_private *dev_priv,
-		      int (*set)(struct drm_i915_private *, u8))
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u8 freq = rps->cur_freq;
-
-	/* force a reset */
-	rps->power = -1;
-	rps->cur_freq = -1;
-
-	if (set(dev_priv, freq))
-		DRM_ERROR("Failed to reset RPS to initial values\n");
-}
-
-/* See the Gen9_GT_PM_Programming_Guide doc for the below */
-static void gen9_enable_rps(struct drm_i915_private *dev_priv)
-{
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* Program defaults and thresholds for RPS */
-	if (IS_GEN9(dev_priv))
-		I915_WRITE(GEN6_RC_VIDEO_FREQ,
-			GEN9_FREQUENCY(dev_priv->gt_pm.rps.rp1_freq));
-
-	/* 1 second timeout */
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT,
-		GT_INTERVAL_FROM_US(dev_priv, 1000000));
-
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 0xa);
-
-	/* Leaning on the below call to gen6_set_rps to program/setup the
-	 * Up/Down EI & threshold registers, as well as the RP_CONTROL,
-	 * RP_INTERRUPT_LIMITS & RPNSWREQ registers */
-	reset_rps(dev_priv, gen6_set_rps);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen9_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	u32 rc6_mode;
-
-	/* 1a: Software RC state - RC0 */
-	I915_WRITE(GEN6_RC_STATE, 0);
-
-	/* 1b: Get forcewake during program sequence. Although the driver
-	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* 2a: Disable RC states. */
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	/* 2b: Program RC6 thresholds.*/
-	if (INTEL_GEN(dev_priv) >= 10) {
-		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16 | 85);
-		I915_WRITE(GEN10_MEDIA_WAKE_RATE_LIMIT, 150);
-	} else if (IS_SKYLAKE(dev_priv)) {
-		/*
-		 * WaRsDoubleRc6WrlWithCoarsePowerGating:skl Doubling WRL only
-		 * when CPG is enabled
-		 */
-		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 108 << 16);
-	} else {
-		I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 54 << 16);
-	}
-
-	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
-	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
-	for_each_engine(engine, dev_priv, id)
-		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
-
-	if (HAS_GUC(dev_priv))
-		I915_WRITE(GUC_MAX_IDLE_COUNT, 0xA);
-
-	I915_WRITE(GEN6_RC_SLEEP, 0);
-
-	/*
-	 * 2c: Program Coarse Power Gating Policies.
-	 *
-	 * Bspec's guidance is to use 25us (really 25 * 1280ns) here. What we
-	 * use instead is a more conservative estimate for the maximum time
-	 * it takes us to service a CS interrupt and submit a new ELSP - that
-	 * is the time which the GPU is idle waiting for the CPU to select the
-	 * next request to execute. If the idle hysteresis is less than that
-	 * interrupt service latency, the hardware will automatically gate
-	 * the power well and we will then incur the wake up cost on top of
-	 * the service latency. A similar guide from intel_pstate is that we
-	 * do not want the enable hysteresis to be less than the wakeup latency.
-	 *
-	 * igt/gem_exec_nop/sequential provides a rough estimate for the
-	 * service latency, and puts it around 10us for Broadwell (and other
-	 * big core) and around 40us for Broxton (and other low power cores).
-	 * [Note that for legacy ringbuffer submission, this is less than 1us!]
-	 * However, the wakeup latency on Broxton is closer to 100us. To be
-	 * conservative, we have to factor in a context switch on top (due
-	 * to ksoftirqd).
-	 */
-	I915_WRITE(GEN9_MEDIA_PG_IDLE_HYSTERESIS, 250);
-	I915_WRITE(GEN9_RENDER_PG_IDLE_HYSTERESIS, 250);
-
-	/* 3a: Enable RC6 */
-	I915_WRITE(GEN6_RC6_THRESHOLD, 37500); /* 37.5/125ms per EI */
-
-	/* WaRsUseTimeoutMode:cnl (pre-prod) */
-	if (IS_CNL_REVID(dev_priv, CNL_REVID_A0, CNL_REVID_C0))
-		rc6_mode = GEN7_RC_CTL_TO_MODE;
-	else
-		rc6_mode = GEN6_RC_CTL_EI_MODE(1);
-
-	I915_WRITE(GEN6_RC_CONTROL,
-		   GEN6_RC_CTL_HW_ENABLE |
-		   GEN6_RC_CTL_RC6_ENABLE |
-		   rc6_mode);
-
-	/*
-	 * 3b: Enable Coarse Power Gating only when RC6 is enabled.
-	 * WaRsDisableCoarsePowerGating:skl,cnl - Render/Media PG need to be disabled with RC6.
-	 */
-	if (NEEDS_WaRsDisableCoarsePowerGating(dev_priv))
-		I915_WRITE(GEN9_PG_ENABLE, 0);
-	else
-		I915_WRITE(GEN9_PG_ENABLE,
-			   GEN9_RENDER_PG_ENABLE | GEN9_MEDIA_PG_ENABLE);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen8_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	/* 1a: Software RC state - RC0 */
-	I915_WRITE(GEN6_RC_STATE, 0);
-
-	/* 1b: Get forcewake during program sequence. Although the driver
-	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* 2a: Disable RC states. */
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	/* 2b: Program RC6 thresholds.*/
-	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
-	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
-	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
-	for_each_engine(engine, dev_priv, id)
-		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
-	I915_WRITE(GEN6_RC_SLEEP, 0);
-	I915_WRITE(GEN6_RC6_THRESHOLD, 625); /* 800us/1.28 for TO */
-
-	/* 3: Enable RC6 */
-
-	I915_WRITE(GEN6_RC_CONTROL,
-		   GEN6_RC_CTL_HW_ENABLE |
-		   GEN7_RC_CTL_TO_MODE |
-		   GEN6_RC_CTL_RC6_ENABLE);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen8_enable_rps(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* 1 Program defaults and thresholds for RPS*/
-	I915_WRITE(GEN6_RPNSWREQ,
-		   HSW_FREQUENCY(rps->rp1_freq));
-	I915_WRITE(GEN6_RC_VIDEO_FREQ,
-		   HSW_FREQUENCY(rps->rp1_freq));
-	/* NB: Docs say 1s, and 1000000 - which aren't equivalent */
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 100000000 / 128); /* 1 second timeout */
-
-	/* Docs recommend 900MHz, and 300 MHz respectively */
-	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
-		   rps->max_freq_softlimit << 24 |
-		   rps->min_freq_softlimit << 16);
-
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 7600000 / 128); /* 76ms busyness per EI, 90% */
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 31300000 / 128); /* 313ms busyness per EI, 70%*/
-	I915_WRITE(GEN6_RP_UP_EI, 66000); /* 84.48ms, XXX: random? */
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000); /* 448ms, XXX: random? */
-
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-
-	/* 2: Enable RPS */
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_AVG);
-
-	reset_rps(dev_priv, gen6_set_rps);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen6_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	u32 rc6vids, rc6_mask;
-	u32 gtfifodbg;
-	int ret;
-
-	I915_WRITE(GEN6_RC_STATE, 0);
-
-	/* Clear the DBG now so we don't confuse earlier errors */
-	gtfifodbg = I915_READ(GTFIFODBG);
-	if (gtfifodbg) {
-		DRM_ERROR("GT fifo had a previous error %x\n", gtfifodbg);
-		I915_WRITE(GTFIFODBG, gtfifodbg);
-	}
-
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* disable the counters and set deterministic thresholds */
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	I915_WRITE(GEN6_RC1_WAKE_RATE_LIMIT, 1000 << 16);
-	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16 | 30);
-	I915_WRITE(GEN6_RC6pp_WAKE_RATE_LIMIT, 30);
-	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
-	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
-
-	for_each_engine(engine, dev_priv, id)
-		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
-
-	I915_WRITE(GEN6_RC_SLEEP, 0);
-	I915_WRITE(GEN6_RC1e_THRESHOLD, 1000);
-	if (IS_IVYBRIDGE(dev_priv))
-		I915_WRITE(GEN6_RC6_THRESHOLD, 125000);
-	else
-		I915_WRITE(GEN6_RC6_THRESHOLD, 50000);
-	I915_WRITE(GEN6_RC6p_THRESHOLD, 150000);
-	I915_WRITE(GEN6_RC6pp_THRESHOLD, 64000); /* unused */
-
-	/* We don't use those on Haswell */
-	rc6_mask = GEN6_RC_CTL_RC6_ENABLE;
-	if (HAS_RC6p(dev_priv))
-		rc6_mask |= GEN6_RC_CTL_RC6p_ENABLE;
-	if (HAS_RC6pp(dev_priv))
-		rc6_mask |= GEN6_RC_CTL_RC6pp_ENABLE;
-	I915_WRITE(GEN6_RC_CONTROL,
-		   rc6_mask |
-		   GEN6_RC_CTL_EI_MODE(1) |
-		   GEN6_RC_CTL_HW_ENABLE);
-
-	rc6vids = 0;
-	ret = sandybridge_pcode_read(dev_priv, GEN6_PCODE_READ_RC6VIDS, &rc6vids);
-	if (IS_GEN6(dev_priv) && ret) {
-		DRM_DEBUG_DRIVER("Couldn't check for BIOS workaround\n");
-	} else if (IS_GEN6(dev_priv) && (GEN6_DECODE_RC6_VID(rc6vids & 0xff) < 450)) {
-		DRM_DEBUG_DRIVER("You should update your BIOS. Correcting minimum rc6 voltage (%dmV->%dmV)\n",
-			  GEN6_DECODE_RC6_VID(rc6vids & 0xff), 450);
-		rc6vids &= 0xffff00;
-		rc6vids |= GEN6_ENCODE_RC6_VID(450);
-		ret = sandybridge_pcode_write(dev_priv, GEN6_PCODE_WRITE_RC6VIDS, rc6vids);
-		if (ret)
-			DRM_ERROR("Couldn't fix incorrect rc6 voltage\n");
-	}
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen6_enable_rps(struct drm_i915_private *dev_priv)
-{
-	/* Here begins a magic sequence of register writes to enable
-	 * auto-downclocking.
-	 *
-	 * Perhaps there might be some value in exposing these to
-	 * userspace...
-	 */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* Power down if completely idle for over 50ms */
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 50000);
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-
-	reset_rps(dev_priv, gen6_set_rps);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	const int min_freq = 15;
-	const int scaling_factor = 180;
-	unsigned int gpu_freq;
-	unsigned int max_ia_freq, min_ring_freq;
-	unsigned int max_gpu_freq, min_gpu_freq;
-	struct cpufreq_policy *policy;
-
-	lockdep_assert_held(&rps->lock);
-
-	if (rps->max_freq <= rps->min_freq)
-		return;
-
-	policy = cpufreq_cpu_get(0);
-	if (policy) {
-		max_ia_freq = policy->cpuinfo.max_freq;
-		cpufreq_cpu_put(policy);
-	} else {
-		/*
-		 * Default to measured freq if none found, PCU will ensure we
-		 * don't go over
-		 */
-		max_ia_freq = tsc_khz;
-	}
-
-	/* Convert from kHz to MHz */
-	max_ia_freq /= 1000;
-
-	min_ring_freq = I915_READ(DCLK) & 0xf;
-	/* convert DDR frequency from units of 266.6MHz to bandwidth */
-	min_ring_freq = mult_frac(min_ring_freq, 8, 3);
-
-	min_gpu_freq = rps->min_freq;
-	max_gpu_freq = rps->max_freq;
-	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
-		/* Convert GT frequency to 50 MHz units */
-		min_gpu_freq /= GEN9_FREQ_SCALER;
-		max_gpu_freq /= GEN9_FREQ_SCALER;
-	}
-
-	/*
-	 * For each potential GPU frequency, load a ring frequency we'd like
-	 * to use for memory access.  We do this by specifying the IA frequency
-	 * the PCU should use as a reference to determine the ring frequency.
-	 */
-	for (gpu_freq = max_gpu_freq; gpu_freq >= min_gpu_freq; gpu_freq--) {
-		const int diff = max_gpu_freq - gpu_freq;
-		unsigned int ia_freq = 0, ring_freq = 0;
-
-		if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
-			/*
-			 * ring_freq = 2 * GT. ring_freq is in 100MHz units
-			 * No floor required for ring frequency on SKL.
-			 */
-			ring_freq = gpu_freq;
-		} else if (INTEL_GEN(dev_priv) >= 8) {
-			/* max(2 * GT, DDR). NB: GT is 50MHz units */
-			ring_freq = max(min_ring_freq, gpu_freq);
-		} else if (IS_HASWELL(dev_priv)) {
-			ring_freq = mult_frac(gpu_freq, 5, 4);
-			ring_freq = max(min_ring_freq, ring_freq);
-			/* leave ia_freq as the default, chosen by cpufreq */
-		} else {
-			/* On older processors, there is no separate ring
-			 * clock domain, so in order to boost the bandwidth
-			 * of the ring, we need to upclock the CPU (ia_freq).
-			 *
-			 * For GPU frequencies less than 750MHz,
-			 * just use the lowest ring freq.
-			 */
-			if (gpu_freq < min_freq)
-				ia_freq = 800;
-			else
-				ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
-			ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
-		}
-
-		sandybridge_pcode_write(dev_priv,
-					GEN6_PCODE_WRITE_MIN_FREQ_TABLE,
-					ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT |
-					ring_freq << GEN6_PCODE_FREQ_RING_RATIO_SHIFT |
-					gpu_freq);
-	}
-}
-
-static int cherryview_rps_max_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rp0;
-
-	val = vlv_punit_read(dev_priv, FB_GFX_FMAX_AT_VMAX_FUSE);
-
-	switch (INTEL_INFO(dev_priv)->sseu.eu_total) {
-	case 8:
-		/* (2 * 4) config */
-		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS4EU_FUSE_SHIFT);
-		break;
-	case 12:
-		/* (2 * 6) config */
-		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS6EU_FUSE_SHIFT);
-		break;
-	case 16:
-		/* (2 * 8) config */
-	default:
-		/* Setting (2 * 8) Min RP0 for any other combination */
-		rp0 = (val >> FB_GFX_FMAX_AT_VMAX_2SS8EU_FUSE_SHIFT);
-		break;
-	}
-
-	rp0 = (rp0 & FB_GFX_FREQ_FUSE_MASK);
-
-	return rp0;
-}
-
-static int cherryview_rps_rpe_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rpe;
-
-	val = vlv_punit_read(dev_priv, PUNIT_GPU_DUTYCYCLE_REG);
-	rpe = (val >> PUNIT_GPU_DUTYCYCLE_RPE_FREQ_SHIFT) & PUNIT_GPU_DUTYCYCLE_RPE_FREQ_MASK;
-
-	return rpe;
-}
-
-static int cherryview_rps_guar_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rp1;
-
-	val = vlv_punit_read(dev_priv, FB_GFX_FMAX_AT_VMAX_FUSE);
-	rp1 = (val & FB_GFX_FREQ_FUSE_MASK);
-
-	return rp1;
-}
-
-static u32 cherryview_rps_min_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rpn;
-
-	val = vlv_punit_read(dev_priv, FB_GFX_FMIN_AT_VMIN_FUSE);
-	rpn = ((val >> FB_GFX_FMIN_AT_VMIN_FUSE_SHIFT) &
-		       FB_GFX_FREQ_FUSE_MASK);
-
-	return rpn;
-}
-
-static int valleyview_rps_guar_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rp1;
-
-	val = vlv_nc_read(dev_priv, IOSF_NC_FB_GFX_FREQ_FUSE);
-
-	rp1 = (val & FB_GFX_FGUARANTEED_FREQ_FUSE_MASK) >> FB_GFX_FGUARANTEED_FREQ_FUSE_SHIFT;
-
-	return rp1;
-}
-
-static int valleyview_rps_max_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rp0;
-
-	val = vlv_nc_read(dev_priv, IOSF_NC_FB_GFX_FREQ_FUSE);
-
-	rp0 = (val & FB_GFX_MAX_FREQ_FUSE_MASK) >> FB_GFX_MAX_FREQ_FUSE_SHIFT;
-	/* Clamp to max */
-	rp0 = min_t(u32, rp0, 0xea);
-
-	return rp0;
-}
-
-static int valleyview_rps_rpe_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val, rpe;
-
-	val = vlv_nc_read(dev_priv, IOSF_NC_FB_GFX_FMAX_FUSE_LO);
-	rpe = (val & FB_FMAX_VMIN_FREQ_LO_MASK) >> FB_FMAX_VMIN_FREQ_LO_SHIFT;
-	val = vlv_nc_read(dev_priv, IOSF_NC_FB_GFX_FMAX_FUSE_HI);
-	rpe |= (val & FB_FMAX_VMIN_FREQ_HI_MASK) << 5;
-
-	return rpe;
-}
-
-static int valleyview_rps_min_freq(struct drm_i915_private *dev_priv)
-{
-	u32 val;
-
-	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_LFM) & 0xff;
-	/*
-	 * According to the BYT Punit GPU turbo HAS 1.1.6.3 the minimum value
-	 * for the minimum frequency in GPLL mode is 0xc1. Contrary to this on
-	 * a BYT-M B0 the above register contains 0xbf. Moreover when setting
-	 * a frequency Punit will not allow values below 0xc0. Clamp it to 0xc0
-	 * to make sure it matches what Punit accepts.
-	 */
-	return max_t(u32, val, 0xc0);
-}
-
-/* Check that the pctx buffer wasn't moved under us. */
-static void valleyview_check_pctx(struct drm_i915_private *dev_priv)
-{
-	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
-
-	WARN_ON(pctx_addr != dev_priv->dsm.start +
-			     dev_priv->vlv_pctx->stolen->start);
-}
-
-
-/* Check that the pcbr address is not empty. */
-static void cherryview_check_pctx(struct drm_i915_private *dev_priv)
-{
-	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
-
-	WARN_ON((pctx_addr >> VLV_PCBR_ADDR_SHIFT) == 0);
-}
-
-static void cherryview_setup_pctx(struct drm_i915_private *dev_priv)
-{
-	resource_size_t pctx_paddr, paddr;
-	resource_size_t pctx_size = 32*1024;
-	u32 pcbr;
-
-	pcbr = I915_READ(VLV_PCBR);
-	if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
-		DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
-		paddr = dev_priv->dsm.end + 1 - pctx_size;
-		GEM_BUG_ON(paddr > U32_MAX);
-
-		pctx_paddr = (paddr & (~4095));
-		I915_WRITE(VLV_PCBR, pctx_paddr);
-	}
-
-	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
-}
-
-static void valleyview_setup_pctx(struct drm_i915_private *dev_priv)
-{
-	struct drm_i915_gem_object *pctx;
-	resource_size_t pctx_paddr;
-	resource_size_t pctx_size = 24*1024;
-	u32 pcbr;
-
-	pcbr = I915_READ(VLV_PCBR);
-	if (pcbr) {
-		/* BIOS set it up already, grab the pre-alloc'd space */
-		resource_size_t pcbr_offset;
-
-		pcbr_offset = (pcbr & (~4095)) - dev_priv->dsm.start;
-		pctx = i915_gem_object_create_stolen_for_preallocated(dev_priv,
-								      pcbr_offset,
-								      I915_GTT_OFFSET_NONE,
-								      pctx_size);
-		goto out;
-	}
-
-	DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
-
-	/*
-	 * From the Gunit register HAS:
-	 * The Gfx driver is expected to program this register and ensure
-	 * proper allocation within Gfx stolen memory.  For example, this
-	 * register should be programmed such that the PCBR range does not
-	 * overlap with other ranges, such as the frame buffer, protected
-	 * memory, or any other relevant ranges.
-	 */
-	pctx = i915_gem_object_create_stolen(dev_priv, pctx_size);
-	if (!pctx) {
-		DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
-		goto out;
-	}
-
-	GEM_BUG_ON(range_overflows_t(u64,
-				     dev_priv->dsm.start,
-				     pctx->stolen->start,
-				     U32_MAX));
-	pctx_paddr = dev_priv->dsm.start + pctx->stolen->start;
-	I915_WRITE(VLV_PCBR, pctx_paddr);
-
-out:
-	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
-	dev_priv->vlv_pctx = pctx;
-}
-
-static void valleyview_cleanup_pctx(struct drm_i915_private *dev_priv)
-{
-	if (WARN_ON(!dev_priv->vlv_pctx))
-		return;
-
-	i915_gem_object_put(dev_priv->vlv_pctx);
-	dev_priv->vlv_pctx = NULL;
-}
-
-static void vlv_init_gpll_ref_freq(struct drm_i915_private *dev_priv)
-{
-	dev_priv->gt_pm.rps.gpll_ref_freq =
-		vlv_get_cck_clock(dev_priv, "GPLL ref",
-				  CCK_GPLL_CLOCK_CONTROL,
-				  dev_priv->czclk_freq);
-
-	DRM_DEBUG_DRIVER("GPLL reference freq: %d kHz\n",
-			 dev_priv->gt_pm.rps.gpll_ref_freq);
-}
-
-static void valleyview_init_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 val;
-
-	valleyview_setup_pctx(dev_priv);
-
-	vlv_iosf_sb_get(dev_priv,
-			BIT(VLV_IOSF_SB_PUNIT) |
-			BIT(VLV_IOSF_SB_NC) |
-			BIT(VLV_IOSF_SB_CCK));
-
-	vlv_init_gpll_ref_freq(dev_priv);
-
-	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
-	switch ((val >> 6) & 3) {
-	case 0:
-	case 1:
-		dev_priv->mem_freq = 800;
-		break;
-	case 2:
-		dev_priv->mem_freq = 1066;
-		break;
-	case 3:
-		dev_priv->mem_freq = 1333;
-		break;
-	}
-	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", dev_priv->mem_freq);
-
-	rps->max_freq = valleyview_rps_max_freq(dev_priv);
-	rps->rp0_freq = rps->max_freq;
-	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->max_freq),
-			 rps->max_freq);
-
-	rps->efficient_freq = valleyview_rps_rpe_freq(dev_priv);
-	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->efficient_freq),
-			 rps->efficient_freq);
-
-	rps->rp1_freq = valleyview_rps_guar_freq(dev_priv);
-	DRM_DEBUG_DRIVER("RP1(Guar Freq) GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->rp1_freq),
-			 rps->rp1_freq);
-
-	rps->min_freq = valleyview_rps_min_freq(dev_priv);
-	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->min_freq),
-			 rps->min_freq);
-
-	vlv_iosf_sb_put(dev_priv,
-			BIT(VLV_IOSF_SB_PUNIT) |
-			BIT(VLV_IOSF_SB_NC) |
-			BIT(VLV_IOSF_SB_CCK));
-}
-
-static void cherryview_init_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 val;
-
-	cherryview_setup_pctx(dev_priv);
-
-	vlv_iosf_sb_get(dev_priv,
-			BIT(VLV_IOSF_SB_PUNIT) |
-			BIT(VLV_IOSF_SB_NC) |
-			BIT(VLV_IOSF_SB_CCK));
-
-	vlv_init_gpll_ref_freq(dev_priv);
-
-	val = vlv_cck_read(dev_priv, CCK_FUSE_REG);
-
-	switch ((val >> 2) & 0x7) {
-	case 3:
-		dev_priv->mem_freq = 2000;
-		break;
-	default:
-		dev_priv->mem_freq = 1600;
-		break;
-	}
-	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", dev_priv->mem_freq);
-
-	rps->max_freq = cherryview_rps_max_freq(dev_priv);
-	rps->rp0_freq = rps->max_freq;
-	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->max_freq),
-			 rps->max_freq);
-
-	rps->efficient_freq = cherryview_rps_rpe_freq(dev_priv);
-	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->efficient_freq),
-			 rps->efficient_freq);
-
-	rps->rp1_freq = cherryview_rps_guar_freq(dev_priv);
-	DRM_DEBUG_DRIVER("RP1(Guar) GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->rp1_freq),
-			 rps->rp1_freq);
-
-	rps->min_freq = cherryview_rps_min_freq(dev_priv);
-	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(dev_priv, rps->min_freq),
-			 rps->min_freq);
-
-	vlv_iosf_sb_put(dev_priv,
-			BIT(VLV_IOSF_SB_PUNIT) |
-			BIT(VLV_IOSF_SB_NC) |
-			BIT(VLV_IOSF_SB_CCK));
-
-	WARN_ONCE((rps->max_freq | rps->efficient_freq | rps->rp1_freq |
-		   rps->min_freq) & 1,
-		  "Odd GPU freq values\n");
-}
-
-static void valleyview_cleanup_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	valleyview_cleanup_pctx(dev_priv);
-}
-
-static void cherryview_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	u32 gtfifodbg, rc6_mode, pcbr;
-
-	gtfifodbg = I915_READ(GTFIFODBG) & ~(GT_FIFO_SBDEDICATE_FREE_ENTRY_CHV |
-					     GT_FIFO_FREE_ENTRIES_CHV);
-	if (gtfifodbg) {
-		DRM_DEBUG_DRIVER("GT fifo had a previous error %x\n",
-				 gtfifodbg);
-		I915_WRITE(GTFIFODBG, gtfifodbg);
-	}
-
-	cherryview_check_pctx(dev_priv);
-
-	/* 1a & 1b: Get forcewake during program sequence. Although the driver
-	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/*  Disable RC states. */
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	/* 2a: Program RC6 thresholds.*/
-	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 40 << 16);
-	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000); /* 12500 * 1280ns */
-	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25); /* 25 * 1280ns */
-
-	for_each_engine(engine, dev_priv, id)
-		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
-	I915_WRITE(GEN6_RC_SLEEP, 0);
-
-	/* TO threshold set to 500 us (0x186 * 1.28 us) */
-	I915_WRITE(GEN6_RC6_THRESHOLD, 0x186);
-
-	/* Allows RC6 residency counter to work */
-	I915_WRITE(VLV_COUNTER_CONTROL,
-		   _MASKED_BIT_ENABLE(VLV_COUNT_RANGE_HIGH |
-				      VLV_MEDIA_RC6_COUNT_EN |
-				      VLV_RENDER_RC6_COUNT_EN));
-
-	/* For now we assume BIOS is allocating and populating the PCBR  */
-	pcbr = I915_READ(VLV_PCBR);
-
-	/* 3: Enable RC6 */
-	rc6_mode = 0;
-	if (pcbr >> VLV_PCBR_ADDR_SHIFT)
-		rc6_mode = GEN7_RC_CTL_TO_MODE;
-	I915_WRITE(GEN6_RC_CONTROL, rc6_mode);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void cherryview_enable_rps(struct drm_i915_private *dev_priv)
-{
-	u32 val;
-
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/* 1: Program defaults and thresholds for RPS*/
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
-
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-
-	/* 2: Enable RPS */
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_AVG);
-
-	/* Setting Fixed Bias */
-	vlv_punit_get(dev_priv);
-
-	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | CHV_BIAS_CPU_50_SOC_50;
-	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
-
-	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
-
-	vlv_punit_put(dev_priv);
-
-	/* RPS code assumes GPLL is used */
-	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
-
-	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
-	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
-
-	reset_rps(dev_priv, valleyview_set_rps);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void valleyview_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	u32 gtfifodbg;
-
-	valleyview_check_pctx(dev_priv);
-
-	gtfifodbg = I915_READ(GTFIFODBG);
-	if (gtfifodbg) {
-		DRM_DEBUG_DRIVER("GT fifo had a previous error %x\n",
-				 gtfifodbg);
-		I915_WRITE(GTFIFODBG, gtfifodbg);
-	}
-
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	/*  Disable RC states. */
-	I915_WRITE(GEN6_RC_CONTROL, 0);
-
-	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 0x00280000);
-	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
-	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
-
-	for_each_engine(engine, dev_priv, id)
-		I915_WRITE(RING_MAX_IDLE(engine->mmio_base), 10);
-
-	I915_WRITE(GEN6_RC6_THRESHOLD, 0x557);
-
-	/* Allows RC6 residency counter to work */
-	I915_WRITE(VLV_COUNTER_CONTROL,
-		   _MASKED_BIT_ENABLE(VLV_COUNT_RANGE_HIGH |
-				      VLV_MEDIA_RC0_COUNT_EN |
-				      VLV_RENDER_RC0_COUNT_EN |
-				      VLV_MEDIA_RC6_COUNT_EN |
-				      VLV_RENDER_RC6_COUNT_EN));
-
-	I915_WRITE(GEN6_RC_CONTROL,
-		   GEN7_RC_CTL_TO_MODE | VLV_RC_CTL_CTX_RST_PARALLEL);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static void valleyview_enable_rps(struct drm_i915_private *dev_priv)
-{
-	u32 val;
-
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
-
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
-
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_CONT);
-
-	vlv_punit_get(dev_priv);
-
-	/* Setting Fixed Bias */
-	val = VLV_OVERRIDE_EN | VLV_SOC_TDP_EN | VLV_BIAS_CPU_125_SOC_875;
-	vlv_punit_write(dev_priv, VLV_TURBO_SOC_OVERRIDE, val);
-
-	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
-
-	vlv_punit_put(dev_priv);
-
-	/* RPS code assumes GPLL is used */
-	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
-
-	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
-	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
-
-	reset_rps(dev_priv, valleyview_set_rps);
-
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-}
-
-static unsigned long intel_pxfreq(u32 vidfreq)
-{
-	unsigned long freq;
-	int div = (vidfreq & 0x3f0000) >> 16;
-	int post = (vidfreq & 0x3000) >> 12;
-	int pre = (vidfreq & 0x7);
-
-	if (!pre)
-		return 0;
-
-	freq = ((div * 133333) / ((1<<post) * pre));
-
-	return freq;
-}
-
-static const struct cparams {
-	u16 i;
-	u16 t;
-	u16 m;
-	u16 c;
-} cparams[] = {
-	{ 1, 1333, 301, 28664 },
-	{ 1, 1066, 294, 24460 },
-	{ 1, 800, 294, 25192 },
-	{ 0, 1333, 276, 27605 },
-	{ 0, 1066, 276, 27605 },
-	{ 0, 800, 231, 23784 },
-};
-
-static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
-{
-	u64 total_count, diff, ret;
-	u32 count1, count2, count3, m = 0, c = 0;
-	unsigned long now = jiffies_to_msecs(jiffies), diff1;
-	int i;
-
-	lockdep_assert_held(&mchdev_lock);
-
-	diff1 = now - dev_priv->ips.last_time1;
-
-	/* Prevent division-by-zero if we are asking too fast.
-	 * Also, we don't get interesting results if we are polling
-	 * faster than once in 10ms, so just return the saved value
-	 * in such cases.
-	 */
-	if (diff1 <= 10)
-		return dev_priv->ips.chipset_power;
-
-	count1 = I915_READ(DMIEC);
-	count2 = I915_READ(DDREC);
-	count3 = I915_READ(CSIEC);
-
-	total_count = count1 + count2 + count3;
-
-	/* FIXME: handle per-counter overflow */
-	if (total_count < dev_priv->ips.last_count1) {
-		diff = ~0UL - dev_priv->ips.last_count1;
-		diff += total_count;
-	} else {
-		diff = total_count - dev_priv->ips.last_count1;
-	}
-
-	for (i = 0; i < ARRAY_SIZE(cparams); i++) {
-		if (cparams[i].i == dev_priv->ips.c_m &&
-		    cparams[i].t == dev_priv->ips.r_t) {
-			m = cparams[i].m;
-			c = cparams[i].c;
-			break;
-		}
-	}
-
-	diff = div_u64(diff, diff1);
-	ret = ((m * diff) + c);
-	ret = div_u64(ret, 10);
-
-	dev_priv->ips.last_count1 = total_count;
-	dev_priv->ips.last_time1 = now;
-
-	dev_priv->ips.chipset_power = ret;
-
-	return ret;
-}
-
-unsigned long i915_chipset_val(struct drm_i915_private *dev_priv)
-{
-	unsigned long val;
-
-	if (!IS_GEN5(dev_priv))
-		return 0;
-
-	intel_runtime_pm_get(dev_priv);
-	spin_lock_irq(&mchdev_lock);
-
-	val = __i915_chipset_val(dev_priv);
-
-	spin_unlock_irq(&mchdev_lock);
-	intel_runtime_pm_put(dev_priv);
-
-	return val;
-}
-
-unsigned long i915_mch_val(struct drm_i915_private *dev_priv)
-{
-	unsigned long m, x, b;
-	u32 tsfs;
-
-	tsfs = I915_READ(TSFS);
-
-	m = ((tsfs & TSFS_SLOPE_MASK) >> TSFS_SLOPE_SHIFT);
-	x = I915_READ8(TR1);
-
-	b = tsfs & TSFS_INTR_MASK;
-
-	return ((m * x) / 127) - b;
-}
-
-static int _pxvid_to_vd(u8 pxvid)
-{
-	if (pxvid == 0)
-		return 0;
-
-	if (pxvid >= 8 && pxvid < 31)
-		pxvid = 31;
-
-	return (pxvid + 2) * 125;
-}
-
-static u32 pvid_to_extvid(struct drm_i915_private *dev_priv, u8 pxvid)
-{
-	const int vd = _pxvid_to_vd(pxvid);
-	const int vm = vd - 1125;
-
-	if (INTEL_INFO(dev_priv)->is_mobile)
-		return vm > 0 ? vm : 0;
-
-	return vd;
-}
-
-static void __i915_update_gfx_val(struct drm_i915_private *dev_priv)
-{
-	u64 now, diff, diffms;
-	u32 count;
-
-	lockdep_assert_held(&mchdev_lock);
-
-	now = ktime_get_raw_ns();
-	diffms = now - dev_priv->ips.last_time2;
-	do_div(diffms, NSEC_PER_MSEC);
-
-	/* Don't divide by 0 */
-	if (!diffms)
-		return;
-
-	count = I915_READ(GFXEC);
-
-	if (count < dev_priv->ips.last_count2) {
-		diff = ~0UL - dev_priv->ips.last_count2;
-		diff += count;
-	} else {
-		diff = count - dev_priv->ips.last_count2;
-	}
-
-	dev_priv->ips.last_count2 = count;
-	dev_priv->ips.last_time2 = now;
-
-	/* More magic constants... */
-	diff = diff * 1181;
-	diff = div_u64(diff, diffms * 10);
-	dev_priv->ips.gfx_power = diff;
-}
-
-void i915_update_gfx_val(struct drm_i915_private *dev_priv)
-{
-	if (!IS_GEN5(dev_priv))
-		return;
-
-	intel_runtime_pm_get(dev_priv);
-	spin_lock_irq(&mchdev_lock);
-
-	__i915_update_gfx_val(dev_priv);
-
-	spin_unlock_irq(&mchdev_lock);
-	intel_runtime_pm_put(dev_priv);
-}
-
-static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
-{
-	unsigned long t, corr, state1, corr2, state2;
-	u32 pxvid, ext_v;
-
-	lockdep_assert_held(&mchdev_lock);
-
-	pxvid = I915_READ(PXVFREQ(dev_priv->gt_pm.rps.cur_freq));
-	pxvid = (pxvid >> 24) & 0x7f;
-	ext_v = pvid_to_extvid(dev_priv, pxvid);
-
-	state1 = ext_v;
-
-	t = i915_mch_val(dev_priv);
-
-	/* Revel in the empirically derived constants */
-
-	/* Correction factor in 1/100000 units */
-	if (t > 80)
-		corr = ((t * 2349) + 135940);
-	else if (t >= 50)
-		corr = ((t * 964) + 29317);
-	else /* < 50 */
-		corr = ((t * 301) + 1004);
-
-	corr = corr * ((150142 * state1) / 10000 - 78642);
-	corr /= 100000;
-	corr2 = (corr * dev_priv->ips.corr);
-
-	state2 = (corr2 * state1) / 10000;
-	state2 /= 100; /* convert to mW */
-
-	__i915_update_gfx_val(dev_priv);
-
-	return dev_priv->ips.gfx_power + state2;
-}
-
-unsigned long i915_gfx_val(struct drm_i915_private *dev_priv)
-{
-	unsigned long val;
-
-	if (!IS_GEN5(dev_priv))
-		return 0;
-
-	intel_runtime_pm_get(dev_priv);
-	spin_lock_irq(&mchdev_lock);
-
-	val = __i915_gfx_val(dev_priv);
-
-	spin_unlock_irq(&mchdev_lock);
-	intel_runtime_pm_put(dev_priv);
-
-	return val;
-}
-
-static struct drm_i915_private *i915_mch_dev;
-
-static struct drm_i915_private *mchdev_get(void)
-{
-	struct drm_i915_private *i915;
-
-	rcu_read_lock();
-	i915 = i915_mch_dev;
-	if (!kref_get_unless_zero(&i915->drm.ref))
-		i915 = NULL;
-	rcu_read_unlock();
-
-	return i915;
-}
-
-/**
- * i915_read_mch_val - return value for IPS use
- *
- * Calculate and return a value for the IPS driver to use when deciding whether
- * we have thermal and power headroom to increase CPU or GPU power budget.
- */
-unsigned long i915_read_mch_val(void)
-{
-	struct drm_i915_private *i915;
-	unsigned long chipset_val, graphics_val;
-
-	i915 = mchdev_get();
-	if (!i915)
-		return 0;
-
-	intel_runtime_pm_get(i915);
-	spin_lock_irq(&mchdev_lock);
-	chipset_val = __i915_chipset_val(i915);
-	graphics_val = __i915_gfx_val(i915);
-	spin_unlock_irq(&mchdev_lock);
-	intel_runtime_pm_put(i915);
-
-	drm_dev_put(&i915->drm);
-	return chipset_val + graphics_val;
-}
-EXPORT_SYMBOL_GPL(i915_read_mch_val);
-
-/**
- * i915_gpu_raise - raise GPU frequency limit
- *
- * Raise the limit; IPS indicates we have thermal headroom.
- */
-bool i915_gpu_raise(void)
-{
-	struct drm_i915_private *i915;
-
-	i915 = mchdev_get();
-	if (!i915)
-		return false;
-
-	spin_lock_irq(&mchdev_lock);
-	if (i915->ips.max_delay > i915->ips.fmax)
-		i915->ips.max_delay--;
-	spin_unlock_irq(&mchdev_lock);
-
-	drm_dev_put(&i915->drm);
-	return true;
-}
-EXPORT_SYMBOL_GPL(i915_gpu_raise);
-
-/**
- * i915_gpu_lower - lower GPU frequency limit
- *
- * IPS indicates we're close to a thermal limit, so throttle back the GPU
- * frequency maximum.
- */
-bool i915_gpu_lower(void)
-{
-	struct drm_i915_private *i915;
-
-	i915 = mchdev_get();
-	if (!i915)
-		return false;
-
-	spin_lock_irq(&mchdev_lock);
-	if (i915->ips.max_delay < i915->ips.min_delay)
-		i915->ips.max_delay++;
-	spin_unlock_irq(&mchdev_lock);
-
-	drm_dev_put(&i915->drm);
-	return true;
-}
-EXPORT_SYMBOL_GPL(i915_gpu_lower);
-
-/**
- * i915_gpu_busy - indicate GPU busyness to IPS
- *
- * Tell the IPS driver whether or not the GPU is busy.
- */
-bool i915_gpu_busy(void)
-{
-	struct drm_i915_private *i915;
-	bool ret;
-
-	i915 = mchdev_get();
-	if (!i915)
-		return false;
-
-	ret = i915->gt.awake;
-
-	drm_dev_put(&i915->drm);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(i915_gpu_busy);
-
-/**
- * i915_gpu_turbo_disable - disable graphics turbo
- *
- * Disable graphics turbo by resetting the max frequency and setting the
- * current frequency to the default.
- */
-bool i915_gpu_turbo_disable(void)
-{
-	struct drm_i915_private *i915;
-	bool ret;
-
-	i915 = mchdev_get();
-	if (!i915)
-		return false;
-
-	spin_lock_irq(&mchdev_lock);
-	i915->ips.max_delay = i915->ips.fstart;
-	ret = ironlake_set_drps(i915, i915->ips.fstart);
-	spin_unlock_irq(&mchdev_lock);
-
-	drm_dev_put(&i915->drm);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(i915_gpu_turbo_disable);
-
-/**
- * Tells the intel_ips driver that the i915 driver is now loaded, if
- * IPS got loaded first.
- *
- * This awkward dance is so that neither module has to depend on the
- * other in order for IPS to do the appropriate communication of
- * GPU turbo limits to i915.
- */
-static void
-ips_ping_for_i915_load(void)
-{
-	void (*link)(void);
-
-	link = symbol_get(ips_link_to_i915_driver);
-	if (link) {
-		link();
-		symbol_put(ips_link_to_i915_driver);
-	}
-}
-
-void intel_gpu_ips_init(struct drm_i915_private *dev_priv)
-{
-	/* We only register the i915 ips part with intel-ips once everything is
-	 * set up, to avoid intel-ips sneaking in and reading bogus values. */
-	rcu_assign_pointer(i915_mch_dev, dev_priv);
-
-	ips_ping_for_i915_load();
-}
-
-void intel_gpu_ips_teardown(void)
-{
-	rcu_assign_pointer(i915_mch_dev, NULL);
-}
-
-static void intel_init_emon(struct drm_i915_private *dev_priv)
-{
-	u32 lcfuse;
-	u8 pxw[16];
-	int i;
-
-	/* Disable to program */
-	I915_WRITE(ECR, 0);
-	POSTING_READ(ECR);
-
-	/* Program energy weights for various events */
-	I915_WRITE(SDEW, 0x15040d00);
-	I915_WRITE(CSIEW0, 0x007f0000);
-	I915_WRITE(CSIEW1, 0x1e220004);
-	I915_WRITE(CSIEW2, 0x04000004);
-
-	for (i = 0; i < 5; i++)
-		I915_WRITE(PEW(i), 0);
-	for (i = 0; i < 3; i++)
-		I915_WRITE(DEW(i), 0);
-
-	/* Program P-state weights to account for frequency power adjustment */
-	for (i = 0; i < 16; i++) {
-		u32 pxvidfreq = I915_READ(PXVFREQ(i));
-		unsigned long freq = intel_pxfreq(pxvidfreq);
-		unsigned long vid = (pxvidfreq & PXVFREQ_PX_MASK) >>
-			PXVFREQ_PX_SHIFT;
-		unsigned long val;
-
-		val = vid * vid;
-		val *= (freq / 1000);
-		val *= 255;
-		val /= (127*127*900);
-		if (val > 0xff)
-			DRM_ERROR("bad pxval: %ld\n", val);
-		pxw[i] = val;
-	}
-	/* Render standby states get 0 weight */
-	pxw[14] = 0;
-	pxw[15] = 0;
-
-	for (i = 0; i < 4; i++) {
-		u32 val = (pxw[i*4] << 24) | (pxw[(i*4)+1] << 16) |
-			(pxw[(i*4)+2] << 8) | (pxw[(i*4)+3]);
-		I915_WRITE(PXW(i), val);
-	}
-
-	/* Adjust magic regs to magic values (more experimental results) */
-	I915_WRITE(OGW0, 0);
-	I915_WRITE(OGW1, 0);
-	I915_WRITE(EG0, 0x00007f00);
-	I915_WRITE(EG1, 0x0000000e);
-	I915_WRITE(EG2, 0x000e0000);
-	I915_WRITE(EG3, 0x68000300);
-	I915_WRITE(EG4, 0x42000000);
-	I915_WRITE(EG5, 0x00140031);
-	I915_WRITE(EG6, 0);
-	I915_WRITE(EG7, 0);
-
-	for (i = 0; i < 8; i++)
-		I915_WRITE(PXWL(i), 0);
-
-	/* Enable PMON + select events */
-	I915_WRITE(ECR, 0x80000019);
-
-	lcfuse = I915_READ(LCFUSE02);
-
-	dev_priv->ips.corr = (lcfuse & LCFUSE_HIV_MASK);
-}
-
-void intel_init_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/*
-	 * RPM depends on RC6 to save/restore the GT HW context, so make RC6 a
-	 * requirement.
-	 */
-	if (!sanitize_rc6(dev_priv)) {
-		DRM_INFO("RC6 disabled, disabling runtime PM support\n");
-		intel_runtime_pm_get(dev_priv);
-	}
-
-	mutex_lock(&rps->lock);
-
-	/* Initialize RPS limits (for userspace) */
-	if (IS_CHERRYVIEW(dev_priv))
-		cherryview_init_gt_powersave(dev_priv);
-	else if (IS_VALLEYVIEW(dev_priv))
-		valleyview_init_gt_powersave(dev_priv);
-	else if (INTEL_GEN(dev_priv) >= 6)
-		gen6_init_rps_frequencies(dev_priv);
-
-	/* Derive initial user preferences/limits from the hardware limits */
-	rps->idle_freq = rps->min_freq;
-	rps->cur_freq = rps->idle_freq;
-
-	rps->max_freq_softlimit = rps->max_freq;
-	rps->min_freq_softlimit = rps->min_freq;
-
-	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-		rps->min_freq_softlimit =
-			max_t(int,
-			      rps->efficient_freq,
-			      intel_freq_opcode(dev_priv, 450));
-
-	/* After setting max-softlimit, find the overclock max freq */
-	if (IS_GEN6(dev_priv) ||
-	    IS_IVYBRIDGE(dev_priv) || IS_HASWELL(dev_priv)) {
-		u32 params = 0;
-
-		sandybridge_pcode_read(dev_priv, GEN6_READ_OC_PARAMS, &params);
-		if (params & BIT(31)) { /* OC supported */
-			DRM_DEBUG_DRIVER("Overclocking supported, max: %dMHz, overclock: %dMHz\n",
-					 (rps->max_freq & 0xff) * 50,
-					 (params & 0xff) * 50);
-			rps->max_freq = params & 0xff;
-		}
-	}
-
-	/* Finally allow us to boost to max by default */
-	rps->boost_freq = rps->max_freq;
-
-	mutex_unlock(&rps->lock);
-}
-
-void intel_cleanup_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	if (IS_VALLEYVIEW(dev_priv))
-		valleyview_cleanup_gt_powersave(dev_priv);
-
-	if (!HAS_RC6(dev_priv))
-		intel_runtime_pm_put(dev_priv);
-}
-
-/**
- * intel_suspend_gt_powersave - suspend PM work and helper threads
- * @dev_priv: i915 device
- *
- * We don't want to disable RC6 or other features here, we just want
- * to make sure any work we've queued has finished and won't bother
- * us while we're suspended.
- */
-void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	if (INTEL_GEN(dev_priv) < 6)
-		return;
-
-	/* gen6_rps_idle() will be called later to disable interrupts */
-}
-
-void intel_sanitize_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	dev_priv->gt_pm.rps.enabled = true; /* force RPS disabling */
-	dev_priv->gt_pm.rc6.enabled = true; /* force RC6 disabling */
-	intel_disable_gt_powersave(dev_priv);
-
-	if (INTEL_GEN(dev_priv) >= 11)
-		gen11_reset_rps_interrupts(dev_priv);
-	else
-		gen6_reset_rps_interrupts(dev_priv);
-}
-
-static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (!i915->gt_pm.llc_pstate.enabled)
-		return;
-
-	/* Currently there is no HW configuration to be done to disable. */
-
-	i915->gt_pm.llc_pstate.enabled = false;
-}
-
-static void intel_disable_rc6(struct drm_i915_private *dev_priv)
-{
-	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
-
-	if (!dev_priv->gt_pm.rc6.enabled)
-		return;
-
-	if (INTEL_GEN(dev_priv) >= 9)
-		gen9_disable_rc6(dev_priv);
-	else if (IS_CHERRYVIEW(dev_priv))
-		cherryview_disable_rc6(dev_priv);
-	else if (IS_VALLEYVIEW(dev_priv))
-		valleyview_disable_rc6(dev_priv);
-	else if (INTEL_GEN(dev_priv) >= 6)
-		gen6_disable_rc6(dev_priv);
-
-	dev_priv->gt_pm.rc6.enabled = false;
-}
-
-static void intel_disable_rps(struct drm_i915_private *dev_priv)
-{
-	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
-
-	if (!dev_priv->gt_pm.rps.enabled)
-		return;
-
-	if (INTEL_GEN(dev_priv) >= 9)
-		gen9_disable_rps(dev_priv);
-	else if (IS_CHERRYVIEW(dev_priv))
-		cherryview_disable_rps(dev_priv);
-	else if (IS_VALLEYVIEW(dev_priv))
-		valleyview_disable_rps(dev_priv);
-	else if (INTEL_GEN(dev_priv) >= 6)
-		gen6_disable_rps(dev_priv);
-	else if (IS_IRONLAKE_M(dev_priv))
-		ironlake_disable_drps(dev_priv);
-
-	dev_priv->gt_pm.rps.enabled = false;
-}
-
-void intel_disable_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	mutex_lock(&dev_priv->gt_pm.rps.lock);
-
-	intel_disable_rc6(dev_priv);
-	intel_disable_rps(dev_priv);
-	if (HAS_LLC(dev_priv))
-		intel_disable_llc_pstate(dev_priv);
-
-	mutex_unlock(&dev_priv->gt_pm.rps.lock);
-}
-
-static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (i915->gt_pm.llc_pstate.enabled)
-		return;
-
-	gen6_update_ring_freq(i915);
-
-	i915->gt_pm.llc_pstate.enabled = true;
-}
-
-static void intel_enable_rc6(struct drm_i915_private *dev_priv)
-{
-	lockdep_assert_held(&dev_priv->gt_pm.rps.lock);
-
-	if (dev_priv->gt_pm.rc6.enabled)
-		return;
-
-	if (IS_CHERRYVIEW(dev_priv))
-		cherryview_enable_rc6(dev_priv);
-	else if (IS_VALLEYVIEW(dev_priv))
-		valleyview_enable_rc6(dev_priv);
-	else if (INTEL_GEN(dev_priv) >= 9)
-		gen9_enable_rc6(dev_priv);
-	else if (IS_BROADWELL(dev_priv))
-		gen8_enable_rc6(dev_priv);
-	else if (INTEL_GEN(dev_priv) >= 6)
-		gen6_enable_rc6(dev_priv);
-
-	dev_priv->gt_pm.rc6.enabled = true;
-}
-
-static void intel_enable_rps(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	lockdep_assert_held(&rps->lock);
-
-	if (rps->enabled)
-		return;
-
-	if (IS_CHERRYVIEW(dev_priv)) {
-		cherryview_enable_rps(dev_priv);
-	} else if (IS_VALLEYVIEW(dev_priv)) {
-		valleyview_enable_rps(dev_priv);
-	} else if (INTEL_GEN(dev_priv) >= 9) {
-		gen9_enable_rps(dev_priv);
-	} else if (IS_BROADWELL(dev_priv)) {
-		gen8_enable_rps(dev_priv);
-	} else if (INTEL_GEN(dev_priv) >= 6) {
-		gen6_enable_rps(dev_priv);
-	} else if (IS_IRONLAKE_M(dev_priv)) {
-		ironlake_enable_drps(dev_priv);
-		intel_init_emon(dev_priv);
-	}
-
-	WARN_ON(rps->max_freq < rps->min_freq);
-	WARN_ON(rps->idle_freq > rps->max_freq);
-
-	WARN_ON(rps->efficient_freq < rps->min_freq);
-	WARN_ON(rps->efficient_freq > rps->max_freq);
-
-	rps->enabled = true;
-}
-
-void intel_enable_gt_powersave(struct drm_i915_private *dev_priv)
-{
-	/* Powersaving is controlled by the host when inside a VM */
-	if (intel_vgpu_active(dev_priv))
-		return;
-
-	mutex_lock(&dev_priv->gt_pm.rps.lock);
-
-	if (HAS_RC6(dev_priv))
-		intel_enable_rc6(dev_priv);
-	intel_enable_rps(dev_priv);
-	if (HAS_LLC(dev_priv))
-		intel_enable_llc_pstate(dev_priv);
-
-	mutex_unlock(&dev_priv->gt_pm.rps.lock);
-}
-
-static void ibx_init_clock_gating(struct drm_i915_private *dev_priv)
-{
-	/*
-	 * On Ibex Peak and Cougar Point, we need to disable clock
-	 * gating for the panel power sequencer or it will fail to
-	 * start up when no ports are active.
-	 */
-	I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE);
-}
-
-static void g4x_disable_trickle_feed(struct drm_i915_private *dev_priv)
-{
-	enum pipe pipe;
-
-	for_each_pipe(dev_priv, pipe) {
-		I915_WRITE(DSPCNTR(pipe),
-			   I915_READ(DSPCNTR(pipe)) |
-			   DISPPLANE_TRICKLE_FEED_DISABLE);
-
-		I915_WRITE(DSPSURF(pipe), I915_READ(DSPSURF(pipe)));
-		POSTING_READ(DSPSURF(pipe));
-	}
-}
-
-static void ilk_init_clock_gating(struct drm_i915_private *dev_priv)
-{
-	uint32_t dspclk_gate = ILK_VRHUNIT_CLOCK_GATE_DISABLE;
-
-	/*
-	 * Required for FBC
-	 * WaFbcDisableDpfcClockGating:ilk
-	 */
-	dspclk_gate |= ILK_DPFCRUNIT_CLOCK_GATE_DISABLE |
-		   ILK_DPFCUNIT_CLOCK_GATE_DISABLE |
-		   ILK_DPFDUNIT_CLOCK_GATE_ENABLE;
-
-	I915_WRITE(PCH_3DCGDIS0,
-		   MARIUNIT_CLOCK_GATE_DISABLE |
-		   SVSMUNIT_CLOCK_GATE_DISABLE);
-	I915_WRITE(PCH_3DCGDIS1,
-		   VFMUNIT_CLOCK_GATE_DISABLE);
-
-	/*
-	 * According to the spec the following bits should be set in
-	 * order to enable memory self-refresh
-	 * The bit 22/21 of 0x42004
-	 * The bit 5 of 0x42020
-	 * The bit 15 of 0x45000
-	 */
-	I915_WRITE(ILK_DISPLAY_CHICKEN2,
-		   (I915_READ(ILK_DISPLAY_CHICKEN2) |
-		    ILK_DPARB_GATE | ILK_VSDPFD_FULL));
-	dspclk_gate |= ILK_DPARBUNIT_CLOCK_GATE_ENABLE;
-	I915_WRITE(DISP_ARB_CTL,
-		   (I915_READ(DISP_ARB_CTL) |
-		    DISP_FBC_WM_DIS));
-
-	/*
-	 * Based on the document from hardware guys the following bits
-	 * should be set unconditionally in order to enable FBC.
-	 * The bit 22 of 0x42000
-	 * The bit 22 of 0x42004
-	 * The bit 7,8,9 of 0x42020.
-	 */
-	if (IS_IRONLAKE_M(dev_priv)) {
-		/* WaFbcAsynchFlipDisableFbcQueue:ilk */
-		I915_WRITE(ILK_DISPLAY_CHICKEN1,
-			   I915_READ(ILK_DISPLAY_CHICKEN1) |
-			   ILK_FBCQ_DIS);
-		I915_WRITE(ILK_DISPLAY_CHICKEN2,
-			   I915_READ(ILK_DISPLAY_CHICKEN2) |
-			   ILK_DPARB_GATE);
-	}
-
-	I915_WRITE(ILK_DSPCLK_GATE_D, dspclk_gate);
-
-	I915_WRITE(ILK_DISPLAY_CHICKEN2,
-		   I915_READ(ILK_DISPLAY_CHICKEN2) |
-		   ILK_ELPIN_409_SELECT);
-	I915_WRITE(_3D_CHICKEN2,
-		   _3D_CHICKEN2_WM_READ_PIPELINED << 16 |
-		   _3D_CHICKEN2_WM_READ_PIPELINED);
-
-	/* WaDisableRenderCachePipelinedFlush:ilk */
-	I915_WRITE(CACHE_MODE_0,
-		   _MASKED_BIT_ENABLE(CM0_PIPELINED_RENDER_FLUSH_DISABLE));
-
-	/* WaDisable_RenderCache_OperationalFlush:ilk */
-	I915_WRITE(CACHE_MODE_0, _MASKED_BIT_DISABLE(RC_OP_FLUSH_ENABLE));
-
-	g4x_disable_trickle_feed(dev_priv);
-
-	ibx_init_clock_gating(dev_priv);
-}
-
-static void cpt_init_clock_gating(struct drm_i915_private *dev_priv)
-{
-	int pipe;
-	uint32_t val;
-
-	/*
-	 * On Ibex Peak and Cougar Point, we need to disable clock
-	 * gating for the panel power sequencer or it will fail to
-	 * start up when no ports are active.
-	 */
-	I915_WRITE(SOUTH_DSPCLK_GATE_D, PCH_DPLSUNIT_CLOCK_GATE_DISABLE |
-		   PCH_DPLUNIT_CLOCK_GATE_DISABLE |
-		   PCH_CPUNIT_CLOCK_GATE_DISABLE);
-	I915_WRITE(SOUTH_CHICKEN2, I915_READ(SOUTH_CHICKEN2) |
-		   DPLS_EDP_PPS_FIX_DIS);
-	/* The below fixes the weird display corruption, a few pixels shifted
-	 * downward, seen only on the LVDS panels of some HP Ivy Bridge laptops.
-	 */
-	for_each_pipe(dev_priv, pipe) {
-		val = I915_READ(TRANS_CHICKEN2(pipe));
-		val |= TRANS_CHICKEN2_TIMING_OVERRIDE;
-		val &= ~TRANS_CHICKEN2_FDI_POLARITY_REVERSED;
-		if (dev_priv->vbt.fdi_rx_polarity_inverted)
-			val |= TRANS_CHICKEN2_FDI_POLARITY_REVERSED;
-		val &= ~TRANS_CHICKEN2_FRAME_START_DELAY_MASK;
-		val &= ~TRANS_CHICKEN2_DISABLE_DEEP_COLOR_COUNTER;
-		val &= ~TRANS_CHICKEN2_DISABLE_DEEP_COLOR_MODESWITCH;
-		I915_WRITE(TRANS_CHICKEN2(pipe), val);
-	}
-	/* WADP0ClockGatingDisable */
-	for_each_pipe(dev_priv, pipe) {
-		I915_WRITE(TRANS_CHICKEN1(pipe),
-			   TRANS_CHICKEN1_DP0UNIT_GC_DISABLE);
-	}
-}
-
-static void gen6_check_mch_setup(struct drm_i915_private *dev_priv)
-{
-	uint32_t tmp;
-
-	tmp = I915_READ(MCH_SSKPD);
-	if ((tmp & MCH_SSKPD_WM0_MASK) != MCH_SSKPD_WM0_VAL)
-		DRM_DEBUG_KMS("Wrong MCH_SSKPD value: 0x%08x This can cause underruns.\n",
-			      tmp);
-}
-
-static void gen6_init_clock_gating(struct drm_i915_private *dev_priv)
-{
-	uint32_t dspclk_gate = ILK_VRHUNIT_CLOCK_GATE_DISABLE;
-
-	I915_WRITE(ILK_DSPCLK_GATE_D, dspclk_gate);
-
-	I915_WRITE(ILK_DISPLAY_CHICKEN2,
-		   I915_READ(ILK_DISPLAY_CHICKEN2) |
-		   ILK_ELPIN_409_SELECT);
-
-	/* WaDisableHiZPlanesWhenMSAAEnabled:snb */
-	I915_WRITE(_3D_CHICKEN,
-		   _MASKED_BIT_ENABLE(_3D_CHICKEN_HIZ_PLANE_DISABLE_MSAA_4X_SNB));
-
-	/* WaDisable_RenderCache_OperationalFlush:snb */
-	I915_WRITE(CACHE_MODE_0, _MASKED_BIT_DISABLE(RC_OP_FLUSH_ENABLE));
-
-	/*
-	 * BSpec recommends 8x4 when MSAA is used;
-	 * however, in practice 16x4 seems fastest.
-	 *
-	 * Note that PS/WM thread counts depend on the WIZ hashing
-	 * disable bit, which we don't touch here, but it's good
-	 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
-	 */
-	I915_WRITE(GEN6_GT_MODE,
-		   _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4));
-
-	I915_WRITE(CACHE_MODE_0,
-		   _MASKED_BIT_DISABLE(CM0_STC_EVICT_DISABLE_LRA_SNB));
-
-	I915_WRITE(GEN6_UCGCTL1,
-		   I915_READ(GEN6_UCGCTL1) |
-		   GEN6_BLBUNIT_CLOCK_GATE_DISABLE |
-		   GEN6_CSUNIT_CLOCK_GATE_DISABLE);
-
-	/* According to the BSpec vol1g, bit 12 (RCPBUNIT) clock
-	 * gating disable must be set.  Failure to set it results in
-	 * flickering pixels due to Z write ordering failures after
-	 * some amount of runtime in the Mesa "fire" demo, and Unigine
-	 * Sanctuary and Tropics, and apparently anything else with
-	 * alpha test or pixel discard.
-	 *
-	 * According to the spec, bit 11 (RCCUNIT) must also be set,
-	 * but we didn't debug actual testcases to find it out.
-	 *
-	 * WaDisableRCCUnitClockGating:snb
-	 * WaDisableRCPBUnitClockGating:snb
-	 */
-	I915_WRITE(GEN6_UCGCTL2,
-		   GEN6_RCPBUNIT_CLOCK_GATE_DISABLE |
-		   GEN6_RCCUNIT_CLOCK_GATE_DISABLE);
-
-	/* WaStripsFansDisableFastClipPerformanceFix:snb */
-	I915_WRITE(_3D_CHICKEN3,
-		   _MASKED_BIT_ENABLE(_3D_CHICKEN3_SF_DISABLE_FASTCLIP_CULL));
-
-	/*
-	 * Bspec says:
-	 * "This bit must be set if 3DSTATE_CLIP clip mode is set to normal and
-	 * 3DSTATE_SF number of SF output attributes is more than 16."
-	 */
-	I915_WRITE(_3D_CHICKEN3,
-		   _MASKED_BIT_ENABLE(_3D_CHICKEN3_SF_DISABLE_PIPELINED_ATTR_FETCH));
-
-	/*
-	 * According to the spec the following bits should be
-	 * set in order to enable memory self-refresh and fbc:
-	 * The bit21 and bit22 of 0x42000
-	 * The bit21 and bit22 of 0x42004
-	 * The bit5 and bit7 of 0x42020
-	 * The bit14 of 0x70180
-	 * The bit14 of 0x71180
-	 *
-	 * WaFbcAsynchFlipDisableFbcQueue:snb
-	 */
-	I915_WRITE(ILK_DISPLAY_CHICKEN1,
-		   I915_READ(ILK_DISPLAY_CHICKEN1) |
-		   ILK_FBCQ_DIS | ILK_PABSTRETCH_DIS);
-	I915_WRITE(ILK_DISPLAY_CHICKEN2,
-		   I915_READ(ILK_DISPLAY_CHICKEN2) |
-		   ILK_DPARB_GATE | ILK_VSDPFD_FULL);
-	I915_WRITE(ILK_DSPCLK_GATE_D,
-		   I915_READ(ILK_DSPCLK_GATE_D) |
-		   ILK_DPARBUNIT_CLOCK_GATE_ENABLE  |
-		   ILK_DPFDUNIT_CLOCK_GATE_ENABLE);
-
-	g4x_disable_trickle_feed(dev_priv);
-
-	cpt_init_clock_gating(dev_priv);
-
-	gen6_check_mch_setup(dev_priv);
-}
-
-static void gen7_setup_fixed_func_scheduler(struct drm_i915_private *dev_priv)
-{
-	uint32_t reg = I915_READ(GEN7_FF_THREAD_MODE);
-
-	/*
-	 * WaVSThreadDispatchOverride:ivb,vlv
-	 *
-	 * This actually overrides the dispatch
-	 * mode for all thread types.
-	 */
-	reg &= ~GEN7_FF_SCHED_MASK;
-	reg |= GEN7_FF_TS_SCHED_HW;
-	reg |= GEN7_FF_VS_SCHED_HW;
-	reg |= GEN7_FF_DS_SCHED_HW;
-
-	I915_WRITE(GEN7_FF_THREAD_MODE, reg);
-}
-
-static void lpt_init_clock_gating(struct drm_i915_private *dev_priv)
-{
-	/*
-	 * TODO: this bit should only be enabled when really needed, then
-	 * disabled when not needed anymore in order to save power.
-	 */
-	if (HAS_PCH_LPT_LP(dev_priv))
-		I915_WRITE(SOUTH_DSPCLK_GATE_D,
-			   I915_READ(SOUTH_DSPCLK_GATE_D) |
-			   PCH_LP_PARTITION_LEVEL_DISABLE);
+	if (HAS_PCH_LPT_LP(dev_priv))
+		I915_WRITE(SOUTH_DSPCLK_GATE_D,
+			   I915_READ(SOUTH_DSPCLK_GATE_D) |
+			   PCH_LP_PARTITION_LEVEL_DISABLE);
 
 	/* WADPOClockGatingDisable:hsw */
 	I915_WRITE(TRANS_CHICKEN1(PIPE_A),
@@ -9339,74 +7036,8 @@ void intel_init_pm(struct drm_i915_private *dev_priv)
 	}
 }
 
-static int byt_gpu_freq(struct drm_i915_private *dev_priv, int val)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/*
-	 * N = val - 0xb7
-	 * Slow = Fast = GPLL ref * N
-	 */
-	return DIV_ROUND_CLOSEST(rps->gpll_ref_freq * (val - 0xb7), 1000);
-}
-
-static int byt_freq_opcode(struct drm_i915_private *dev_priv, int val)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	return DIV_ROUND_CLOSEST(1000 * val, rps->gpll_ref_freq) + 0xb7;
-}
-
-static int chv_gpu_freq(struct drm_i915_private *dev_priv, int val)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/*
-	 * N = val / 2
-	 * CU (slow) = CU2x (fast) / 2 = GPLL ref * N / 2
-	 */
-	return DIV_ROUND_CLOSEST(rps->gpll_ref_freq * val, 2 * 2 * 1000);
-}
-
-static int chv_freq_opcode(struct drm_i915_private *dev_priv, int val)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	/* CHV needs even values */
-	return DIV_ROUND_CLOSEST(2 * 1000 * val, rps->gpll_ref_freq) * 2;
-}
-
-int intel_gpu_freq(struct drm_i915_private *dev_priv, int val)
-{
-	if (INTEL_GEN(dev_priv) >= 9)
-		return DIV_ROUND_CLOSEST(val * GT_FREQUENCY_MULTIPLIER,
-					 GEN9_FREQ_SCALER);
-	else if (IS_CHERRYVIEW(dev_priv))
-		return chv_gpu_freq(dev_priv, val);
-	else if (IS_VALLEYVIEW(dev_priv))
-		return byt_gpu_freq(dev_priv, val);
-	else
-		return val * GT_FREQUENCY_MULTIPLIER;
-}
-
-int intel_freq_opcode(struct drm_i915_private *dev_priv, int val)
-{
-	if (INTEL_GEN(dev_priv) >= 9)
-		return DIV_ROUND_CLOSEST(val * GEN9_FREQ_SCALER,
-					 GT_FREQUENCY_MULTIPLIER);
-	else if (IS_CHERRYVIEW(dev_priv))
-		return chv_freq_opcode(dev_priv, val);
-	else if (IS_VALLEYVIEW(dev_priv))
-		return byt_freq_opcode(dev_priv, val);
-	else
-		return DIV_ROUND_CLOSEST(val, GT_FREQUENCY_MULTIPLIER);
-}
-
 void intel_pm_setup(struct drm_i915_private *dev_priv)
 {
-	mutex_init(&dev_priv->gt_pm.rps.lock);
-	atomic_set(&dev_priv->gt_pm.rps.num_waiters, 0);
-
 	dev_priv->runtime_pm.suspended = false;
 	atomic_set(&dev_priv->runtime_pm.wakeref_count, 0);
 }
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index d6e20f0f4c28..07aaffee3250 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -576,8 +576,6 @@ void intel_uncore_runtime_resume(struct drm_i915_private *dev_priv)
 
 void intel_uncore_sanitize(struct drm_i915_private *dev_priv)
 {
-	/* BIOS often leaves RC6 enabled, but disable it for hw init */
-	intel_sanitize_gt_powersave(dev_priv);
 }
 
 static void __intel_uncore_forcewake_get(struct drm_i915_private *dev_priv,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 41/71] drm/i915: Move rps worker to intel_gt_pm.c
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (38 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 40/71] drm/i915: Split GT powermanagement functions to intel_gt_pm.c Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 42/71] drm/i915: Move all the RPS irq handlers to intel_gt_pm Chris Wilson
                   ` (11 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

The RPS worker exists to do the bidding of the GT powermanagement, so
move it from i915_irq to intel_gt_pm.c where it can be hidden from the
rest of the world. The goal is that the RPS worker becomes the one true
way through which all RPS updates are coordinated.
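
In concrete terms, the sysfs stores no longer call intel_set_rps()
directly; they record the new limit and kick the worker. A condensed
sketch of the resulting flow, paraphrasing the i915_sysfs.c and
intel_gt_pm.c hunks below (error paths trimmed):

	/* sysfs store: record the new limit, let the worker apply it */
	mutex_lock(&rps->lock);
	rps->max_freq_softlimit = val;
	schedule_work(&rps->work);
	mutex_unlock(&rps->lock);
	flush_work(&rps->work); /* keep the store synchronous for userspace */

	/* intel_rps_work(): now the only caller of intel_set_rps() */
	mutex_lock(&rps->lock);
	if (intel_set_rps(i915, clamp_t(int, freq + adj, min, max)))
		adj = 0;
	mutex_unlock(&rps->lock);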

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h    |   1 -
 drivers/gpu/drm/i915/i915_irq.c    | 141 ---------------------
 drivers/gpu/drm/i915/i915_sysfs.c  |  38 ++----
 drivers/gpu/drm/i915/intel_gt_pm.c | 189 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/intel_gt_pm.h |   1 -
 5 files changed, 163 insertions(+), 207 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 26e5b9ff91e7..9308e52d92bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3353,7 +3353,6 @@ extern void i915_redisable_vga(struct drm_i915_private *dev_priv);
 extern void i915_redisable_vga_power_on(struct drm_i915_private *dev_priv);
 extern bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val);
 extern void intel_init_pch_refclk(struct drm_i915_private *dev_priv);
-extern int intel_set_rps(struct drm_i915_private *dev_priv, u8 val);
 extern bool intel_set_memory_cxsr(struct drm_i915_private *dev_priv,
 				  bool enable);
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 043dbca25b2f..f02f6cdf3374 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1193,145 +1193,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 	trace_intel_engine_notify(engine, wait);
 }
 
-static void vlv_c0_read(struct drm_i915_private *dev_priv,
-			struct intel_rps_ei *ei)
-{
-	ei->ktime = ktime_get_raw();
-	ei->render_c0 = I915_READ(VLV_RENDER_C0_COUNT);
-	ei->media_c0 = I915_READ(VLV_MEDIA_C0_COUNT);
-}
-
-void gen6_rps_reset_ei(struct drm_i915_private *dev_priv)
-{
-	memset(&dev_priv->gt_pm.rps.ei, 0, sizeof(dev_priv->gt_pm.rps.ei));
-}
-
-static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	const struct intel_rps_ei *prev = &rps->ei;
-	struct intel_rps_ei now;
-	u32 events = 0;
-
-	if ((pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) == 0)
-		return 0;
-
-	vlv_c0_read(dev_priv, &now);
-
-	if (prev->ktime) {
-		u64 time, c0;
-		u32 render, media;
-
-		time = ktime_us_delta(now.ktime, prev->ktime);
-
-		time *= dev_priv->czclk_freq;
-
-		/* Workload can be split between render + media,
-		 * e.g. SwapBuffers being blitted in X after being rendered in
-		 * mesa. To account for this we need to combine both engines
-		 * into our activity counter.
-		 */
-		render = now.render_c0 - prev->render_c0;
-		media = now.media_c0 - prev->media_c0;
-		c0 = max(render, media);
-		c0 *= 1000 * 100 << 8; /* to usecs and scale to threshold% */
-
-		if (c0 > time * rps->up_threshold)
-			events = GEN6_PM_RP_UP_THRESHOLD;
-		else if (c0 < time * rps->down_threshold)
-			events = GEN6_PM_RP_DOWN_THRESHOLD;
-	}
-
-	rps->ei = now;
-	return events;
-}
-
-static void gen6_pm_rps_work(struct work_struct *work)
-{
-	struct drm_i915_private *dev_priv =
-		container_of(work, struct drm_i915_private, gt_pm.rps.work);
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	bool client_boost = false;
-	int new_delay, adj, min, max;
-	u32 pm_iir = 0;
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	if (rps->interrupts_enabled) {
-		pm_iir = fetch_and_zero(&rps->pm_iir);
-		client_boost = atomic_read(&rps->num_waiters);
-	}
-	spin_unlock_irq(&dev_priv->irq_lock);
-
-	/* Make sure we didn't queue anything we're not going to process. */
-	WARN_ON(pm_iir & ~dev_priv->pm_rps_events);
-	if ((pm_iir & dev_priv->pm_rps_events) == 0 && !client_boost)
-		goto out;
-
-	mutex_lock(&rps->lock);
-
-	pm_iir |= vlv_wa_c0_ei(dev_priv, pm_iir);
-
-	adj = rps->last_adj;
-	new_delay = rps->cur_freq;
-	min = rps->min_freq_softlimit;
-	max = rps->max_freq_softlimit;
-	if (client_boost)
-		max = rps->max_freq;
-	if (client_boost && new_delay < rps->boost_freq) {
-		new_delay = rps->boost_freq;
-		adj = 0;
-	} else if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
-		if (adj > 0)
-			adj *= 2;
-		else /* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv) ? 2 : 1;
-
-		if (new_delay >= rps->max_freq_softlimit)
-			adj = 0;
-	} else if (client_boost) {
-		adj = 0;
-	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
-		if (rps->cur_freq > rps->efficient_freq)
-			new_delay = rps->efficient_freq;
-		else if (rps->cur_freq > rps->min_freq_softlimit)
-			new_delay = rps->min_freq_softlimit;
-		adj = 0;
-	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
-		if (adj < 0)
-			adj *= 2;
-		else /* CHV needs even encode values */
-			adj = IS_CHERRYVIEW(dev_priv) ? -2 : -1;
-
-		if (new_delay <= rps->min_freq_softlimit)
-			adj = 0;
-	} else { /* unknown event */
-		adj = 0;
-	}
-
-	rps->last_adj = adj;
-
-	/* sysfs frequency interfaces may have snuck in while servicing the
-	 * interrupt
-	 */
-	new_delay += adj;
-	new_delay = clamp_t(int, new_delay, min, max);
-
-	if (intel_set_rps(dev_priv, new_delay)) {
-		DRM_DEBUG_DRIVER("Failed to set new GPU frequency\n");
-		rps->last_adj = 0;
-	}
-
-	mutex_unlock(&rps->lock);
-
-out:
-	/* Make sure not to corrupt PMIMR state used by ringbuffer on GEN6 */
-	spin_lock_irq(&dev_priv->irq_lock);
-	if (rps->interrupts_enabled)
-		gen6_unmask_pm_irq(dev_priv, dev_priv->pm_rps_events);
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-
 /**
  * ivybridge_parity_work - Workqueue called when a parity error interrupt
  * occurred.
@@ -4385,8 +4246,6 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 
 	intel_hpd_init_work(dev_priv);
 
-	INIT_WORK(&rps->work, gen6_pm_rps_work);
-
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 	for (i = 0; i < MAX_L3_SLICES; ++i)
 		dev_priv->l3_parity.remap_info[i] = NULL;
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index fde5f0139ca1..a72aab28399f 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -355,17 +355,16 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 {
 	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 val;
 	ssize_t ret;
+	u32 val;
 
 	ret = kstrtou32(buf, 0, &val);
 	if (ret)
 		return ret;
 
-	intel_runtime_pm_get(dev_priv);
-	mutex_lock(&rps->lock);
-
 	val = intel_freq_opcode(dev_priv, val);
+
+	mutex_lock(&rps->lock);
 	if (val < rps->min_freq ||
 	    val > rps->max_freq ||
 	    val < rps->min_freq_softlimit) {
@@ -378,19 +377,11 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 			  intel_gpu_freq(dev_priv, val));
 
 	rps->max_freq_softlimit = val;
-
-	val = clamp_t(int, rps->cur_freq,
-		      rps->min_freq_softlimit,
-		      rps->max_freq_softlimit);
-
-	/* We still need *_set_rps to process the new max_delay and
-	 * update the interrupt limits and PMINTRMSK even though
-	 * frequency request may be unchanged. */
-	ret = intel_set_rps(dev_priv, val);
+	schedule_work(&rps->work);
 
 unlock:
 	mutex_unlock(&rps->lock);
-	intel_runtime_pm_put(dev_priv);
+	flush_work(&rps->work);
 
 	return ret ?: count;
 }
@@ -410,17 +401,16 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 {
 	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u32 val;
 	ssize_t ret;
+	u32 val;
 
 	ret = kstrtou32(buf, 0, &val);
 	if (ret)
 		return ret;
 
-	intel_runtime_pm_get(dev_priv);
-	mutex_lock(&rps->lock);
-
 	val = intel_freq_opcode(dev_priv, val);
+
+	mutex_lock(&rps->lock);
 	if (val < rps->min_freq ||
 	    val > rps->max_freq ||
 	    val > rps->max_freq_softlimit) {
@@ -429,19 +419,11 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 	}
 
 	rps->min_freq_softlimit = val;
-
-	val = clamp_t(int, rps->cur_freq,
-		      rps->min_freq_softlimit,
-		      rps->max_freq_softlimit);
-
-	/* We still need *_set_rps to process the new min_delay and
-	 * update the interrupt limits and PMINTRMSK even though
-	 * frequency request may be unchanged. */
-	ret = intel_set_rps(dev_priv, val);
+	schedule_work(&rps->work);
 
 unlock:
 	mutex_unlock(&rps->lock);
-	intel_runtime_pm_put(dev_priv);
+	flush_work(&rps->work);
 
 	return ret ?: count;
 }
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index 733d346601ca..c51b40c791f8 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -329,13 +329,7 @@ static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	/*
-	 * min/max delay may still have been modified so be sure to
-	 * write the limits value.
-	 */
 	if (val != rps->cur_freq) {
-		gen6_set_rps_thresholds(dev_priv, val);
-
 		if (INTEL_GEN(dev_priv) >= 9)
 			I915_WRITE(GEN6_RPNSWREQ,
 				   GEN9_FREQUENCY(val));
@@ -349,6 +343,8 @@ static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 				   GEN6_AGGRESSIVE_TURBO);
 	}
 
+	gen6_set_rps_thresholds(dev_priv, val);
+
 	/*
 	 * Make sure we continue to get interrupts
 	 * until we hit the minimum or maximum frequencies.
@@ -370,18 +366,17 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
-	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
-
 	if (val != dev_priv->gt_pm.rps.cur_freq) {
 		vlv_punit_get(dev_priv);
 		err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
 		vlv_punit_put(dev_priv);
 		if (err)
 			return err;
-
-		gen6_set_rps_thresholds(dev_priv, val);
 	}
 
+	gen6_set_rps_thresholds(dev_priv, val);
+	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
+
 	dev_priv->gt_pm.rps.cur_freq = val;
 	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
 
@@ -426,6 +421,151 @@ static void vlv_set_rps_idle(struct drm_i915_private *i915)
 		DRM_ERROR("Failed to set RPS for idle\n");
 }
 
+static int intel_set_rps(struct drm_i915_private *i915, u8 val)
+{
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	int err;
+
+	lockdep_assert_held(&rps->lock);
+	GEM_BUG_ON(val > rps->max_freq);
+	GEM_BUG_ON(val < rps->min_freq);
+
+	if (!rps->enabled) {
+		rps->cur_freq = val;
+		return 0;
+	}
+
+	if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
+		err = valleyview_set_rps(i915, val);
+	else
+		err = gen6_set_rps(i915, val);
+
+	return err;
+}
+
+static void vlv_c0_read(struct drm_i915_private *dev_priv,
+			struct intel_rps_ei *ei)
+{
+	ei->ktime = ktime_get_raw();
+	ei->render_c0 = I915_READ(VLV_RENDER_C0_COUNT);
+	ei->media_c0 = I915_READ(VLV_MEDIA_C0_COUNT);
+}
+
+static u32 vlv_wa_c0_ei(struct drm_i915_private *dev_priv, u32 pm_iir)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	const struct intel_rps_ei *prev = &rps->ei;
+	struct intel_rps_ei now;
+	u32 events = 0;
+
+	if ((pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) == 0)
+		return 0;
+
+	vlv_c0_read(dev_priv, &now);
+
+	if (prev->ktime) {
+		u64 time, c0;
+		u32 render, media;
+
+		time = ktime_us_delta(now.ktime, prev->ktime);
+
+		time *= dev_priv->czclk_freq;
+
+		/* Workload can be split between render + media,
+		 * e.g. SwapBuffers being blitted in X after being rendered in
+		 * mesa. To account for this we need to combine both engines
+		 * into our activity counter.
+		 */
+		render = now.render_c0 - prev->render_c0;
+		media = now.media_c0 - prev->media_c0;
+		c0 = max(render, media);
+		c0 *= 1000 * 100 << 8; /* to usecs and scale to threshold% */
+
+		if (c0 > time * rps->up_threshold)
+			events = GEN6_PM_RP_UP_THRESHOLD;
+		else if (c0 < time * rps->down_threshold)
+			events = GEN6_PM_RP_DOWN_THRESHOLD;
+	}
+
+	rps->ei = now;
+	return events;
+}
+
+static void intel_rps_work(struct work_struct *work)
+{
+	struct drm_i915_private *i915 =
+		container_of(work, struct drm_i915_private, gt_pm.rps.work);
+	struct intel_rps *rps = &i915->gt_pm.rps;
+	int freq, adj, min, max;
+	bool client_boost;
+	u32 pm_iir;
+
+	pm_iir = xchg(&rps->pm_iir, 0) & ~i915->pm_rps_events;
+	pm_iir |= vlv_wa_c0_ei(i915, pm_iir);
+
+	client_boost = atomic_read(&rps->num_waiters);
+
+	mutex_lock(&rps->lock);
+
+	min = rps->min_freq_softlimit;
+	max = rps->max_freq_softlimit;
+	if (client_boost && max < rps->boost_freq)
+		max = rps->boost_freq;
+
+	GEM_BUG_ON(min < rps->min_freq);
+	GEM_BUG_ON(max > rps->max_freq);
+	GEM_BUG_ON(max < min);
+
+	adj = rps->last_adj;
+	freq = rps->cur_freq;
+	if (client_boost && freq < rps->boost_freq) {
+		freq = rps->boost_freq;
+		adj = 0;
+	} else if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
+		if (adj > 0)
+			adj *= 2;
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(i915) ? 2 : 1;
+
+		if (freq >= max)
+			adj = 0;
+	} else if (client_boost) {
+		adj = 0;
+	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
+		if (freq > max_t(int, rps->efficient_freq, min))
+			freq = max_t(int, rps->efficient_freq, min);
+		else if (freq > min_t(int, rps->efficient_freq, min))
+			freq = min_t(int, rps->efficient_freq, min);
+
+		adj = 0;
+	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
+		if (adj < 0)
+			adj *= 2;
+		else /* CHV needs even encode values */
+			adj = IS_CHERRYVIEW(i915) ? -2 : -1;
+
+		if (freq <= min)
+			adj = 0;
+	} else { /* unknown/external event */
+		adj = 0;
+	}
+
+	if (intel_set_rps(i915, clamp_t(int, freq + adj, min, max))) {
+		DRM_DEBUG_DRIVER("Failed to set new GPU frequency\n");
+		adj = 0;
+	}
+
+	mutex_unlock(&rps->lock);
+
+	if (pm_iir) {
+		spin_lock_irq(&i915->irq_lock);
+		if (rps->interrupts_enabled)
+			gen6_unmask_pm_irq(i915, i915->pm_rps_events);
+		spin_unlock_irq(&i915->irq_lock);
+		rps->last_adj = adj;
+	}
+}
+
 void gen6_rps_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -434,19 +574,17 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 	if (rps->enabled) {
 		u8 freq;
 
-		if (dev_priv->pm_rps_events & GEN6_PM_RP_UP_EI_EXPIRED)
-			gen6_rps_reset_ei(dev_priv);
 		I915_WRITE(GEN6_PMINTRMSK,
 			   gen6_rps_pm_mask(dev_priv, rps->cur_freq));
 
 		gen6_enable_rps_interrupts(dev_priv);
+		memset(&rps->ei, 0, sizeof(rps->ei));
 
 		/*
 		 * Use the user's desired frequency as a guide, but for better
 		 * performance, jump directly to RPe as our starting frequency.
 		 */
-		freq = max(rps->cur_freq,
-			   rps->efficient_freq);
+		freq = max(rps->cur_freq, rps->efficient_freq);
 
 		if (intel_set_rps(dev_priv,
 				  clamp(freq,
@@ -515,28 +653,6 @@ void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
 	atomic_inc(client ? &client->boosts : &rps->boosts);
 }
 
-int intel_set_rps(struct drm_i915_private *i915, u8 val)
-{
-	struct intel_rps *rps = &i915->gt_pm.rps;
-	int err;
-
-	lockdep_assert_held(&rps->lock);
-	GEM_BUG_ON(val > rps->max_freq);
-	GEM_BUG_ON(val < rps->min_freq);
-
-	if (!rps->enabled) {
-		rps->cur_freq = val;
-		return 0;
-	}
-
-	if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
-		err = valleyview_set_rps(i915, val);
-	else
-		err = gen6_set_rps(i915, val);
-
-	return err;
-}
-
 static void gen9_disable_rc6(struct drm_i915_private *dev_priv)
 {
 	I915_WRITE(GEN6_RC_CONTROL, 0);
@@ -2124,6 +2240,7 @@ void intel_init_gt_powersave(struct drm_i915_private *i915)
 	struct intel_rps *rps = &i915->gt_pm.rps;
 
 	mutex_init(&rps->lock);
+	INIT_WORK(&rps->work, intel_rps_work);
 
 	/*
 	 * RPM depends on RC6 to save/restore the GT HW context, so make RC6 a
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index 20e937d6c7e0..5c52ca208df1 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -95,7 +95,6 @@ void intel_disable_gt_powersave(struct drm_i915_private *i915);
 void intel_suspend_gt_powersave(struct drm_i915_private *i915);
 
 void gen6_rps_busy(struct drm_i915_private *i915);
-void gen6_rps_reset_ei(struct drm_i915_private *i915);
 void gen6_rps_idle(struct drm_i915_private *i915);
 void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 42/71] drm/i915: Move all the RPS irq handlers to intel_gt_pm
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (39 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 41/71] drm/i915: Move rps worker " Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 43/71] drm/i915: Track HAS_RPS alongside HAS_RC6 in the device info Chris Wilson
                   ` (10 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Since all the RPS handling code is in intel_gt_pm, move the irq handlers
there as well so that it is all contained within one file.
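
The pattern being consolidated is the usual top-half/bottom-half split:
the irq handler masks the RPS bits and queues the worker, and the worker
unmasks them again once it has reprogrammed the frequency. A condensed
sketch, taken from the intel_gt_pm.c hunks below:

	/* top half: mask further RPS interrupts and defer to the worker */
	void intel_gt_pm_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
	{
		struct intel_rps *rps = &dev_priv->gt_pm.rps;

		if (pm_iir & rps->pm_events) {
			spin_lock(&dev_priv->irq_lock);
			gen6_mask_pm_irq(dev_priv, pm_iir & rps->pm_events);
			if (rps->interrupts_enabled) {
				rps->pm_iir |= pm_iir & rps->pm_events;
				schedule_work(&rps->work);
			}
			spin_unlock(&dev_priv->irq_lock);
		}
	}

	/* bottom half (intel_rps_work): unmask once the update is applied */
	spin_lock_irq(&i915->irq_lock);
	if (rps->interrupts_enabled)
		gen6_unmask_pm_irq(i915, rps->pm_events);
	spin_unlock_irq(&i915->irq_lock);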

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h         |   4 -
 drivers/gpu/drm/i915/i915_irq.c         | 315 +++---------------------
 drivers/gpu/drm/i915/intel_drv.h        |  10 +-
 drivers/gpu/drm/i915/intel_gt_pm.c      | 236 +++++++++++++++++-
 drivers/gpu/drm/i915/intel_gt_pm.h      |  11 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 6 files changed, 277 insertions(+), 300 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 9308e52d92bb..996769c9617a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1580,10 +1580,6 @@ struct drm_i915_private {
 		u32 de_irq_mask[I915_MAX_PIPES];
 	};
 	u32 gt_irq_mask;
-	u32 pm_imr;
-	u32 pm_ier;
-	u32 pm_rps_events;
-	u32 pm_guc_events;
 	u32 pipestat_irq_mask[I915_MAX_PIPES];
 
 	struct i915_hotplug hotplug;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f02f6cdf3374..155de01ea756 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -204,7 +204,6 @@ static void gen2_assert_iir_is_zero(struct drm_i915_private *dev_priv,
 	POSTING_READ16(type##IMR); \
 } while (0)
 
-static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir);
 static void gen9_guc_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir);
 
 /* For display hotplug interrupt */
@@ -343,220 +342,6 @@ void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask)
 	ilk_update_gt_irq(dev_priv, mask, 0);
 }
 
-static i915_reg_t gen6_pm_iir(struct drm_i915_private *dev_priv)
-{
-	WARN_ON_ONCE(INTEL_GEN(dev_priv) >= 11);
-
-	return INTEL_GEN(dev_priv) >= 8 ? GEN8_GT_IIR(2) : GEN6_PMIIR;
-}
-
-static i915_reg_t gen6_pm_imr(struct drm_i915_private *dev_priv)
-{
-	if (INTEL_GEN(dev_priv) >= 11)
-		return GEN11_GPM_WGBOXPERF_INTR_MASK;
-	else if (INTEL_GEN(dev_priv) >= 8)
-		return GEN8_GT_IMR(2);
-	else
-		return GEN6_PMIMR;
-}
-
-static i915_reg_t gen6_pm_ier(struct drm_i915_private *dev_priv)
-{
-	if (INTEL_GEN(dev_priv) >= 11)
-		return GEN11_GPM_WGBOXPERF_INTR_ENABLE;
-	else if (INTEL_GEN(dev_priv) >= 8)
-		return GEN8_GT_IER(2);
-	else
-		return GEN6_PMIER;
-}
-
-/**
- * snb_update_pm_irq - update GEN6_PMIMR
- * @dev_priv: driver private
- * @interrupt_mask: mask of interrupt bits to update
- * @enabled_irq_mask: mask of interrupt bits to enable
- */
-static void snb_update_pm_irq(struct drm_i915_private *dev_priv,
-			      uint32_t interrupt_mask,
-			      uint32_t enabled_irq_mask)
-{
-	uint32_t new_val;
-
-	WARN_ON(enabled_irq_mask & ~interrupt_mask);
-
-	lockdep_assert_held(&dev_priv->irq_lock);
-
-	new_val = dev_priv->pm_imr;
-	new_val &= ~interrupt_mask;
-	new_val |= (~enabled_irq_mask & interrupt_mask);
-
-	if (new_val != dev_priv->pm_imr) {
-		dev_priv->pm_imr = new_val;
-		I915_WRITE(gen6_pm_imr(dev_priv), dev_priv->pm_imr);
-		POSTING_READ(gen6_pm_imr(dev_priv));
-	}
-}
-
-void gen6_unmask_pm_irq(struct drm_i915_private *dev_priv, u32 mask)
-{
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return;
-
-	snb_update_pm_irq(dev_priv, mask, mask);
-}
-
-static void __gen6_mask_pm_irq(struct drm_i915_private *dev_priv, u32 mask)
-{
-	snb_update_pm_irq(dev_priv, mask, 0);
-}
-
-void gen6_mask_pm_irq(struct drm_i915_private *dev_priv, u32 mask)
-{
-	if (WARN_ON(!intel_irqs_enabled(dev_priv)))
-		return;
-
-	__gen6_mask_pm_irq(dev_priv, mask);
-}
-
-static void gen6_reset_pm_iir(struct drm_i915_private *dev_priv, u32 reset_mask)
-{
-	i915_reg_t reg = gen6_pm_iir(dev_priv);
-
-	lockdep_assert_held(&dev_priv->irq_lock);
-
-	I915_WRITE(reg, reset_mask);
-	I915_WRITE(reg, reset_mask);
-	POSTING_READ(reg);
-}
-
-static void gen6_enable_pm_irq(struct drm_i915_private *dev_priv, u32 enable_mask)
-{
-	lockdep_assert_held(&dev_priv->irq_lock);
-
-	dev_priv->pm_ier |= enable_mask;
-	I915_WRITE(gen6_pm_ier(dev_priv), dev_priv->pm_ier);
-	gen6_unmask_pm_irq(dev_priv, enable_mask);
-	/* unmask_pm_irq provides an implicit barrier (POSTING_READ) */
-}
-
-static void gen6_disable_pm_irq(struct drm_i915_private *dev_priv, u32 disable_mask)
-{
-	lockdep_assert_held(&dev_priv->irq_lock);
-
-	dev_priv->pm_ier &= ~disable_mask;
-	__gen6_mask_pm_irq(dev_priv, disable_mask);
-	I915_WRITE(gen6_pm_ier(dev_priv), dev_priv->pm_ier);
-	/* though a barrier is missing here, but don't really need a one */
-}
-
-void gen11_reset_rps_interrupts(struct drm_i915_private *dev_priv)
-{
-	spin_lock_irq(&dev_priv->irq_lock);
-
-	while (gen11_reset_one_iir(dev_priv, 0, GEN11_GTPM))
-		;
-
-	dev_priv->gt_pm.rps.pm_iir = 0;
-
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-void gen6_reset_rps_interrupts(struct drm_i915_private *dev_priv)
-{
-	spin_lock_irq(&dev_priv->irq_lock);
-	gen6_reset_pm_iir(dev_priv, dev_priv->pm_rps_events);
-	dev_priv->gt_pm.rps.pm_iir = 0;
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-void gen6_enable_rps_interrupts(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	if (READ_ONCE(rps->interrupts_enabled))
-		return;
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON_ONCE(rps->pm_iir);
-
-	if (INTEL_GEN(dev_priv) >= 11)
-		WARN_ON_ONCE(gen11_reset_one_iir(dev_priv, 0, GEN11_GTPM));
-	else
-		WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) & dev_priv->pm_rps_events);
-
-	rps->interrupts_enabled = true;
-	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-void gen6_disable_rps_interrupts(struct drm_i915_private *dev_priv)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	if (!READ_ONCE(rps->interrupts_enabled))
-		return;
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	rps->interrupts_enabled = false;
-
-	I915_WRITE(GEN6_PMINTRMSK, gen6_sanitize_rps_pm_mask(dev_priv, ~0u));
-
-	gen6_disable_pm_irq(dev_priv, dev_priv->pm_rps_events);
-
-	spin_unlock_irq(&dev_priv->irq_lock);
-	synchronize_irq(dev_priv->drm.irq);
-
-	/* Now that we will not be generating any more work, flush any
-	 * outstanding tasks. As we are called on the RPS idle path,
-	 * we will reset the GPU to minimum frequencies, so the current
-	 * state of the worker can be discarded.
-	 */
-	cancel_work_sync(&rps->work);
-	if (INTEL_GEN(dev_priv) >= 11)
-		gen11_reset_rps_interrupts(dev_priv);
-	else
-		gen6_reset_rps_interrupts(dev_priv);
-}
-
-void gen9_reset_guc_interrupts(struct drm_i915_private *dev_priv)
-{
-	assert_rpm_wakelock_held(dev_priv);
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	gen6_reset_pm_iir(dev_priv, dev_priv->pm_guc_events);
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-void gen9_enable_guc_interrupts(struct drm_i915_private *dev_priv)
-{
-	assert_rpm_wakelock_held(dev_priv);
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	if (!dev_priv->guc.interrupts_enabled) {
-		WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) &
-				       dev_priv->pm_guc_events);
-		dev_priv->guc.interrupts_enabled = true;
-		gen6_enable_pm_irq(dev_priv, dev_priv->pm_guc_events);
-	}
-	spin_unlock_irq(&dev_priv->irq_lock);
-}
-
-void gen9_disable_guc_interrupts(struct drm_i915_private *dev_priv)
-{
-	assert_rpm_wakelock_held(dev_priv);
-
-	spin_lock_irq(&dev_priv->irq_lock);
-	dev_priv->guc.interrupts_enabled = false;
-
-	gen6_disable_pm_irq(dev_priv, dev_priv->pm_guc_events);
-
-	spin_unlock_irq(&dev_priv->irq_lock);
-	synchronize_irq(dev_priv->drm.irq);
-
-	gen9_reset_guc_interrupts(dev_priv);
-}
-
 /**
  * bdw_update_port_irq - update DE port interrupt
  * @dev_priv: driver private
@@ -1370,11 +1155,11 @@ static void gen8_gt_irq_ack(struct drm_i915_private *i915,
 
 	if (master_ctl & (GEN8_GT_PM_IRQ | GEN8_GT_GUC_IRQ)) {
 		gt_iir[2] = raw_reg_read(regs, GEN8_GT_IIR(2));
-		if (likely(gt_iir[2] & (i915->pm_rps_events |
-					i915->pm_guc_events)))
+		if (likely(gt_iir[2] & (i915->gt_pm.rps.pm_events |
+					i915->gt_pm.rps.guc_events)))
 			raw_reg_write(regs, GEN8_GT_IIR(2),
-				      gt_iir[2] & (i915->pm_rps_events |
-						   i915->pm_guc_events));
+				      gt_iir[2] & (i915->gt_pm.rps.pm_events |
+						   i915->gt_pm.rps.guc_events));
 	}
 
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
@@ -1407,7 +1192,7 @@ static void gen8_gt_irq_handler(struct drm_i915_private *i915,
 	}
 
 	if (master_ctl & (GEN8_GT_PM_IRQ | GEN8_GT_GUC_IRQ)) {
-		gen6_rps_irq_handler(i915, gt_iir[2]);
+		intel_gt_pm_irq_handler(i915, gt_iir[2]);
 		gen9_guc_irq_handler(i915, gt_iir[2]);
 	}
 }
@@ -1658,35 +1443,6 @@ static void i9xx_pipe_crc_irq_handler(struct drm_i915_private *dev_priv,
 				     res1, res2);
 }
 
-/* The RPS events need forcewake, so we add them to a work queue and mask their
- * IMR bits until the work is done. Other interrupts can be processed without
- * the work queue. */
-static void gen6_rps_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
-{
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	if (pm_iir & dev_priv->pm_rps_events) {
-		spin_lock(&dev_priv->irq_lock);
-		gen6_mask_pm_irq(dev_priv, pm_iir & dev_priv->pm_rps_events);
-		if (rps->interrupts_enabled) {
-			rps->pm_iir |= pm_iir & dev_priv->pm_rps_events;
-			schedule_work(&rps->work);
-		}
-		spin_unlock(&dev_priv->irq_lock);
-	}
-
-	if (INTEL_GEN(dev_priv) >= 8)
-		return;
-
-	if (HAS_VEBOX(dev_priv)) {
-		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
-			notify_ring(dev_priv->engine[VECS]);
-
-		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
-			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n", pm_iir);
-	}
-}
-
 static void gen9_guc_irq_handler(struct drm_i915_private *dev_priv, u32 gt_iir)
 {
 	if (gt_iir & GEN9_GUC_TO_HOST_INT_EVENT)
@@ -1894,6 +1650,19 @@ static void i9xx_hpd_irq_handler(struct drm_i915_private *dev_priv,
 	}
 }
 
+static void gen6_pm_extra_irq_handler(struct drm_i915_private *dev_priv,
+				      u32 pm_iir)
+{
+	if (HAS_VEBOX(dev_priv)) {
+		if (pm_iir & PM_VEBOX_USER_INTERRUPT)
+			notify_ring(dev_priv->engine[VECS]);
+
+		if (pm_iir & PM_VEBOX_CS_ERROR_INTERRUPT)
+			DRM_DEBUG("Command parser error, pm_iir 0x%08x\n",
+				  pm_iir);
+	}
+}
+
 static irqreturn_t valleyview_irq_handler(int irq, void *arg)
 {
 	struct drm_device *dev = arg;
@@ -1968,7 +1737,7 @@ static irqreturn_t valleyview_irq_handler(int irq, void *arg)
 		if (gt_iir)
 			snb_gt_irq_handler(dev_priv, gt_iir);
 		if (pm_iir)
-			gen6_rps_irq_handler(dev_priv, pm_iir);
+			intel_gt_pm_irq_handler(dev_priv, pm_iir);
 
 		if (hotplug_status)
 			i9xx_hpd_irq_handler(dev_priv, hotplug_status);
@@ -2420,7 +2189,8 @@ static irqreturn_t ironlake_irq_handler(int irq, void *arg)
 		if (pm_iir) {
 			I915_WRITE(GEN6_PMIIR, pm_iir);
 			ret = IRQ_HANDLED;
-			gen6_rps_irq_handler(dev_priv, pm_iir);
+			intel_gt_pm_irq_handler(dev_priv, pm_iir);
+			gen6_pm_extra_irq_handler(dev_priv, pm_iir);
 		}
 	}
 
@@ -2716,7 +2486,7 @@ gen11_other_irq_handler(struct drm_i915_private * const i915,
 			const u8 instance, const u16 iir)
 {
 	if (instance == OTHER_GTPM_INSTANCE)
-		return gen6_rps_irq_handler(i915, iir);
+		return intel_gt_pm_irq_handler(i915, iir);
 
 	WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n",
 		  instance, iir);
@@ -3626,11 +3396,11 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev)
 		 */
 		if (HAS_VEBOX(dev_priv)) {
 			pm_irqs |= PM_VEBOX_USER_INTERRUPT;
-			dev_priv->pm_ier |= PM_VEBOX_USER_INTERRUPT;
+			dev_priv->gt_pm.ier |= PM_VEBOX_USER_INTERRUPT;
 		}
 
-		dev_priv->pm_imr = 0xffffffff;
-		GEN3_IRQ_INIT(GEN6_PM, dev_priv->pm_imr, pm_irqs);
+		dev_priv->gt_pm.imr = 0xffffffff;
+		GEN3_IRQ_INIT(GEN6_PM, dev_priv->gt_pm.imr, pm_irqs);
 	}
 }
 
@@ -3752,15 +3522,15 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	if (HAS_L3_DPF(dev_priv))
 		gt_interrupts[0] |= GT_RENDER_L3_PARITY_ERROR_INTERRUPT;
 
-	dev_priv->pm_ier = 0x0;
-	dev_priv->pm_imr = ~dev_priv->pm_ier;
+	dev_priv->gt_pm.ier = 0x0;
+	dev_priv->gt_pm.imr = ~dev_priv->gt_pm.ier;
 	GEN8_IRQ_INIT_NDX(GT, 0, ~gt_interrupts[0], gt_interrupts[0]);
 	GEN8_IRQ_INIT_NDX(GT, 1, ~gt_interrupts[1], gt_interrupts[1]);
 	/*
 	 * RPS interrupts will get enabled/disabled on demand when RPS itself
 	 * is enabled/disabled. Same will be the case for GuC interrupts.
 	 */
-	GEN8_IRQ_INIT_NDX(GT, 2, dev_priv->pm_imr, dev_priv->pm_ier);
+	GEN8_IRQ_INIT_NDX(GT, 2, dev_priv->gt_pm.imr, dev_priv->gt_pm.ier);
 	GEN8_IRQ_INIT_NDX(GT, 3, ~gt_interrupts[3], gt_interrupts[3]);
 }
 
@@ -3857,8 +3627,8 @@ static void gen11_gt_irq_postinstall(struct drm_i915_private *dev_priv)
 	 * RPS interrupts will get enabled/disabled on demand when RPS itself
 	 * is enabled/disabled.
 	 */
-	dev_priv->pm_ier = 0x0;
-	dev_priv->pm_imr = ~dev_priv->pm_ier;
+	dev_priv->gt_pm.ier = 0x0;
+	dev_priv->gt_pm.imr = ~dev_priv->gt_pm.ier;
 	I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0);
 	I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_MASK,  ~0);
 }
@@ -4241,7 +4011,6 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 void intel_irq_init(struct drm_i915_private *dev_priv)
 {
 	struct drm_device *dev = &dev_priv->drm;
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 	int i;
 
 	intel_hpd_init_work(dev_priv);
@@ -4250,30 +4019,6 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	for (i = 0; i < MAX_L3_SLICES; ++i)
 		dev_priv->l3_parity.remap_info[i] = NULL;
 
-	if (HAS_GUC_SCHED(dev_priv))
-		dev_priv->pm_guc_events = GEN9_GUC_TO_HOST_INT_EVENT;
-
-	/* Let's track the enabled rps events */
-	if (IS_VALLEYVIEW(dev_priv))
-		/* WaGsvRC0ResidencyMethod:vlv */
-		dev_priv->pm_rps_events = GEN6_PM_RP_UP_EI_EXPIRED;
-	else
-		dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS;
-
-	rps->pm_intrmsk_mbz = 0;
-
-	/*
-	 * SNB,IVB,HSW can while VLV,CHV may hard hang on looping batchbuffer
-	 * if GEN6_PM_UP_EI_EXPIRED is masked.
-	 *
-	 * TODO: verify if this can be reproduced on VLV,CHV.
-	 */
-	if (INTEL_GEN(dev_priv) <= 7)
-		rps->pm_intrmsk_mbz |= GEN6_PM_RP_UP_EI_EXPIRED;
-
-	if (INTEL_GEN(dev_priv) >= 8)
-		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
-
 	if (IS_GEN2(dev_priv)) {
 		/* Gen2 doesn't have a hardware frame counter */
 		dev->max_vblank_count = 0;
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index d5c680094979..df4a5cfdb6d6 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -1342,12 +1342,10 @@ bool gen11_reset_one_iir(struct drm_i915_private * const i915,
 			 const unsigned int bit);
 void gen5_enable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
 void gen5_disable_gt_irq(struct drm_i915_private *dev_priv, uint32_t mask);
-void gen6_mask_pm_irq(struct drm_i915_private *dev_priv, u32 mask);
-void gen6_unmask_pm_irq(struct drm_i915_private *dev_priv, u32 mask);
-void gen11_reset_rps_interrupts(struct drm_i915_private *dev_priv);
-void gen6_reset_rps_interrupts(struct drm_i915_private *dev_priv);
-void gen6_enable_rps_interrupts(struct drm_i915_private *dev_priv);
-void gen6_disable_rps_interrupts(struct drm_i915_private *dev_priv);
+
+bool gen11_reset_one_iir(struct drm_i915_private * const i915,
+			 const unsigned int bank,
+			 const unsigned int bit);
 
 static inline u32 gen6_sanitize_rps_pm_mask(const struct drm_i915_private *i915,
 					    u32 mask)
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index c51b40c791f8..79cb4dbafbea 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -315,7 +315,7 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *i915, u8 val)
 		mask |= (GEN6_PM_RP_UP_EI_EXPIRED |
 			 GEN6_PM_RP_UP_THRESHOLD);
 
-	mask &= i915->pm_rps_events;
+	mask &= rps->pm_events;
 
 	return gen6_sanitize_rps_pm_mask(i915, ~mask);
 }
@@ -443,6 +443,145 @@ static int intel_set_rps(struct drm_i915_private *i915, u8 val)
 	return err;
 }
 
+static i915_reg_t gen6_pm_iir(struct drm_i915_private *dev_priv)
+{
+	return INTEL_GEN(dev_priv) >= 8 ? GEN8_GT_IIR(2) : GEN6_PMIIR;
+}
+
+static i915_reg_t gen6_pm_ier(struct drm_i915_private *dev_priv)
+{
+	return INTEL_GEN(dev_priv) >= 8 ? GEN8_GT_IER(2) : GEN6_PMIER;
+}
+
+static i915_reg_t gen6_pm_imr(struct drm_i915_private *dev_priv)
+{
+	return INTEL_GEN(dev_priv) >= 8 ? GEN8_GT_IMR(2) : GEN6_PMIMR;
+}
+
+static void gen6_update_pm_irq(struct drm_i915_private *dev_priv,
+			       u32 interrupt_mask,
+			       u32 enabled_irq_mask)
+{
+	u32 new_val;
+
+	lockdep_assert_held(&dev_priv->irq_lock);
+	GEM_BUG_ON(enabled_irq_mask & ~interrupt_mask);
+
+	new_val = dev_priv->gt_pm.imr;
+	new_val &= ~interrupt_mask;
+	new_val |= ~enabled_irq_mask & interrupt_mask;
+
+	if (new_val != dev_priv->gt_pm.imr) {
+		dev_priv->gt_pm.imr = new_val;
+		I915_WRITE(gen6_pm_imr(dev_priv), dev_priv->gt_pm.imr);
+	}
+}
+
+static void gen6_reset_pm_iir(struct drm_i915_private *dev_priv,
+			      u32 reset_mask)
+{
+	i915_reg_t reg = gen6_pm_iir(dev_priv);
+
+	lockdep_assert_held(&dev_priv->irq_lock);
+
+	I915_WRITE(reg, reset_mask);
+	I915_WRITE(reg, reset_mask);
+	POSTING_READ(reg);
+}
+
+static void gen6_enable_pm_irq(struct drm_i915_private *dev_priv,
+			       u32 enable_mask)
+{
+	lockdep_assert_held(&dev_priv->irq_lock);
+
+	dev_priv->gt_pm.ier |= enable_mask;
+	I915_WRITE(gen6_pm_ier(dev_priv), dev_priv->gt_pm.ier);
+	gen6_unmask_pm_irq(dev_priv, enable_mask);
+	/* unmask_pm_irq provides an implicit barrier (POSTING_READ) */
+}
+
+static void gen6_disable_pm_irq(struct drm_i915_private *dev_priv,
+				u32 disable_mask)
+{
+	lockdep_assert_held(&dev_priv->irq_lock);
+
+	dev_priv->gt_pm.ier &= ~disable_mask;
+	gen6_update_pm_irq(dev_priv, disable_mask, 0);
+	I915_WRITE(gen6_pm_ier(dev_priv), dev_priv->gt_pm.ier);
+	/* a barrier is missing here, but we don't really need one */
+}
+
+static void gen6_reset_rps_interrupts(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	gen6_reset_pm_iir(dev_priv, rps->pm_events);
+	rps->pm_iir = 0;
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
+static void gen11_reset_rps_interrupts(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+
+	while (gen11_reset_one_iir(dev_priv, 0, GEN11_GTPM))
+		cpu_relax();
+	rps->pm_iir = 0;
+
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
+static void enable_rps_interrupts(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (READ_ONCE(rps->interrupts_enabled))
+		return;
+
+	if (WARN_ON_ONCE(IS_GEN11(dev_priv)))
+		return;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	WARN_ON_ONCE(rps->pm_iir);
+	WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) & rps->pm_events);
+	rps->interrupts_enabled = true;
+	gen6_enable_pm_irq(dev_priv, rps->pm_events);
+
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
+static void disable_rps_interrupts(struct drm_i915_private *dev_priv)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (!READ_ONCE(rps->interrupts_enabled))
+		return;
+
+	if (WARN_ON_ONCE(IS_GEN11(dev_priv)))
+		return;
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	rps->interrupts_enabled = false;
+
+	I915_WRITE(GEN6_PMINTRMSK, gen6_sanitize_rps_pm_mask(dev_priv, ~0u));
+
+	gen6_disable_pm_irq(dev_priv, rps->pm_events);
+
+	spin_unlock_irq(&dev_priv->irq_lock);
+	synchronize_irq(dev_priv->drm.irq);
+
+	/* Now that we will not be generating any more work, flush any
+	 * outstanding tasks. As we are called on the RPS idle path,
+	 * we will reset the GPU to minimum frequencies, so the current
+	 * state of the worker can be discarded.
+	 */
+	cancel_work_sync(&rps->work);
+	gen6_reset_rps_interrupts(dev_priv);
+}
+
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
 			struct intel_rps_ei *ei)
 {
@@ -500,7 +639,7 @@ static void intel_rps_work(struct work_struct *work)
 	bool client_boost;
 	u32 pm_iir;
 
-	pm_iir = xchg(&rps->pm_iir, 0) & ~i915->pm_rps_events;
+	pm_iir = xchg(&rps->pm_iir, 0) & ~rps->pm_events;
 	pm_iir |= vlv_wa_c0_ei(i915, pm_iir);
 
 	client_boost = atomic_read(&rps->num_waiters);
@@ -560,12 +699,27 @@ static void intel_rps_work(struct work_struct *work)
 	if (pm_iir) {
 		spin_lock_irq(&i915->irq_lock);
 		if (rps->interrupts_enabled)
-			gen6_unmask_pm_irq(i915, i915->pm_rps_events);
+			gen6_unmask_pm_irq(i915, rps->pm_events);
 		spin_unlock_irq(&i915->irq_lock);
 		rps->last_adj = adj;
 	}
 }
 
+void intel_gt_pm_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
+{
+	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+
+	if (pm_iir & rps->pm_events) {
+		spin_lock(&dev_priv->irq_lock);
+		gen6_mask_pm_irq(dev_priv, pm_iir & rps->pm_events);
+		if (rps->interrupts_enabled) {
+			rps->pm_iir |= pm_iir & rps->pm_events;
+			schedule_work(&rps->work);
+		}
+		spin_unlock(&dev_priv->irq_lock);
+	}
+}
+
 void gen6_rps_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
@@ -577,7 +731,7 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 		I915_WRITE(GEN6_PMINTRMSK,
 			   gen6_rps_pm_mask(dev_priv, rps->cur_freq));
 
-		gen6_enable_rps_interrupts(dev_priv);
+		enable_rps_interrupts(dev_priv);
 		memset(&rps->ei, 0, sizeof(rps->ei));
 
 		/*
@@ -605,7 +759,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 	 * our rpm wakeref. And then disable the interrupts to stop any
 	 * further RPS reclocking whilst we are asleep.
 	 */
-	gen6_disable_rps_interrupts(dev_priv);
+	disable_rps_interrupts(dev_priv);
 
 	mutex_lock(&rps->lock);
 	if (rps->enabled) {
@@ -2242,6 +2396,30 @@ void intel_init_gt_powersave(struct drm_i915_private *i915)
 	mutex_init(&rps->lock);
 	INIT_WORK(&rps->work, intel_rps_work);
 
+	if (HAS_GUC_SCHED(i915))
+		rps->guc_events = GEN9_GUC_TO_HOST_INT_EVENT;
+
+	/* Let's track the enabled rps events */
+	if (IS_VALLEYVIEW(i915))
+		/* WaGsvRC0ResidencyMethod:vlv */
+		rps->pm_events = GEN6_PM_RP_UP_EI_EXPIRED;
+	else
+		rps->pm_events = GEN6_PM_RPS_EVENTS;
+
+	rps->pm_intrmsk_mbz = 0;
+
+	/*
+	 * SNB,IVB,HSW can hard hang, and VLV,CHV may, on a looping batchbuffer
+	 * if GEN6_PM_UP_EI_EXPIRED is masked.
+	 *
+	 * TODO: verify if this can be reproduced on VLV,CHV.
+	 */
+	if (INTEL_GEN(i915) <= 7)
+		rps->pm_intrmsk_mbz |= GEN6_PM_RP_UP_EI_EXPIRED;
+
+	if (INTEL_GEN(i915) >= 8)
+		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
 	/*
 	 * RPM depends on RC6 to save/restore the GT HW context, so make RC6 a
 	 * requirement.
@@ -2539,3 +2717,51 @@ int intel_freq_opcode(const struct drm_i915_private *i915, int val)
 	else
 		return DIV_ROUND_CLOSEST(val, GT_FREQUENCY_MULTIPLIER);
 }
+
+void gen6_unmask_pm_irq(struct drm_i915_private *i915, u32 mask)
+{
+	gen6_update_pm_irq(i915, mask, mask);
+}
+
+void gen6_mask_pm_irq(struct drm_i915_private *i915, u32 mask)
+{
+	gen6_update_pm_irq(i915, mask, 0);
+}
+
+void gen9_reset_guc_interrupts(struct drm_i915_private *i915)
+{
+	assert_rpm_wakelock_held(i915);
+
+	spin_lock_irq(&i915->irq_lock);
+	gen6_reset_pm_iir(i915, i915->gt_pm.rps.guc_events);
+	spin_unlock_irq(&i915->irq_lock);
+}
+
+void gen9_enable_guc_interrupts(struct drm_i915_private *dev_priv)
+{
+	assert_rpm_wakelock_held(dev_priv);
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	if (!dev_priv->guc.interrupts_enabled) {
+		WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) &
+				       dev_priv->gt_pm.rps.guc_events);
+		dev_priv->guc.interrupts_enabled = true;
+		gen6_enable_pm_irq(dev_priv, dev_priv->gt_pm.rps.guc_events);
+	}
+	spin_unlock_irq(&dev_priv->irq_lock);
+}
+
+void gen9_disable_guc_interrupts(struct drm_i915_private *dev_priv)
+{
+	assert_rpm_wakelock_held(dev_priv);
+
+	spin_lock_irq(&dev_priv->irq_lock);
+	dev_priv->guc.interrupts_enabled = false;
+
+	gen6_disable_pm_irq(dev_priv, dev_priv->gt_pm.rps.guc_events);
+
+	spin_unlock_irq(&dev_priv->irq_lock);
+	synchronize_irq(dev_priv->drm.irq);
+
+	gen9_reset_guc_interrupts(dev_priv);
+}
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index 5c52ca208df1..7a14a2e14b30 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -31,6 +31,9 @@ struct intel_rps {
 	/* PM interrupt bits that should never be masked */
 	u32 pm_intrmsk_mbz;
 
+	u32 pm_events;
+	u32 guc_events;
+
 	/*
 	 * Frequencies are stored in potentially platform dependent multiples.
 	 * In other words, *_freq needs to be multiplied by X to be interesting.
@@ -82,6 +85,9 @@ struct intel_gt_pm {
 	struct intel_rps rps;
 	struct intel_rc6 rc6;
 	struct intel_llc_pstate llc_pstate;
+
+	u32 imr;
+	u32 ier;
 };
 
 void intel_gpu_ips_init(struct drm_i915_private *i915);
@@ -94,6 +100,8 @@ void intel_enable_gt_powersave(struct drm_i915_private *i915);
 void intel_disable_gt_powersave(struct drm_i915_private *i915);
 void intel_suspend_gt_powersave(struct drm_i915_private *i915);
 
+void intel_gt_pm_irq_handler(struct drm_i915_private *i915, u32 pm_iir);
+
 void gen6_rps_busy(struct drm_i915_private *i915);
 void gen6_rps_idle(struct drm_i915_private *i915);
 void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
@@ -101,4 +109,7 @@ void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
 int intel_gpu_freq(const struct drm_i915_private *i915, int val);
 int intel_freq_opcode(const struct drm_i915_private *i915, int val);
 
+void gen6_unmask_pm_irq(struct drm_i915_private *i915, u32 mask);
+void gen6_mask_pm_irq(struct drm_i915_private *i915, u32 mask);
+
 #endif /* __INTEL_GT_PM_H__ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d17dbaacec80..3396f6bc147b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -36,6 +36,7 @@
 #include "i915_gem_render_state.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include "intel_gt_pm.h"
 #include "intel_workarounds.h"
 
 /* Rough estimate of the typical request size, performing a flush,
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 43/71] drm/i915: Track HAS_RPS alongside HAS_RC6 in the device info
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (40 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 42/71] drm/i915: Move all the RPS irq handlers to intel_gt_pm Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 44/71] drm/i915: Remove defunct intel_suspend_gt_powersave() Chris Wilson
                   ` (9 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

For consistency (and elegance!), add intel_device_info.has_rps.
The immediate boon is that RPS support is now reported alongside the
other capabilities in the debug log and in the error state.
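
Nothing further is needed to get the flag into those dumps: the new
func(has_rps) entry in DEV_INFO_FOR_EACH_FLAG is picked up by the
existing flag printer in intel_device_info.c (not part of this diff),
which is roughly:

	#define PRINT_FLAG(name) \
		drm_printf(p, "%s: %s\n", #name, yesno(info->name));
	DEV_INFO_FOR_EACH_FLAG(PRINT_FLAG);
	#undef PRINT_FLAG

so the capability listing gains a "has_rps: yes/no" line next to
has_rc6 and friends.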

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  2 ++
 drivers/gpu/drm/i915/i915_pci.c          |  6 ++++++
 drivers/gpu/drm/i915/intel_device_info.h |  1 +
 drivers/gpu/drm/i915/intel_gt_pm.c       | 20 ++++++++++++++++----
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 996769c9617a..21f56684028f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2503,6 +2503,8 @@ intel_info(const struct drm_i915_private *dev_priv)
 #define HAS_RC6p(dev_priv)		 ((dev_priv)->info.has_rc6p)
 #define HAS_RC6pp(dev_priv)		 (false) /* HW was never validated */
 
+#define HAS_RPS(dev_priv)	(INTEL_INFO(dev_priv)->has_rps)
+
 #define HAS_CSR(dev_priv)	((dev_priv)->info.has_csr)
 
 #define HAS_RUNTIME_PM(dev_priv) ((dev_priv)->info.has_runtime_pm)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 4364922e935d..b0070e914228 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -235,6 +235,7 @@ static const struct intel_device_info intel_ironlake_m_info = {
 	GEN5_FEATURES,
 	PLATFORM(INTEL_IRONLAKE),
 	.is_mobile = 1, .has_fbc = 1,
+	.has_rps = true,
 };
 
 #define GEN6_FEATURES \
@@ -246,6 +247,7 @@ static const struct intel_device_info intel_ironlake_m_info = {
 	.has_llc = 1, \
 	.has_rc6 = 1, \
 	.has_rc6p = 1, \
+	.has_rps = true, \
 	.has_aliasing_ppgtt = 1, \
 	GEN_DEFAULT_PIPEOFFSETS, \
 	GEN_DEFAULT_PAGE_SIZES, \
@@ -290,6 +292,7 @@ static const struct intel_device_info intel_sandybridge_m_gt2_info = {
 	.has_llc = 1, \
 	.has_rc6 = 1, \
 	.has_rc6p = 1, \
+	.has_rps = true, \
 	.has_aliasing_ppgtt = 1, \
 	.has_full_ppgtt = 1, \
 	GEN_DEFAULT_PIPEOFFSETS, \
@@ -343,6 +346,7 @@ static const struct intel_device_info intel_valleyview_info = {
 	.has_psr = 1,
 	.has_runtime_pm = 1,
 	.has_rc6 = 1,
+	.has_rps = true,
 	.has_gmch_display = 1,
 	.has_hotplug = 1,
 	.has_aliasing_ppgtt = 1,
@@ -437,6 +441,7 @@ static const struct intel_device_info intel_cherryview_info = {
 	.has_runtime_pm = 1,
 	.has_resource_streamer = 1,
 	.has_rc6 = 1,
+	.has_rps = true,
 	.has_logical_ring_contexts = 1,
 	.has_gmch_display = 1,
 	.has_aliasing_ppgtt = 1,
@@ -510,6 +515,7 @@ static const struct intel_device_info intel_skylake_gt4_info = {
 	.has_csr = 1, \
 	.has_resource_streamer = 1, \
 	.has_rc6 = 1, \
+	.has_rps = true, \
 	.has_dp_mst = 1, \
 	.has_logical_ring_contexts = 1, \
 	.has_logical_ring_preemption = 1, \
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 3f9881c548ef..2e01bc6eb5a2 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -103,6 +103,7 @@ enum intel_platform {
 	func(has_psr); \
 	func(has_rc6); \
 	func(has_rc6p); \
+	func(has_rps); \
 	func(has_resource_streamer); \
 	func(has_runtime_pm); \
 	func(has_snoop); \
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index 79cb4dbafbea..f096c5cb84ba 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -724,6 +724,9 @@ void gen6_rps_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
+	if (!HAS_RPS(dev_priv))
+		return;
+
 	mutex_lock(&rps->lock);
 	if (rps->enabled) {
 		u8 freq;
@@ -753,6 +756,9 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
+	if (!HAS_RPS(dev_priv))
+		return;
+
 	/*
 	 * Flush our bottom-half so that it does not race with us
 	 * setting the idle frequency and so that it is bounded by
@@ -780,6 +786,9 @@ void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
 	unsigned long flags;
 	bool boost;
 
+	if (!HAS_RPS(rq->i915))
+		return;
+
 	/*
 	 * This is intentionally racy! We peek at the state here, then
 	 * validate inside the RPS worker.
@@ -922,8 +931,10 @@ static bool sanitize_rc6(struct drm_i915_private *i915)
 	struct intel_device_info *info = mkwrite_device_info(i915);
 
 	/* Powersaving is controlled by the host when inside a VM */
-	if (intel_vgpu_active(i915))
+	if (intel_vgpu_active(i915)) {
 		info->has_rc6 = 0;
+		info->has_rps = 0;
+	}
 
 	if (info->has_rc6 &&
 	    IS_GEN9_LP(i915) && !bxt_check_bios_rc6_setup(i915)) {
@@ -2554,7 +2565,7 @@ static void intel_disable_rps(struct drm_i915_private *i915)
 		valleyview_disable_rps(i915);
 	else if (INTEL_GEN(i915) >= 6)
 		gen6_disable_rps(i915);
-	else if (IS_IRONLAKE_M(i915))
+	else if (INTEL_GEN(i915) >= 5)
 		ironlake_disable_drps(i915);
 
 	i915->gt_pm.rps.enabled = false;
@@ -2624,7 +2635,7 @@ static void intel_enable_rps(struct drm_i915_private *i915)
 		gen8_enable_rps(i915);
 	} else if (INTEL_GEN(i915) >= 6) {
 		gen6_enable_rps(i915);
-	} else if (IS_IRONLAKE_M(i915)) {
+	} else if (INTEL_GEN(i915) >= 5) {
 		ironlake_enable_drps(i915);
 		intel_init_emon(i915);
 	}
@@ -2648,7 +2659,8 @@ void intel_enable_gt_powersave(struct drm_i915_private *i915)
 
 	if (HAS_RC6(i915))
 		intel_enable_rc6(i915);
-	intel_enable_rps(i915);
+	if (HAS_RPS(i915))
+		intel_enable_rps(i915);
 	if (HAS_LLC(i915))
 		intel_enable_llc_pstate(i915);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 44/71] drm/i915: Remove defunct intel_suspend_gt_powersave()
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (41 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 43/71] drm/i915: Track HAS_RPS alongside HAS_RC6 in the device info Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 45/71] drm/i915: Reorder GT interface code Chris Wilson
                   ` (8 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Since commit b7137e0cf1e5 ("drm/i915: Defer enabling rc6 til after we
submit the first batch/context"), intel_suspend_gt_powersave() has been
a no-op. As we still do not need to do anything explicitly on suspend
(we do everything required on idling), remove the defunct function.

References: b7137e0cf1e5 ("drm/i915: Defer enabling rc6 til after we submit the first batch/context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c    |  1 -
 drivers/gpu/drm/i915/intel_gt_pm.c | 16 ----------------
 drivers/gpu/drm/i915/intel_gt_pm.h |  1 -
 3 files changed, 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7393034fb806..3d6e749787a8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4939,7 +4939,6 @@ int i915_gem_suspend(struct drm_i915_private *dev_priv)
 	int ret;
 
 	intel_runtime_pm_get(dev_priv);
-	intel_suspend_gt_powersave(dev_priv);
 
 	mutex_lock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index f096c5cb84ba..f8b75814796d 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -2491,22 +2491,6 @@ void intel_cleanup_gt_powersave(struct drm_i915_private *i915)
 		intel_runtime_pm_put(i915);
 }
 
-/**
- * intel_suspend_gt_powersave - suspend PM work and helper threads
- * @i915: i915 device
- *
- * We don't want to disable RC6 or other features here, we just want
- * to make sure any work we've queued has finished and won't bother
- * us while we're suspended.
- */
-void intel_suspend_gt_powersave(struct drm_i915_private *i915)
-{
-	if (INTEL_GEN(i915) < 6)
-		return;
-
-	/* gen6_rps_idle() will be called later to disable interrupts */
-}
-
 void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
 {
 	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index 7a14a2e14b30..afb7a5858dff 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -98,7 +98,6 @@ void intel_cleanup_gt_powersave(struct drm_i915_private *i915);
 void intel_sanitize_gt_powersave(struct drm_i915_private *i915);
 void intel_enable_gt_powersave(struct drm_i915_private *i915);
 void intel_disable_gt_powersave(struct drm_i915_private *i915);
-void intel_suspend_gt_powersave(struct drm_i915_private *i915);
 
 void intel_gt_pm_irq_handler(struct drm_i915_private *i915, u32 pm_iir);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 45/71] drm/i915: Reorder GT interface code
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (42 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 44/71] drm/i915: Remove defunct intel_suspend_gt_powersave() Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 46/71] drm/i915: Split control of rps and rc6 Chris Wilson
                   ` (7 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Try to order the intel_gt_pm code to match the order in which it is used:
	init
	enable
	disable
	cleanup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/intel_gt_pm.c | 170 ++++++++++++++---------------
 drivers/gpu/drm/i915/intel_gt_pm.h |   5 +-
 2 files changed, 88 insertions(+), 87 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index f8b75814796d..b69ddb5be3e4 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -2400,6 +2400,18 @@ static void intel_init_emon(struct drm_i915_private *dev_priv)
 	dev_priv->ips.corr = (lcfuse & LCFUSE_HIV_MASK);
 }
 
+void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
+{
+	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
+	i915->gt_pm.rc6.enabled = true; /* force RC6 disabling */
+	intel_disable_gt_powersave(i915);
+
+	if (INTEL_GEN(i915) >= 11)
+		gen11_reset_rps_interrupts(i915);
+	else
+		gen6_reset_rps_interrupts(i915);
+}
+
 void intel_init_gt_powersave(struct drm_i915_private *i915)
 {
 	struct intel_rps *rps = &i915->gt_pm.rps;
@@ -2482,91 +2494,6 @@ void intel_init_gt_powersave(struct drm_i915_private *i915)
 	mutex_unlock(&rps->lock);
 }
 
-void intel_cleanup_gt_powersave(struct drm_i915_private *i915)
-{
-	if (IS_VALLEYVIEW(i915))
-		valleyview_cleanup_gt_powersave(i915);
-
-	if (!HAS_RC6(i915))
-		intel_runtime_pm_put(i915);
-}
-
-void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
-{
-	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
-	i915->gt_pm.rc6.enabled = true; /* force RC6 disabling */
-	intel_disable_gt_powersave(i915);
-
-	if (INTEL_GEN(i915) >= 11)
-		gen11_reset_rps_interrupts(i915);
-	else
-		gen6_reset_rps_interrupts(i915);
-}
-
-static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (!i915->gt_pm.llc_pstate.enabled)
-		return;
-
-	/* Currently there is no HW configuration to be done to disable. */
-
-	i915->gt_pm.llc_pstate.enabled = false;
-}
-
-static void intel_disable_rc6(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (!i915->gt_pm.rc6.enabled)
-		return;
-
-	if (INTEL_GEN(i915) >= 9)
-		gen9_disable_rc6(i915);
-	else if (IS_CHERRYVIEW(i915))
-		cherryview_disable_rc6(i915);
-	else if (IS_VALLEYVIEW(i915))
-		valleyview_disable_rc6(i915);
-	else if (INTEL_GEN(i915) >= 6)
-		gen6_disable_rc6(i915);
-
-	i915->gt_pm.rc6.enabled = false;
-}
-
-static void intel_disable_rps(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (!i915->gt_pm.rps.enabled)
-		return;
-
-	if (INTEL_GEN(i915) >= 9)
-		gen9_disable_rps(i915);
-	else if (IS_CHERRYVIEW(i915))
-		cherryview_disable_rps(i915);
-	else if (IS_VALLEYVIEW(i915))
-		valleyview_disable_rps(i915);
-	else if (INTEL_GEN(i915) >= 6)
-		gen6_disable_rps(i915);
-	else if (INTEL_GEN(i915) >= 5)
-		ironlake_disable_drps(i915);
-
-	i915->gt_pm.rps.enabled = false;
-}
-
-void intel_disable_gt_powersave(struct drm_i915_private *i915)
-{
-	mutex_lock(&i915->gt_pm.rps.lock);
-
-	intel_disable_rc6(i915);
-	intel_disable_rps(i915);
-	if (HAS_LLC(i915))
-		intel_disable_llc_pstate(i915);
-
-	mutex_unlock(&i915->gt_pm.rps.lock);
-}
-
 static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
@@ -2651,6 +2578,79 @@ void intel_enable_gt_powersave(struct drm_i915_private *i915)
 	mutex_unlock(&i915->gt_pm.rps.lock);
 }
 
+static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.llc_pstate.enabled)
+		return;
+
+	/* Currently there is no HW configuration to be done to disable. */
+
+	i915->gt_pm.llc_pstate.enabled = false;
+}
+
+static void intel_disable_rc6(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.rc6.enabled)
+		return;
+
+	if (INTEL_GEN(i915) >= 9)
+		gen9_disable_rc6(i915);
+	else if (IS_CHERRYVIEW(i915))
+		cherryview_disable_rc6(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_disable_rc6(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_disable_rc6(i915);
+
+	i915->gt_pm.rc6.enabled = false;
+}
+
+static void intel_disable_rps(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
+
+	if (!i915->gt_pm.rps.enabled)
+		return;
+
+	if (INTEL_GEN(i915) >= 9)
+		gen9_disable_rps(i915);
+	else if (IS_CHERRYVIEW(i915))
+		cherryview_disable_rps(i915);
+	else if (IS_VALLEYVIEW(i915))
+		valleyview_disable_rps(i915);
+	else if (INTEL_GEN(i915) >= 6)
+		gen6_disable_rps(i915);
+	else if (INTEL_GEN(i915) >= 5)
+		ironlake_disable_drps(i915);
+
+	i915->gt_pm.rps.enabled = false;
+}
+
+void intel_disable_gt_powersave(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->gt_pm.rps.lock);
+
+	intel_disable_rc6(i915);
+	intel_disable_rps(i915);
+	if (HAS_LLC(i915))
+		intel_disable_llc_pstate(i915);
+
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
+
+void intel_cleanup_gt_powersave(struct drm_i915_private *i915)
+{
+	if (IS_VALLEYVIEW(i915))
+		valleyview_cleanup_gt_powersave(i915);
+
+	if (!HAS_RC6(i915))
+		intel_runtime_pm_put(i915);
+}
+
 static int byt_gpu_freq(const struct drm_i915_private *i915, int val)
 {
 	const struct intel_rps *rps = &i915->gt_pm.rps;
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index afb7a5858dff..bd400c9aed7c 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -93,11 +93,12 @@ struct intel_gt_pm {
 void intel_gpu_ips_init(struct drm_i915_private *i915);
 void intel_gpu_ips_teardown(void);
 
-void intel_init_gt_powersave(struct drm_i915_private *i915);
-void intel_cleanup_gt_powersave(struct drm_i915_private *i915);
 void intel_sanitize_gt_powersave(struct drm_i915_private *i915);
+
+void intel_init_gt_powersave(struct drm_i915_private *i915);
 void intel_enable_gt_powersave(struct drm_i915_private *i915);
 void intel_disable_gt_powersave(struct drm_i915_private *i915);
+void intel_cleanup_gt_powersave(struct drm_i915_private *i915);
 
 void intel_gt_pm_irq_handler(struct drm_i915_private *i915, u32 pm_iir);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 46/71] drm/i915: Split control of rps and rc6
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (43 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 45/71] drm/i915: Reorder GT interface code Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 47/71] drm/i915: Enabling rc6 and rps have different requirements, so separate them Chris Wilson
                   ` (6 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Allow ourselves to toggle rps and rc6 individually. This will be used
later when we want to enable rps/rc6 at different phases during the
device bring-up.

Whilst here, convert the intel_$verb_gt_powersave functions over to
the intel_gt_pm_$verb naming scheme.

v2: Resurrect llc_pstate, we will need to restore state on resume.
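
With the split, callers pick exactly which features to toggle. As a
minimal sketch of the new shape (condensed from the diff below), each
public entry point is a thin, feature-gated wrapper that takes the rps
lock around a __-prefixed helper:

	void intel_gt_pm_enable_rc6(struct drm_i915_private *i915)
	{
		if (!HAS_RC6(i915))
			return;

		mutex_lock(&i915->gt_pm.rps.lock);
		__enable_rc6(i915);	/* asserts rps.lock is held */
		mutex_unlock(&i915->gt_pm.rps.lock);
	}

The unpark path then enables each feature individually:

	intel_gt_pm_enable_rps(i915);
	intel_gt_pm_enable_rc6(i915);
	intel_gt_pm_enable_llc(i915);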

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c      |   6 +-
 drivers/gpu/drm/i915/i915_gem.c      |  23 +++---
 drivers/gpu/drm/i915/intel_display.c |   6 +-
 drivers/gpu/drm/i915/intel_gt_pm.c   | 104 ++++++++++++++++-----------
 drivers/gpu/drm/i915/intel_gt_pm.h   |  17 +++--
 5 files changed, 95 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index fef245f01a32..3ed2a85ccac0 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1067,7 +1067,7 @@ static int i915_driver_init_mmio(struct drm_i915_private *dev_priv)
  */
 static void i915_driver_cleanup_mmio(struct drm_i915_private *dev_priv)
 {
-	intel_sanitize_gt_powersave(dev_priv);
+	intel_gt_pm_sanitize(dev_priv);
 	intel_uncore_fini(dev_priv);
 	i915_mmio_cleanup(dev_priv);
 	pci_dev_put(dev_priv->bridge_dev);
@@ -1176,7 +1176,7 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv)
 	intel_uncore_sanitize(dev_priv);
 
 	/* BIOS often leaves RC6 enabled, but disable it for hw init */
-	intel_sanitize_gt_powersave(dev_priv);
+	intel_gt_pm_sanitize(dev_priv);
 
 	intel_opregion_setup(dev_priv);
 
@@ -1716,7 +1716,7 @@ static int i915_drm_resume(struct drm_device *dev)
 	int ret;
 
 	disable_rpm_wakeref_asserts(dev_priv);
-	intel_sanitize_gt_powersave(dev_priv);
+	intel_gt_pm_sanitize(dev_priv);
 
 	ret = i915_ggtt_enable_hw(dev_priv);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3d6e749787a8..1b47eeed7820 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -221,7 +221,10 @@ void i915_gem_unpark(struct drm_i915_private *i915)
 	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
 		i915->gt.epoch = 1;
 
-	intel_enable_gt_powersave(i915);
+	intel_gt_pm_enable_rps(i915);
+	intel_gt_pm_enable_rc6(i915);
+	intel_gt_pm_enable_llc(i915);
+
 	i915_update_gfx_val(i915);
 	if (INTEL_GEN(i915) >= 6)
 		gen6_rps_busy(i915);
@@ -5358,10 +5361,12 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		goto err_unlock;
 	}
 
+	intel_gt_pm_init(dev_priv);
+
 	ret = i915_gem_contexts_init(dev_priv);
 	if (ret) {
 		GEM_BUG_ON(ret == -EIO);
-		goto err_ggtt;
+		goto err_pm;
 	}
 
 	ret = intel_engines_init(dev_priv);
@@ -5370,11 +5375,9 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 		goto err_context;
 	}
 
-	intel_init_gt_powersave(dev_priv);
-
 	ret = intel_uc_init(dev_priv);
 	if (ret)
-		goto err_pm;
+		goto err_engines;
 
 	ret = i915_gem_init_hw(dev_priv);
 	if (ret)
@@ -5422,15 +5425,15 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 	intel_uc_fini_hw(dev_priv);
 err_uc_init:
 	intel_uc_fini(dev_priv);
-err_pm:
-	if (ret != -EIO) {
-		intel_cleanup_gt_powersave(dev_priv);
+err_engines:
+	if (ret != -EIO)
 		i915_gem_cleanup_engines(dev_priv);
-	}
 err_context:
 	if (ret != -EIO)
 		i915_gem_contexts_fini(dev_priv);
-err_ggtt:
+err_pm:
+	if (ret != -EIO)
+		intel_gt_pm_fini(dev_priv);
 err_unlock:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 444d09539f70..464a3c787fbd 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15549,7 +15549,9 @@ void intel_modeset_cleanup(struct drm_device *dev)
 	flush_work(&dev_priv->atomic_helper.free_work);
 	WARN_ON(!llist_empty(&dev_priv->atomic_helper.free_list));
 
-	intel_disable_gt_powersave(dev_priv);
+	intel_gt_pm_disable_llc(dev_priv);
+	intel_gt_pm_disable_rc6(dev_priv);
+	intel_gt_pm_disable_rps(dev_priv);
 
 	/*
 	 * Interrupts and polling as the first thing to avoid creating havoc.
@@ -15578,7 +15580,7 @@ void intel_modeset_cleanup(struct drm_device *dev)
 
 	intel_cleanup_overlay(dev_priv);
 
-	intel_cleanup_gt_powersave(dev_priv);
+	intel_gt_pm_fini(dev_priv);
 
 	intel_teardown_gmbus(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index b69ddb5be3e4..53b7a669bf83 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -2400,11 +2400,15 @@ static void intel_init_emon(struct drm_i915_private *dev_priv)
 	dev_priv->ips.corr = (lcfuse & LCFUSE_HIV_MASK);
 }
 
-void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
+void intel_gt_pm_sanitize(struct drm_i915_private *i915)
 {
-	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
+	intel_gt_pm_disable_llc(i915);
+
 	i915->gt_pm.rc6.enabled = true; /* force RC6 disabling */
-	intel_disable_gt_powersave(i915);
+	intel_gt_pm_disable_rc6(i915);
+
+	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
+	intel_gt_pm_disable_rps(i915);
 
 	if (INTEL_GEN(i915) >= 11)
 		gen11_reset_rps_interrupts(i915);
@@ -2412,7 +2416,7 @@ void intel_sanitize_gt_powersave(struct drm_i915_private *i915)
 		gen6_reset_rps_interrupts(i915);
 }
 
-void intel_init_gt_powersave(struct drm_i915_private *i915)
+void intel_gt_pm_init(struct drm_i915_private *i915)
 {
 	struct intel_rps *rps = &i915->gt_pm.rps;
 
@@ -2494,19 +2498,7 @@ void intel_init_gt_powersave(struct drm_i915_private *i915)
 	mutex_unlock(&rps->lock);
 }
 
-static inline void intel_enable_llc_pstate(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (i915->gt_pm.llc_pstate.enabled)
-		return;
-
-	gen6_update_ring_freq(i915);
-
-	i915->gt_pm.llc_pstate.enabled = true;
-}
-
-static void intel_enable_rc6(struct drm_i915_private *i915)
+static void __enable_rc6(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
@@ -2527,7 +2519,17 @@ static void intel_enable_rc6(struct drm_i915_private *i915)
 	i915->gt_pm.rc6.enabled = true;
 }
 
-static void intel_enable_rps(struct drm_i915_private *i915)
+void intel_gt_pm_enable_rc6(struct drm_i915_private *i915)
+{
+	if (!HAS_RC6(i915))
+		return;
+
+	mutex_lock(&i915->gt_pm.rps.lock);
+	__enable_rc6(i915);
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
+
+static void __enable_rps(struct drm_i915_private *i915)
 {
 	struct intel_rps *rps = &i915->gt_pm.rps;
 
@@ -2560,37 +2562,38 @@ static void intel_enable_rps(struct drm_i915_private *i915)
 	rps->enabled = true;
 }
 
-void intel_enable_gt_powersave(struct drm_i915_private *i915)
+void intel_gt_pm_enable_rps(struct drm_i915_private *i915)
 {
-	/* Powersaving is controlled by the host when inside a VM */
-	if (intel_vgpu_active(i915))
+	if (!HAS_RPS(i915))
 		return;
 
 	mutex_lock(&i915->gt_pm.rps.lock);
-
-	if (HAS_RC6(i915))
-		intel_enable_rc6(i915);
-	if (HAS_RPS(i915))
-		intel_enable_rps(i915);
-	if (HAS_LLC(i915))
-		intel_enable_llc_pstate(i915);
-
+	__enable_rps(i915);
 	mutex_unlock(&i915->gt_pm.rps.lock);
 }
 
-static inline void intel_disable_llc_pstate(struct drm_i915_private *i915)
+static void __enable_llc(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
-	if (!i915->gt_pm.llc_pstate.enabled)
+	if (i915->gt_pm.llc_pstate.enabled)
 		return;
 
-	/* Currently there is no HW configuration to be done to disable. */
+	gen6_update_ring_freq(i915);
+	i915->gt_pm.llc_pstate.enabled = true;
+}
 
-	i915->gt_pm.llc_pstate.enabled = false;
+void intel_gt_pm_enable_llc(struct drm_i915_private *i915)
+{
+	if (!HAS_LLC(i915))
+		return;
+
+	mutex_lock(&i915->gt_pm.rps.lock);
+	__enable_llc(i915);
+	mutex_unlock(&i915->gt_pm.rps.lock);
 }
 
-static void intel_disable_rc6(struct drm_i915_private *i915)
+static void __disable_rc6(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
@@ -2609,7 +2612,14 @@ static void intel_disable_rc6(struct drm_i915_private *i915)
 	i915->gt_pm.rc6.enabled = false;
 }
 
-static void intel_disable_rps(struct drm_i915_private *i915)
+void intel_gt_pm_disable_rc6(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->gt_pm.rps.lock);
+	__disable_rc6(i915);
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
+
+static void __disable_rps(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
@@ -2630,19 +2640,31 @@ static void intel_disable_rps(struct drm_i915_private *i915)
 	i915->gt_pm.rps.enabled = false;
 }
 
-void intel_disable_gt_powersave(struct drm_i915_private *i915)
+void intel_gt_pm_disable_rps(struct drm_i915_private *i915)
 {
 	mutex_lock(&i915->gt_pm.rps.lock);
+	__disable_rps(i915);
+	mutex_unlock(&i915->gt_pm.rps.lock);
+}
 
-	intel_disable_rc6(i915);
-	intel_disable_rps(i915);
-	if (HAS_LLC(i915))
-		intel_disable_llc_pstate(i915);
+static void __disable_llc(struct drm_i915_private *i915)
+{
+	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
+	if (!i915->gt_pm.llc_pstate.enabled)
+		return;
+
+	i915->gt_pm.llc_pstate.enabled = false;
+}
+
+void intel_gt_pm_disable_llc(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->gt_pm.rps.lock);
+	__disable_llc(i915);
 	mutex_unlock(&i915->gt_pm.rps.lock);
 }
 
-void intel_cleanup_gt_powersave(struct drm_i915_private *i915)
+void intel_gt_pm_fini(struct drm_i915_private *i915)
 {
 	if (IS_VALLEYVIEW(i915))
 		valleyview_cleanup_gt_powersave(i915);
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index bd400c9aed7c..2e93bc6238c1 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -93,12 +93,19 @@ struct intel_gt_pm {
 void intel_gpu_ips_init(struct drm_i915_private *i915);
 void intel_gpu_ips_teardown(void);
 
-void intel_sanitize_gt_powersave(struct drm_i915_private *i915);
+void intel_gt_pm_sanitize(struct drm_i915_private *i915);
 
-void intel_init_gt_powersave(struct drm_i915_private *i915);
-void intel_enable_gt_powersave(struct drm_i915_private *i915);
-void intel_disable_gt_powersave(struct drm_i915_private *i915);
-void intel_cleanup_gt_powersave(struct drm_i915_private *i915);
+void intel_gt_pm_init(struct drm_i915_private *i915);
+void intel_gt_pm_fini(struct drm_i915_private *i915);
+
+void intel_gt_pm_enable_rps(struct drm_i915_private *i915);
+void intel_gt_pm_disable_rps(struct drm_i915_private *i915);
+
+void intel_gt_pm_enable_rc6(struct drm_i915_private *i915);
+void intel_gt_pm_disable_rc6(struct drm_i915_private *i915);
+
+void intel_gt_pm_enable_llc(struct drm_i915_private *i915);
+void intel_gt_pm_disable_llc(struct drm_i915_private *i915);
 
 void intel_gt_pm_irq_handler(struct drm_i915_private *i915, u32 pm_iir);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 47/71] drm/i915: Enabling rc6 and rps have different requirements, so separate them
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (44 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 46/71] drm/i915: Split control of rps and rc6 Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 48/71] drm/i915: Simplify rc6/rps enabling Chris Wilson
                   ` (5 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

On Ironlake, we are required not to enable rc6 until the GPU is loaded
with a valid context; after that point it can start to use a powersaving
context for rc6. This seems a reasonable requirement to impose on all
generations, as we are already priming the system by loading a context on
resume. We can then simply delay enabling rc6 until we know the GPU is
awake.

v2: Reorder intel_gt_pm_fini in i915_gem_fini to match setup ordering,
and remove the superfluous intel_gt_pm_sanitize() on mmio cleanup.
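
Condensed from the diff below, the new bring-up helper makes the
ordering constraint explicit (a sketch that elides the locking
context):

	static int load_power_context(struct drm_i915_private *i915)
	{
		int err;

		intel_gt_pm_sanitize(i915);	/* start from a known-off state */
		intel_gt_pm_enable_rps(i915);	/* reclocking is already safe */

		/* Prime the GPU with a valid (kernel) context... */
		err = i915_gem_switch_to_kernel_context(i915);
		if (!err)
			err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED);
		if (err) {
			intel_gt_pm_sanitize(i915);
			return err;
		}

		/* ...and only then let rc6 (and llc) loose. */
		intel_gt_pm_enable_rc6(i915);
		intel_gt_pm_enable_llc(i915);
		return 0;
	}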

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c      |  2 +-
 drivers/gpu/drm/i915/i915_gem.c      | 40 +++++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_display.c |  6 -----
 drivers/gpu/drm/i915/intel_gt_pm.c   |  2 ++
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 3ed2a85ccac0..74b99cf65adb 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -639,6 +639,7 @@ static void i915_gem_fini(struct drm_i915_private *dev_priv)
 	intel_uc_fini(dev_priv);
 	i915_gem_cleanup_engines(dev_priv);
 	i915_gem_contexts_fini(dev_priv);
+	intel_gt_pm_fini(dev_priv);
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
 	intel_uc_fini_misc(dev_priv);
@@ -1067,7 +1068,6 @@ static int i915_driver_init_mmio(struct drm_i915_private *dev_priv)
  */
 static void i915_driver_cleanup_mmio(struct drm_i915_private *dev_priv)
 {
-	intel_gt_pm_sanitize(dev_priv);
 	intel_uncore_fini(dev_priv);
 	i915_mmio_cleanup(dev_priv);
 	pci_dev_put(dev_priv->bridge_dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1b47eeed7820..1af135925073 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -221,10 +221,6 @@ void i915_gem_unpark(struct drm_i915_private *i915)
 	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
 		i915->gt.epoch = 1;
 
-	intel_gt_pm_enable_rps(i915);
-	intel_gt_pm_enable_rc6(i915);
-	intel_gt_pm_enable_llc(i915);
-
 	i915_update_gfx_val(i915);
 	if (INTEL_GEN(i915) >= 6)
 		gen6_rps_busy(i915);
@@ -3341,11 +3337,38 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 		i915_gem_reset_finish_engine(engine);
 	}
 
+	intel_gt_pm_sanitize(i915);
+
 	GEM_TRACE("end\n");
 
 	wake_up_all(&i915->gpu_error.reset_queue);
 }
 
+static int load_power_context(struct drm_i915_private *i915)
+{
+	int err;
+
+	intel_gt_pm_sanitize(i915);
+	intel_gt_pm_enable_rps(i915);
+
+	err = i915_gem_switch_to_kernel_context(i915);
+	if (err)
+		goto err;
+
+	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED);
+	if (err)
+		goto err;
+
+	intel_gt_pm_enable_rc6(i915);
+	intel_gt_pm_enable_llc(i915);
+
+	return 0;
+
+err:
+	intel_gt_pm_sanitize(i915);
+	return err;
+}
+
 bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 {
 	struct i915_timeline *tl;
@@ -5040,7 +5063,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	intel_uc_resume(i915);
 
 	/* Always reload a context for powersaving. */
-	if (i915_gem_switch_to_kernel_context(i915))
+	if (load_power_context(i915))
 		goto err_wedged;
 
 out_unlock:
@@ -5235,11 +5258,8 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 	}
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		goto err_active;
-
-	err = i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED);
+	/* Flush the default context image to memory, and enable powersaving. */
+	err = load_power_context(i915);
 	if (err)
 		goto err_active;
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 464a3c787fbd..e7cb31ab0dd1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -15549,10 +15549,6 @@ void intel_modeset_cleanup(struct drm_device *dev)
 	flush_work(&dev_priv->atomic_helper.free_work);
 	WARN_ON(!llist_empty(&dev_priv->atomic_helper.free_list));
 
-	intel_gt_pm_disable_llc(dev_priv);
-	intel_gt_pm_disable_rc6(dev_priv);
-	intel_gt_pm_disable_rps(dev_priv);
-
 	/*
 	 * Interrupts and polling as the first thing to avoid creating havoc.
 	 * Too much stuff here (turning of connectors, ...) would
@@ -15580,8 +15576,6 @@ void intel_modeset_cleanup(struct drm_device *dev)
 
 	intel_cleanup_overlay(dev_priv);
 
-	intel_gt_pm_fini(dev_priv);
-
 	intel_teardown_gmbus(dev_priv);
 
 	destroy_workqueue(dev_priv->modeset_wq);
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index 53b7a669bf83..42171c4ba20c 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -2666,6 +2666,8 @@ void intel_gt_pm_disable_llc(struct drm_i915_private *i915)
 
 void intel_gt_pm_fini(struct drm_i915_private *i915)
 {
+	intel_gt_pm_sanitize(i915);
+
 	if (IS_VALLEYVIEW(i915))
 		valleyview_cleanup_gt_powersave(i915);
 
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 48/71] drm/i915: Simplify rc6/rps enabling
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (45 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 47/71] drm/i915: Enabling rc6 and rps have different requirements, so separate them Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 49/71] drm/i915: Refactor frequency bounds computation Chris Wilson
                   ` (4 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

Since we know that rc6 and rps are enabled (if available) whenever the
GT is awake, we can remove the individual enabled tracking and move the
enabling into the gen6_rps_busy()/gen6_rps_idle() (now called
intel_gt_pm_busy() and intel_gt_pm_idle()) entry points.
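
The single rps->active flag replaces both rps->enabled and
rps->interrupts_enabled; fast paths may peek at it racily, and the RPS
worker revalidates under rps->lock. A sketch of the pattern (condensed
from the diff below):

	/* e.g. intel_rps_boost(): intentionally racy peek outside the lock */
	if (!READ_ONCE(rps->active))
		return;
	schedule_work(&rps->work);

	/* intel_rps_work(): revalidate before touching the hardware */
	mutex_lock(&rps->lock);
	if (!rps->active)
		goto unlock;
	/* ... reclock ... */
unlock:
	mutex_unlock(&rps->lock);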

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |   6 +-
 drivers/gpu/drm/i915/i915_drv.c      |   3 -
 drivers/gpu/drm/i915/i915_gem.c      |  15 +-
 drivers/gpu/drm/i915/i915_sysfs.c    |   6 +-
 drivers/gpu/drm/i915/intel_display.c |   4 +-
 drivers/gpu/drm/i915/intel_gt_pm.c   | 291 +++++++++------------------
 drivers/gpu/drm/i915/intel_gt_pm.h   |  26 +--
 7 files changed, 121 insertions(+), 230 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3c3ddf8ff2ae..f5cef3876a59 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2191,9 +2191,9 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 	struct drm_file *file;
 
-	seq_printf(m, "RPS enabled? %d\n", rps->enabled);
 	seq_printf(m, "GPU busy? %s [%d requests]\n",
 		   yesno(dev_priv->gt.awake), dev_priv->gt.active_requests);
+	seq_printf(m, "RPS active? %s\n", yesno(rps->active));
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
@@ -2226,9 +2226,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		   atomic_read(&rps->boosts));
 	mutex_unlock(&dev->filelist_mutex);
 
-	if (INTEL_GEN(dev_priv) >= 6 &&
-	    rps->enabled &&
-	    dev_priv->gt.active_requests) {
+	if (INTEL_GEN(dev_priv) >= 6 && dev_priv->gt.awake) {
 		u32 rpup, rpupei;
 		u32 rpdown, rpdownei;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 74b99cf65adb..f21a0669f828 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2578,9 +2578,6 @@ static int intel_runtime_suspend(struct device *kdev)
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	int ret;
 
-	if (WARN_ON_ONCE(!(dev_priv->gt_pm.rc6.enabled && HAS_RC6(dev_priv))))
-		return -ENODEV;
-
 	if (WARN_ON_ONCE(!HAS_RUNTIME_PM(dev_priv)))
 		return -ENODEV;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1af135925073..5ed90f979ff5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -170,10 +170,9 @@ static u32 __i915_gem_park(struct drm_i915_private *i915)
 	i915_pmu_gt_parked(i915);
 	i915_vma_parked(i915);
 
-	i915->gt.awake = false;
+	intel_gt_pm_idle(i915);
 
-	if (INTEL_GEN(i915) >= 6)
-		gen6_rps_idle(i915);
+	i915->gt.awake = false;
 
 	intel_display_power_put(i915, POWER_DOMAIN_GT_IRQ);
 
@@ -221,9 +220,9 @@ void i915_gem_unpark(struct drm_i915_private *i915)
 	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
 		i915->gt.epoch = 1;
 
+	intel_gt_pm_busy(i915);
 	i915_update_gfx_val(i915);
-	if (INTEL_GEN(i915) >= 6)
-		gen6_rps_busy(i915);
+
 	i915_pmu_gt_unparked(i915);
 
 	intel_engines_unpark(i915);
@@ -484,10 +483,8 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 	 * forcing the clocks too high for the whole system, we only allow
 	 * each client to waitboost once in a busy period.
 	 */
-	if (rps_client && !i915_request_started(rq)) {
-		if (INTEL_GEN(rq->i915) >= 6)
-			gen6_rps_boost(rq, rps_client);
-	}
+	if (rps_client && !i915_request_started(rq))
+		intel_rps_boost(rq, rps_client);
 
 	timeout = i915_request_wait(rq, flags, timeout);
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index a72aab28399f..db9d55fe449b 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -377,7 +377,8 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 			  intel_gpu_freq(dev_priv, val));
 
 	rps->max_freq_softlimit = val;
-	schedule_work(&rps->work);
+	if (rps->active)
+		schedule_work(&rps->work);
 
 unlock:
 	mutex_unlock(&rps->lock);
@@ -419,7 +420,8 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 	}
 
 	rps->min_freq_softlimit = val;
-	schedule_work(&rps->work);
+	if (rps->active)
+		schedule_work(&rps->work);
 
 unlock:
 	mutex_unlock(&rps->lock);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index e7cb31ab0dd1..c9b913029fce 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -12717,7 +12717,7 @@ static int do_rps_boost(struct wait_queue_entry *_wait,
 	 * vblank without our intervention, so leave RPS alone.
 	 */
 	if (!i915_request_started(rq))
-		gen6_rps_boost(rq, NULL);
+		intel_rps_boost(rq, NULL);
 	i915_request_put(rq);
 
 	drm_crtc_vblank_put(wait->crtc);
@@ -12735,7 +12735,7 @@ static void add_rps_boost_after_vblank(struct drm_crtc *crtc,
 	if (!dma_fence_is_i915(fence))
 		return;
 
-	if (INTEL_GEN(to_i915(crtc->dev)) < 6)
+	if (!HAS_RPS(to_i915(crtc->dev)))
 		return;
 
 	if (drm_crtc_vblank_get(crtc))
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index 42171c4ba20c..8d53a392afd3 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -327,15 +327,11 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *i915, u8 val)
  */
 static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-
-	if (val != rps->cur_freq) {
+	if (val != dev_priv->gt_pm.rps.cur_freq) {
 		if (INTEL_GEN(dev_priv) >= 9)
-			I915_WRITE(GEN6_RPNSWREQ,
-				   GEN9_FREQUENCY(val));
+			I915_WRITE(GEN6_RPNSWREQ, GEN9_FREQUENCY(val));
 		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
-			I915_WRITE(GEN6_RPNSWREQ,
-				   HSW_FREQUENCY(val));
+			I915_WRITE(GEN6_RPNSWREQ, HSW_FREQUENCY(val));
 		else
 			I915_WRITE(GEN6_RPNSWREQ,
 				   GEN6_FREQUENCY(val) |
@@ -352,9 +348,6 @@ static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, intel_rps_limits(dev_priv, val));
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
-	rps->cur_freq = val;
-	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
-
 	return 0;
 }
 
@@ -377,48 +370,17 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
 	gen6_set_rps_thresholds(dev_priv, val);
 	I915_WRITE(GEN6_PMINTRMSK, gen6_rps_pm_mask(dev_priv, val));
 
-	dev_priv->gt_pm.rps.cur_freq = val;
-	trace_intel_gpu_freq_change(intel_gpu_freq(dev_priv, val));
-
 	return 0;
 }
 
-/*
- * vlv_set_rps_idle: Set the frequency to idle, if Gfx clocks are down
- *
- * If Gfx is Idle, then
- * 1. Forcewake Media well.
- * 2. Request idle freq.
- * 3. Release Forcewake of Media well.
- */
-static void vlv_set_rps_idle(struct drm_i915_private *i915)
+static int __intel_set_rps(struct drm_i915_private *i915, u8 val)
 {
-	struct intel_rps *rps = &i915->gt_pm.rps;
-	u32 val = rps->idle_freq;
-	int err;
-
-	if (rps->cur_freq <= val)
-		return;
-
-	/*
-	 * The punit delays the write of the frequency and voltage until it
-	 * determines the GPU is awake. During normal usage we don't want to
-	 * waste power changing the frequency if the GPU is sleeping (rc6).
-	 * However, the GPU and driver is now idle and we do not want to delay
-	 * switching to minimum voltage (reducing power whilst idle) as we do
-	 * not expect to be woken in the near future and so must flush the
-	 * change by waking the device.
-	 *
-	 * We choose to take the media powerwell (either would do to trick the
-	 * punit into committing the voltage change) as that takes a lot less
-	 * power than the render powerwell.
-	 */
-	intel_uncore_forcewake_get(i915, FORCEWAKE_MEDIA);
-	err = valleyview_set_rps(i915, val);
-	intel_uncore_forcewake_put(i915, FORCEWAKE_MEDIA);
-
-	if (err)
-		DRM_ERROR("Failed to set RPS for idle\n");
+	if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
+		return valleyview_set_rps(i915, val);
+	else if (INTEL_GEN(i915) >= 6)
+		return gen6_set_rps(i915, val);
+	else
+		return 0;
 }
 
 static int intel_set_rps(struct drm_i915_private *i915, u8 val)
@@ -427,20 +389,20 @@ static int intel_set_rps(struct drm_i915_private *i915, u8 val)
 	int err;
 
 	lockdep_assert_held(&rps->lock);
+	GEM_BUG_ON(!rps->active);
 	GEM_BUG_ON(val > rps->max_freq);
 	GEM_BUG_ON(val < rps->min_freq);
 
-	if (!rps->enabled) {
+	err = __intel_set_rps(i915, val);
+	if (err)
+		return err;
+
+	if (val != rps->cur_freq) {
+		trace_intel_gpu_freq_change(intel_gpu_freq(i915, val));
 		rps->cur_freq = val;
-		return 0;
 	}
 
-	if (IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))
-		err = valleyview_set_rps(i915, val);
-	else
-		err = gen6_set_rps(i915, val);
-
-	return err;
+	return 0;
 }
 
 static i915_reg_t gen6_pm_iir(struct drm_i915_private *dev_priv)
@@ -538,18 +500,11 @@ static void enable_rps_interrupts(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	if (READ_ONCE(rps->interrupts_enabled))
-		return;
-
 	if (WARN_ON_ONCE(IS_GEN11(dev_priv)))
 		return;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	WARN_ON_ONCE(rps->pm_iir);
-	WARN_ON_ONCE(I915_READ(gen6_pm_iir(dev_priv)) & rps->pm_events);
-	rps->interrupts_enabled = true;
 	gen6_enable_pm_irq(dev_priv, rps->pm_events);
-
 	spin_unlock_irq(&dev_priv->irq_lock);
 }
 
@@ -557,29 +512,15 @@ static void disable_rps_interrupts(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	if (!READ_ONCE(rps->interrupts_enabled))
-		return;
-
 	if (WARN_ON_ONCE(IS_GEN11(dev_priv)))
 		return;
 
 	spin_lock_irq(&dev_priv->irq_lock);
-	rps->interrupts_enabled = false;
-
 	I915_WRITE(GEN6_PMINTRMSK, gen6_sanitize_rps_pm_mask(dev_priv, ~0u));
-
 	gen6_disable_pm_irq(dev_priv, rps->pm_events);
-
 	spin_unlock_irq(&dev_priv->irq_lock);
-	synchronize_irq(dev_priv->drm.irq);
 
-	/* Now that we will not be generating any more work, flush any
-	 * outstanding tasks. As we are called on the RPS idle path,
-	 * we will reset the GPU to minimum frequencies, so the current
-	 * state of the worker can be discarded.
-	 */
-	cancel_work_sync(&rps->work);
-	gen6_reset_rps_interrupts(dev_priv);
+	synchronize_irq(dev_priv->drm.irq);
 }
 
 static void vlv_c0_read(struct drm_i915_private *dev_priv,
@@ -646,6 +587,9 @@ static void intel_rps_work(struct work_struct *work)
 
 	mutex_lock(&rps->lock);
 
+	if (!rps->active)
+		goto unlock;
+
 	min = rps->min_freq_softlimit;
 	max = rps->max_freq_softlimit;
 	if (client_boost && max < rps->boost_freq)
@@ -694,106 +638,125 @@ static void intel_rps_work(struct work_struct *work)
 		adj = 0;
 	}
 
-	mutex_unlock(&rps->lock);
-
 	if (pm_iir) {
 		spin_lock_irq(&i915->irq_lock);
-		if (rps->interrupts_enabled)
-			gen6_unmask_pm_irq(i915, rps->pm_events);
+		gen6_unmask_pm_irq(i915, rps->pm_events);
 		spin_unlock_irq(&i915->irq_lock);
 		rps->last_adj = adj;
 	}
+
+unlock:
+	mutex_unlock(&rps->lock);
 }
 
 void intel_gt_pm_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	if (pm_iir & rps->pm_events) {
+	if (rps->active && pm_iir & rps->pm_events) {
 		spin_lock(&dev_priv->irq_lock);
 		gen6_mask_pm_irq(dev_priv, pm_iir & rps->pm_events);
-		if (rps->interrupts_enabled) {
-			rps->pm_iir |= pm_iir & rps->pm_events;
-			schedule_work(&rps->work);
-		}
+		rps->pm_iir |= pm_iir & rps->pm_events;
 		spin_unlock(&dev_priv->irq_lock);
+
+		schedule_work(&rps->work);
 	}
 }
 
-void gen6_rps_busy(struct drm_i915_private *dev_priv)
+void intel_gt_pm_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
+	u8 freq;
 
 	if (!HAS_RPS(dev_priv))
 		return;
 
-	mutex_lock(&rps->lock);
-	if (rps->enabled) {
-		u8 freq;
+	GEM_BUG_ON(rps->pm_iir);
+	GEM_BUG_ON(rps->active);
 
-		I915_WRITE(GEN6_PMINTRMSK,
-			   gen6_rps_pm_mask(dev_priv, rps->cur_freq));
+	mutex_lock(&rps->lock);
+	rps->active = true;
 
-		enable_rps_interrupts(dev_priv);
-		memset(&rps->ei, 0, sizeof(rps->ei));
+	/*
+	 * Use the user's desired frequency as a guide, but for better
+	 * performance, jump directly to RPe as our starting frequency.
+	 */
+	freq = max(rps->cur_freq, rps->efficient_freq);
+	if (intel_set_rps(dev_priv,
+			  clamp(freq,
+				rps->min_freq_softlimit,
+				rps->max_freq_softlimit)))
+		DRM_DEBUG_DRIVER("Failed to set busy frequency\n");
 
-		/*
-		 * Use the user's desired frequency as a guide, but for better
-		 * performance, jump directly to RPe as our starting frequency.
-		 */
-		freq = max(rps->cur_freq, rps->efficient_freq);
+	rps->last_adj = 0;
 
-		if (intel_set_rps(dev_priv,
-				  clamp(freq,
-					rps->min_freq_softlimit,
-					rps->max_freq_softlimit)))
-			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
+	if (INTEL_GEN(dev_priv) >= 6) {
+		memset(&rps->ei, 0, sizeof(rps->ei));
+		enable_rps_interrupts(dev_priv);
 	}
+
 	mutex_unlock(&rps->lock);
 }
 
-void gen6_rps_idle(struct drm_i915_private *dev_priv)
+void intel_gt_pm_idle(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 
-	if (!HAS_RPS(dev_priv))
+	if (!rps->active)
 		return;
 
-	/*
-	 * Flush our bottom-half so that it does not race with us
-	 * setting the idle frequency and so that it is bounded by
-	 * our rpm wakeref. And then disable the interrupts to stop any
-	 * futher RPS reclocking whilst we are asleep.
-	 */
+	mutex_lock(&rps->lock);
+
 	disable_rps_interrupts(dev_priv);
 
-	mutex_lock(&rps->lock);
-	if (rps->enabled) {
-		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv))
-			vlv_set_rps_idle(dev_priv);
-		else
-			gen6_set_rps(dev_priv, rps->idle_freq);
-		rps->last_adj = 0;
+	if (rps->cur_freq > rps->idle_freq) {
+		/*
+		 * The punit delays the write of the frequency and voltage
+		 * until it determines the GPU is awake. During normal usage we
+		 * don't want to waste power changing the frequency if the GPU
+		 * is sleeping (rc6).  However, the GPU and driver is now idle
+		 * and we do not want to delay switching to minimum voltage
+		 * (reducing power whilst idle) as we do not expect to be woken
+		 * in the near future and so must flush the change by waking
+		 * the device.
+		 *
+		 * We choose to take the media powerwell (either would do to
+		 * trick the punit into committing the voltage change) as that
+		 * takes a lot less power than the render powerwell.
+		 */
+		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_MEDIA);
+		if (__intel_set_rps(dev_priv, rps->idle_freq))
+			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
+		rps->cur_freq = rps->idle_freq;
+		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_MEDIA);
+	}
+
+	if (INTEL_GEN(dev_priv) >= 6) {
 		I915_WRITE(GEN6_PMINTRMSK,
 			   gen6_sanitize_rps_pm_mask(dev_priv, ~0));
 	}
+
+	rps->last_adj = 0;
+	rps->active = false;
 	mutex_unlock(&rps->lock);
+
+	/*
+	 * Now that we will not be generating any more work, flush any
+	 * outstanding tasks. As we are called on the RPS idle path,
+	 * we will reset the GPU to minimum frequencies, so the current
+	 * state of the worker can be discarded.
+	 */
+	cancel_work_sync(&rps->work);
+	gen6_reset_rps_interrupts(dev_priv);
 }
 
-void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
+void intel_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
 {
 	struct intel_rps *rps = &rq->i915->gt_pm.rps;
 	unsigned long flags;
 	bool boost;
 
-	if (!HAS_RPS(rq->i915))
-		return;
-
-	/*
-	 * This is intentionally racy! We peek at the state here, then
-	 * validate inside the RPS worker.
-	 */
-	if (!rps->enabled)
+	if (!READ_ONCE(rps->active))
 		return;
 
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
@@ -1005,20 +968,6 @@ static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
 	}
 }
 
-static void reset_rps(struct drm_i915_private *i915,
-		      int (*set)(struct drm_i915_private *, u8))
-{
-	struct intel_rps *rps = &i915->gt_pm.rps;
-	u8 freq = rps->cur_freq;
-
-	/* force a reset */
-	rps->power = -1;
-	rps->cur_freq = -1;
-
-	if (set(i915, freq))
-		DRM_ERROR("Failed to reset RPS to initial values\n");
-}
-
 /* See the Gen9_GT_PM_Programming_Guide doc for the below */
 static void gen9_enable_rps(struct drm_i915_private *dev_priv)
 {
@@ -1040,7 +989,6 @@ static void gen9_enable_rps(struct drm_i915_private *dev_priv)
 	 * Up/Down EI & threshold registers, as well as the RP_CONTROL,
 	 * RP_INTERRUPT_LIMITS & RPNSWREQ registers.
 	 */
-	reset_rps(dev_priv, gen6_set_rps);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
@@ -1210,8 +1158,6 @@ static void gen8_enable_rps(struct drm_i915_private *dev_priv)
 		   GEN6_RP_UP_BUSY_AVG |
 		   GEN6_RP_DOWN_IDLE_AVG);
 
-	reset_rps(dev_priv, gen6_set_rps);
-
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -1311,8 +1257,6 @@ static void gen6_enable_rps(struct drm_i915_private *dev_priv)
 	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 50000);
 	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
 
-	reset_rps(dev_priv, gen6_set_rps);
-
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -1829,8 +1773,6 @@ static void cherryview_enable_rps(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
 	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
 
-	reset_rps(dev_priv, valleyview_set_rps);
-
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -1915,8 +1857,6 @@ static void valleyview_enable_rps(struct drm_i915_private *dev_priv)
 	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", yesno(val & GPLLENABLE));
 	DRM_DEBUG_DRIVER("GPU status: 0x%08x\n", val);
 
-	reset_rps(dev_priv, valleyview_set_rps);
-
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
@@ -2403,11 +2343,7 @@ static void intel_init_emon(struct drm_i915_private *dev_priv)
 void intel_gt_pm_sanitize(struct drm_i915_private *i915)
 {
 	intel_gt_pm_disable_llc(i915);
-
-	i915->gt_pm.rc6.enabled = true; /* force RC6 disabling */
 	intel_gt_pm_disable_rc6(i915);
-
-	i915->gt_pm.rps.enabled = true; /* force RPS disabling */
 	intel_gt_pm_disable_rps(i915);
 
 	if (INTEL_GEN(i915) >= 11)
@@ -2502,9 +2438,6 @@ static void __enable_rc6(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
-	if (i915->gt_pm.rc6.enabled)
-		return;
-
 	if (IS_CHERRYVIEW(i915))
 		cherryview_enable_rc6(i915);
 	else if (IS_VALLEYVIEW(i915))
@@ -2515,8 +2448,6 @@ static void __enable_rc6(struct drm_i915_private *i915)
 		gen8_enable_rc6(i915);
 	else if (INTEL_GEN(i915) >= 6)
 		gen6_enable_rc6(i915);
-
-	i915->gt_pm.rc6.enabled = true;
 }
 
 void intel_gt_pm_enable_rc6(struct drm_i915_private *i915)
@@ -2535,9 +2466,6 @@ static void __enable_rps(struct drm_i915_private *i915)
 
 	lockdep_assert_held(&rps->lock);
 
-	if (rps->enabled)
-		return;
-
 	if (IS_CHERRYVIEW(i915)) {
 		cherryview_enable_rps(i915);
 	} else if (IS_VALLEYVIEW(i915)) {
@@ -2559,7 +2487,12 @@ static void __enable_rps(struct drm_i915_private *i915)
 	WARN_ON(rps->efficient_freq < rps->min_freq);
 	WARN_ON(rps->efficient_freq > rps->max_freq);
 
-	rps->enabled = true;
+	/* Force a reset */
+	rps->cur_freq = rps->max_freq;
+	rps->power = -1;
+	__intel_set_rps(i915, rps->idle_freq);
+
+	rps->cur_freq = rps->idle_freq;
 }
 
 void intel_gt_pm_enable_rps(struct drm_i915_private *i915)
@@ -2576,11 +2509,7 @@ static void __enable_llc(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
-	if (i915->gt_pm.llc_pstate.enabled)
-		return;
-
 	gen6_update_ring_freq(i915);
-	i915->gt_pm.llc_pstate.enabled = true;
 }
 
 void intel_gt_pm_enable_llc(struct drm_i915_private *i915)
@@ -2597,9 +2526,6 @@ static void __disable_rc6(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
-	if (!i915->gt_pm.rc6.enabled)
-		return;
-
 	if (INTEL_GEN(i915) >= 9)
 		gen9_disable_rc6(i915);
 	else if (IS_CHERRYVIEW(i915))
@@ -2608,8 +2534,6 @@ static void __disable_rc6(struct drm_i915_private *i915)
 		valleyview_disable_rc6(i915);
 	else if (INTEL_GEN(i915) >= 6)
 		gen6_disable_rc6(i915);
-
-	i915->gt_pm.rc6.enabled = false;
 }
 
 void intel_gt_pm_disable_rc6(struct drm_i915_private *i915)
@@ -2623,9 +2547,6 @@ static void __disable_rps(struct drm_i915_private *i915)
 {
 	lockdep_assert_held(&i915->gt_pm.rps.lock);
 
-	if (!i915->gt_pm.rps.enabled)
-		return;
-
 	if (INTEL_GEN(i915) >= 9)
 		gen9_disable_rps(i915);
 	else if (IS_CHERRYVIEW(i915))
@@ -2636,8 +2557,6 @@ static void __disable_rps(struct drm_i915_private *i915)
 		gen6_disable_rps(i915);
 	else if (INTEL_GEN(i915) >= 5)
 		ironlake_disable_drps(i915);
-
-	i915->gt_pm.rps.enabled = false;
 }
 
 void intel_gt_pm_disable_rps(struct drm_i915_private *i915)
@@ -2647,21 +2566,9 @@ void intel_gt_pm_disable_rps(struct drm_i915_private *i915)
 	mutex_unlock(&i915->gt_pm.rps.lock);
 }
 
-static void __disable_llc(struct drm_i915_private *i915)
-{
-	lockdep_assert_held(&i915->gt_pm.rps.lock);
-
-	if (!i915->gt_pm.llc_pstate.enabled)
-		return;
-
-	i915->gt_pm.llc_pstate.enabled = false;
-}
-
 void intel_gt_pm_disable_llc(struct drm_i915_private *i915)
 {
-	mutex_lock(&i915->gt_pm.rps.lock);
-	__disable_llc(i915);
-	mutex_unlock(&i915->gt_pm.rps.lock);
+	/* Nothing to do here. */
 }
 
 void intel_gt_pm_fini(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index 2e93bc6238c1..66818828930d 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -19,14 +19,10 @@ struct intel_rps_ei {
 
 struct intel_rps {
 	struct mutex lock;
-
-	/*
-	 * work, interrupts_enabled and pm_iir are protected by
-	 * i915->irq_lock
-	 */
 	struct work_struct work;
-	bool interrupts_enabled;
-	u32 pm_iir;
+
+	bool active;
+	u32 pm_iir; /* protected by i915->irq_lock */
 
 	/* PM interrupt bits that should never be masked */
 	u32 pm_intrmsk_mbz;
@@ -63,7 +59,6 @@ struct intel_rps {
 	int last_adj;
 	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
 
-	bool enabled;
 	atomic_t num_waiters;
 	atomic_t boosts;
 
@@ -72,19 +67,13 @@ struct intel_rps {
 };
 
 struct intel_rc6 {
-	bool enabled;
 	u64 prev_hw_residency[4];
 	u64 cur_residency[4];
 };
 
-struct intel_llc_pstate {
-	bool enabled;
-};
-
 struct intel_gt_pm {
-	struct intel_rps rps;
 	struct intel_rc6 rc6;
-	struct intel_llc_pstate llc_pstate;
+	struct intel_rps rps;
 
 	u32 imr;
 	u32 ier;
@@ -107,11 +96,12 @@ void intel_gt_pm_disable_rc6(struct drm_i915_private *i915);
 void intel_gt_pm_enable_llc(struct drm_i915_private *i915);
 void intel_gt_pm_disable_llc(struct drm_i915_private *i915);
 
+void intel_gt_pm_busy(struct drm_i915_private *i915);
+void intel_gt_pm_idle(struct drm_i915_private *i915);
+
 void intel_gt_pm_irq_handler(struct drm_i915_private *i915, u32 pm_iir);
 
-void gen6_rps_busy(struct drm_i915_private *i915);
-void gen6_rps_idle(struct drm_i915_private *i915);
-void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
+void intel_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
 
 int intel_gpu_freq(const struct drm_i915_private *i915, int val);
 int intel_freq_opcode(const struct drm_i915_private *i915, int val);
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 49/71] drm/i915: Refactor frequency bounds computation
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (46 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 48/71] drm/i915: Simplify rc6/rps enabling Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 50/71] drm/i915: Rename rps min/max frequencies Chris Wilson
                   ` (3 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

When choosing the initial frequency in intel_gt_pm_busy() we also need
to calculate the current min/max bounds. As this calculation is going to
become more complex with the intersection of several different limits,
refactor it into a common function. The alternative would be to feed the
initial reclocking through the RPS worker, but the added latency in that
case is undesirable.

v2: Only apply the rps->last_adj update if the frequency was unclamped.
The intention is that we don't continue to accumulate the adjustment
when we hit the bounds.
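
The heart of the helper is a clamp plus the v2 rule above; a condensed
sketch of adjust_rps() from the diff below:

	min = rps->min_freq_softlimit;
	max = rps->max_freq_softlimit;
	if (atomic_read(&rps->num_waiters) && max < rps->boost_freq)
		max = rps->boost_freq;

	val = clamp(freq + adj, min, max);
	err = __intel_set_rps(i915, val);

	if (val != rps->cur_freq) {
		rps->cur_freq = val;
		/* keep accumulating adj only while we were not clamped */
		rps->last_adj = val == freq + adj ? adj : 0;
	}

For example, if freq + adj lands above max, val is clamped to max and
last_adj resets to 0, so the next up-event starts ramping again from
the smallest step rather than compounding into the limit.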

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/intel_gt_pm.c | 57 +++++++++++-------------------
 1 file changed, 21 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index 8d53a392afd3..c2754a9c01de 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -383,15 +383,25 @@ static int __intel_set_rps(struct drm_i915_private *i915, u8 val)
 		return 0;
 }
 
-static int intel_set_rps(struct drm_i915_private *i915, u8 val)
+static int adjust_rps(struct drm_i915_private *i915, int freq, int adj)
 {
 	struct intel_rps *rps = &i915->gt_pm.rps;
+	int min, max, val;
 	int err;
 
 	lockdep_assert_held(&rps->lock);
 	GEM_BUG_ON(!rps->active);
-	GEM_BUG_ON(val > rps->max_freq);
-	GEM_BUG_ON(val < rps->min_freq);
+
+	min = rps->min_freq_softlimit;
+	max = rps->max_freq_softlimit;
+	if (atomic_read(&rps->num_waiters) && max < rps->boost_freq)
+		max = rps->boost_freq;
+
+	GEM_BUG_ON(min < rps->min_freq);
+	GEM_BUG_ON(max > rps->max_freq);
+	GEM_BUG_ON(max < min);
+
+	val = clamp(freq + adj, min, max);
 
 	err = __intel_set_rps(i915, val);
 	if (err)
@@ -400,6 +410,7 @@ static int intel_set_rps(struct drm_i915_private *i915, u8 val)
 	if (val != rps->cur_freq) {
 		trace_intel_gpu_freq_change(intel_gpu_freq(i915, val));
 		rps->cur_freq = val;
+		rps->last_adj = val == freq + adj ? adj : 0;
 	}
 
 	return 0;
@@ -576,8 +587,8 @@ static void intel_rps_work(struct work_struct *work)
 	struct drm_i915_private *i915 =
 		container_of(work, struct drm_i915_private, gt_pm.rps.work);
 	struct intel_rps *rps = &i915->gt_pm.rps;
-	int freq, adj, min, max;
 	bool client_boost;
+	int freq, adj;
 	u32 pm_iir;
 
 	pm_iir = xchg(&rps->pm_iir, 0) & ~rps->pm_events;
@@ -590,15 +601,6 @@ static void intel_rps_work(struct work_struct *work)
 	if (!rps->active)
 		goto unlock;
 
-	min = rps->min_freq_softlimit;
-	max = rps->max_freq_softlimit;
-	if (client_boost && max < rps->boost_freq)
-		max = rps->boost_freq;
-
-	GEM_BUG_ON(min < rps->min_freq);
-	GEM_BUG_ON(max > rps->max_freq);
-	GEM_BUG_ON(max < min);
-
 	adj = rps->last_adj;
 	freq = rps->cur_freq;
 	if (client_boost && freq < rps->boost_freq) {
@@ -609,16 +611,13 @@ static void intel_rps_work(struct work_struct *work)
 			adj *= 2;
 		else /* CHV needs even encode values */
 			adj = IS_CHERRYVIEW(i915) ? 2 : 1;
-
-		if (freq >= max)
-			adj = 0;
 	} else if (client_boost) {
 		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
-		if (freq > max_t(int, rps->efficient_freq, min))
-			freq = max_t(int, rps->efficient_freq, min);
-		else if (freq > min_t(int, rps->efficient_freq, min))
-			freq = min_t(int, rps->efficient_freq, min);
+		if (freq > rps->efficient_freq)
+			freq = rps->efficient_freq;
+		else if (freq > rps->idle_freq)
+			freq = rps->idle_freq;
 
 		adj = 0;
 	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
@@ -626,23 +625,17 @@ static void intel_rps_work(struct work_struct *work)
 			adj *= 2;
 		else /* CHV needs even encode values */
 			adj = IS_CHERRYVIEW(i915) ? -2 : -1;
-
-		if (freq <= min)
-			adj = 0;
 	} else { /* unknown/external event */
 		adj = 0;
 	}
 
-	if (intel_set_rps(i915, clamp_t(int, freq + adj, min, max))) {
+	if (adjust_rps(i915, freq, adj))
 		DRM_DEBUG_DRIVER("Failed to set new GPU frequency\n");
-		adj = 0;
-	}
 
 	if (pm_iir) {
 		spin_lock_irq(&i915->irq_lock);
 		gen6_unmask_pm_irq(i915, rps->pm_events);
 		spin_unlock_irq(&i915->irq_lock);
-		rps->last_adj = adj;
 	}
 
 unlock:
@@ -666,7 +659,6 @@ void intel_gt_pm_irq_handler(struct drm_i915_private *dev_priv, u32 pm_iir)
 void intel_gt_pm_busy(struct drm_i915_private *dev_priv)
 {
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
-	u8 freq;
 
 	if (!HAS_RPS(dev_priv))
 		return;
@@ -681,14 +673,7 @@ void intel_gt_pm_busy(struct drm_i915_private *dev_priv)
 	 * Use the user's desired frequency as a guide, but for better
 	 * performance, jump directly to RPe as our starting frequency.
 	 */
-	freq = max(rps->cur_freq, rps->efficient_freq);
-	if (intel_set_rps(dev_priv,
-			  clamp(freq,
-				rps->min_freq_softlimit,
-				rps->max_freq_softlimit)))
-		DRM_DEBUG_DRIVER("Failed to set busy frequency\n");
-
-	rps->last_adj = 0;
+	adjust_rps(dev_priv, max(rps->cur_freq, rps->efficient_freq), 0);
 
 	if (INTEL_GEN(dev_priv) >= 6) {
 		memset(&rps->ei, 0, sizeof(rps->ei));
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 50/71] drm/i915: Rename rps min/max frequencies
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (47 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 49/71] drm/i915: Refactor frequency bounds computation Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03  6:37 ` [PATCH 51/71] drm/i915: Pull IPS into GT power management Chris Wilson
                   ` (2 subsequent siblings)
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

In preparation for more layers of limits, rename the existing limits to
hw and user.
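
The rename itself is mechanical; the [hw, user] pair just makes room
for further layers. As a purely hypothetical illustration (not part of
this patch) of how such layers might later compose, the effective
bounds would be the intersection of all of them:

	/* hypothetical: intersect the hw ceiling/floor with the user's */
	min = max(rps->min_freq_hw, rps->min_freq_user);
	max = min(rps->max_freq_hw, rps->max_freq_user);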

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  34 ++++---
 drivers/gpu/drm/i915/i915_pmu.c     |   4 +-
 drivers/gpu/drm/i915/i915_sysfs.c   |  23 ++---
 drivers/gpu/drm/i915/intel_gt_pm.c  | 151 ++++++++++++++--------------
 drivers/gpu/drm/i915/intel_gt_pm.h  |  18 ++--
 5 files changed, 118 insertions(+), 112 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f5cef3876a59..4f88d6614686 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1100,13 +1100,13 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 			   intel_gpu_freq(dev_priv, (freq_sts >> 8) & 0xff));
 
 		seq_printf(m, "current GPU freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->cur_freq));
+			   intel_gpu_freq(dev_priv, rps->freq));
 
 		seq_printf(m, "max GPU freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->max_freq));
+			   intel_gpu_freq(dev_priv, rps->max_freq_hw));
 
 		seq_printf(m, "min GPU freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->min_freq));
+			   intel_gpu_freq(dev_priv, rps->min_freq_hw));
 
 		seq_printf(m, "idle GPU freq: %d MHz\n",
 			   intel_gpu_freq(dev_priv, rps->idle_freq));
@@ -1238,19 +1238,19 @@ static int i915_frequency_info(struct seq_file *m, void *unused)
 		seq_printf(m, "Max non-overclocked (RP0) frequency: %dMHz\n",
 			   intel_gpu_freq(dev_priv, max_freq));
 		seq_printf(m, "Max overclocked frequency: %dMHz\n",
-			   intel_gpu_freq(dev_priv, rps->max_freq));
+			   intel_gpu_freq(dev_priv, rps->max_freq_hw));
 
 		seq_printf(m, "Current freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->cur_freq));
+			   intel_gpu_freq(dev_priv, rps->freq));
 		seq_printf(m, "Actual freq: %d MHz\n", cagf);
 		seq_printf(m, "Idle freq: %d MHz\n",
 			   intel_gpu_freq(dev_priv, rps->idle_freq));
 		seq_printf(m, "Min freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->min_freq));
+			   intel_gpu_freq(dev_priv, rps->min_freq_hw));
 		seq_printf(m, "Boost freq: %d MHz\n",
 			   intel_gpu_freq(dev_priv, rps->boost_freq));
 		seq_printf(m, "Max freq: %d MHz\n",
-			   intel_gpu_freq(dev_priv, rps->max_freq));
+			   intel_gpu_freq(dev_priv, rps->max_freq_hw));
 		seq_printf(m,
 			   "efficient (RPe) frequency: %d MHz\n",
 			   intel_gpu_freq(dev_priv, rps->efficient_freq));
@@ -1801,8 +1801,8 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 	if (!HAS_LLC(dev_priv))
 		return -ENODEV;
 
-	min_gpu_freq = rps->min_freq;
-	max_gpu_freq = rps->max_freq;
+	min_gpu_freq = rps->min_freq_hw;
+	max_gpu_freq = rps->max_freq_hw;
 	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) >= 10) {
 		/* Convert GT frequency to 50 HZ units */
 		min_gpu_freq /= GEN9_FREQ_SCALER;
@@ -2197,13 +2197,15 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 	seq_printf(m, "CPU waiting? %d\n", count_irq_waiters(dev_priv));
 	seq_printf(m, "Boosts outstanding? %d\n",
 		   atomic_read(&rps->num_waiters));
-	seq_printf(m, "Frequency requested %d\n",
-		   intel_gpu_freq(dev_priv, rps->cur_freq));
-	seq_printf(m, "  min hard:%d, soft:%d; max soft:%d, hard:%d\n",
-		   intel_gpu_freq(dev_priv, rps->min_freq),
-		   intel_gpu_freq(dev_priv, rps->min_freq_softlimit),
-		   intel_gpu_freq(dev_priv, rps->max_freq_softlimit),
-		   intel_gpu_freq(dev_priv, rps->max_freq));
+	seq_printf(m, "Frequency requested %d [%d, %d]\n",
+		   intel_gpu_freq(dev_priv, rps->freq),
+		   intel_gpu_freq(dev_priv, rps->min),
+		   intel_gpu_freq(dev_priv, rps->max));
+	seq_printf(m, "  min hard:%d, user:%d; max user:%d, hard:%d\n",
+		   intel_gpu_freq(dev_priv, rps->min_freq_hw),
+		   intel_gpu_freq(dev_priv, rps->min_freq_user),
+		   intel_gpu_freq(dev_priv, rps->max_freq_user),
+		   intel_gpu_freq(dev_priv, rps->max_freq_hw));
 	seq_printf(m, "  idle:%d, efficient:%d, boost:%d\n",
 		   intel_gpu_freq(dev_priv, rps->idle_freq),
 		   intel_gpu_freq(dev_priv, rps->efficient_freq),
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index f374af971395..4c2e4617a1aa 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -214,7 +214,7 @@ static void frequency_sample(struct drm_i915_private *dev_priv)
 	    config_enabled_mask(I915_PMU_ACTUAL_FREQUENCY)) {
 		u32 val;
 
-		val = dev_priv->gt_pm.rps.cur_freq;
+		val = dev_priv->gt_pm.rps.freq;
 		if (dev_priv->gt.awake &&
 		    intel_runtime_pm_get_if_in_use(dev_priv)) {
 			val = intel_get_cagf(dev_priv,
@@ -230,7 +230,7 @@ static void frequency_sample(struct drm_i915_private *dev_priv)
 	    config_enabled_mask(I915_PMU_REQUESTED_FREQUENCY)) {
 		update_sample(&dev_priv->pmu.sample[__I915_SAMPLE_FREQ_REQ], 1,
 			      intel_gpu_freq(dev_priv,
-					     dev_priv->gt_pm.rps.cur_freq));
+					     dev_priv->gt_pm.rps.freq));
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index db9d55fe449b..2d4c7f2e0878 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -286,8 +286,7 @@ static ssize_t gt_cur_freq_mhz_show(struct device *kdev,
 	struct drm_i915_private *dev_priv = kdev_minor_to_i915(kdev);
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
-			intel_gpu_freq(dev_priv,
-				       dev_priv->gt_pm.rps.cur_freq));
+			intel_gpu_freq(dev_priv, dev_priv->gt_pm.rps.freq));
 }
 
 static ssize_t gt_boost_freq_mhz_show(struct device *kdev, struct device_attribute *attr, char *buf)
@@ -315,7 +314,7 @@ static ssize_t gt_boost_freq_mhz_store(struct device *kdev,
 
 	/* Validate against (static) hardware limits */
 	val = intel_freq_opcode(dev_priv, val);
-	if (val < rps->min_freq || val > rps->max_freq)
+	if (val < rps->min_freq_hw || val > rps->max_freq_hw)
 		return -EINVAL;
 
 	mutex_lock(&rps->lock);
@@ -346,7 +345,7 @@ static ssize_t gt_max_freq_mhz_show(struct device *kdev, struct device_attribute
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 			intel_gpu_freq(dev_priv,
-				       dev_priv->gt_pm.rps.max_freq_softlimit));
+				       dev_priv->gt_pm.rps.max_freq_user));
 }
 
 static ssize_t gt_max_freq_mhz_store(struct device *kdev,
@@ -365,9 +364,7 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 	val = intel_freq_opcode(dev_priv, val);
 
 	mutex_lock(&rps->lock);
-	if (val < rps->min_freq ||
-	    val > rps->max_freq ||
-	    val < rps->min_freq_softlimit) {
+	if (val < rps->min_freq_user || val > rps->max_freq_hw) {
 		ret = -EINVAL;
 		goto unlock;
 	}
@@ -376,7 +373,7 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 		DRM_DEBUG("User requested overclocking to %d\n",
 			  intel_gpu_freq(dev_priv, val));
 
-	rps->max_freq_softlimit = val;
+	rps->max_freq_user = val;
 	if (rps->active)
 		schedule_work(&rps->work);
 
@@ -393,7 +390,7 @@ static ssize_t gt_min_freq_mhz_show(struct device *kdev, struct device_attribute
 
 	return snprintf(buf, PAGE_SIZE, "%d\n",
 			intel_gpu_freq(dev_priv,
-				       dev_priv->gt_pm.rps.min_freq_softlimit));
+				       dev_priv->gt_pm.rps.min_freq_user));
 }
 
 static ssize_t gt_min_freq_mhz_store(struct device *kdev,
@@ -412,14 +409,12 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 	val = intel_freq_opcode(dev_priv, val);
 
 	mutex_lock(&rps->lock);
-	if (val < rps->min_freq ||
-	    val > rps->max_freq ||
-	    val > rps->max_freq_softlimit) {
+	if (val < rps->min_freq_hw || val > rps->max_freq_user) {
 		ret = -EINVAL;
 		goto unlock;
 	}
 
-	rps->min_freq_softlimit = val;
+	rps->min_freq_user = val;
 	if (rps->active)
 		schedule_work(&rps->work);
 
@@ -455,7 +450,7 @@ static ssize_t gt_rp_mhz_show(struct device *kdev, struct device_attribute *attr
 	else if (attr == &dev_attr_gt_RP1_freq_mhz)
 		val = intel_gpu_freq(dev_priv, rps->rp1_freq);
 	else if (attr == &dev_attr_gt_RPn_freq_mhz)
-		val = intel_gpu_freq(dev_priv, rps->min_freq);
+		val = intel_gpu_freq(dev_priv, rps->min_freq_hw);
 	else
 		BUG();
 
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index c2754a9c01de..f71c39e528cc 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -178,13 +178,13 @@ static u32 intel_rps_limits(struct drm_i915_private *i915, u8 val)
 	 * receive a down interrupt.
 	 */
 	if (INTEL_GEN(i915) >= 9) {
-		limits = (rps->max_freq_softlimit) << 23;
-		if (val <= rps->min_freq_softlimit)
-			limits |= (rps->min_freq_softlimit) << 14;
+		limits = rps->max << 23;
+		if (val <= rps->min)
+			limits |= rps->min << 14;
 	} else {
-		limits = rps->max_freq_softlimit << 24;
-		if (val <= rps->min_freq_softlimit)
-			limits |= rps->min_freq_softlimit << 16;
+		limits = rps->max << 24;
+		if (val <= rps->min)
+			limits |= rps->min << 16;
 	}
 
 	return limits;
@@ -200,30 +200,27 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	new_power = rps->power;
 	switch (rps->power) {
 	case LOW_POWER:
-		if (val > rps->efficient_freq + 1 &&
-		    val > rps->cur_freq)
+		if (val > rps->efficient_freq + 1 && val > rps->freq)
 			new_power = BETWEEN;
 		break;
 
 	case BETWEEN:
-		if (val <= rps->efficient_freq &&
-		    val < rps->cur_freq)
+		if (val <= rps->efficient_freq && val < rps->freq)
 			new_power = LOW_POWER;
-		else if (val >= rps->rp0_freq &&
-			 val > rps->cur_freq)
+		else if (val >= rps->rp0_freq && val > rps->freq)
 			new_power = HIGH_POWER;
 		break;
 
 	case HIGH_POWER:
 		if (val < (rps->rp1_freq + rps->rp0_freq) >> 1 &&
-		    val < rps->cur_freq)
+		    val < rps->freq)
 			new_power = BETWEEN;
 		break;
 	}
 	/* Max/min bins are special */
-	if (val <= rps->min_freq_softlimit)
+	if (val <= rps->min)
 		new_power = LOW_POWER;
-	if (val >= rps->max_freq_softlimit)
+	if (val >= rps->max)
 		new_power = HIGH_POWER;
 	if (new_power == rps->power)
 		return;
@@ -306,12 +303,12 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *i915, u8 val)
 	u32 mask = 0;
 
 	/* We use UP_EI_EXPIRED interupts for both up/down in manual mode */
-	if (val > rps->min_freq_softlimit)
+	if (val > rps->min)
 		mask |= (GEN6_PM_RP_UP_EI_EXPIRED |
 			 GEN6_PM_RP_DOWN_THRESHOLD |
 			 GEN6_PM_RP_DOWN_TIMEOUT);
 
-	if (val < rps->max_freq_softlimit)
+	if (val < rps->max)
 		mask |= (GEN6_PM_RP_UP_EI_EXPIRED |
 			 GEN6_PM_RP_UP_THRESHOLD);
 
@@ -327,7 +324,7 @@ static u32 gen6_rps_pm_mask(struct drm_i915_private *i915, u8 val)
  */
 static int gen6_set_rps(struct drm_i915_private *dev_priv, u8 val)
 {
-	if (val != dev_priv->gt_pm.rps.cur_freq) {
+	if (val != dev_priv->gt_pm.rps.freq) {
 		if (INTEL_GEN(dev_priv) >= 9)
 			I915_WRITE(GEN6_RPNSWREQ, GEN9_FREQUENCY(val));
 		else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv))
@@ -359,7 +356,7 @@ static int valleyview_set_rps(struct drm_i915_private *dev_priv, u8 val)
 		      "Odd GPU freq value\n"))
 		val &= ~1;
 
-	if (val != dev_priv->gt_pm.rps.cur_freq) {
+	if (val != dev_priv->gt_pm.rps.freq) {
 		vlv_punit_get(dev_priv);
 		err = vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ, val);
 		vlv_punit_put(dev_priv);
@@ -392,25 +389,28 @@ static int adjust_rps(struct drm_i915_private *i915, int freq, int adj)
 	lockdep_assert_held(&rps->lock);
 	GEM_BUG_ON(!rps->active);
 
-	min = rps->min_freq_softlimit;
-	max = rps->max_freq_softlimit;
+	min = rps->min_freq_user;
+	max = rps->max_freq_user;
 	if (atomic_read(&rps->num_waiters) && max < rps->boost_freq)
 		max = rps->boost_freq;
 
-	GEM_BUG_ON(min < rps->min_freq);
-	GEM_BUG_ON(max > rps->max_freq);
+	GEM_BUG_ON(min < rps->min_freq_hw);
+	GEM_BUG_ON(max > rps->max_freq_hw);
 	GEM_BUG_ON(max < min);
 
+	rps->min = min;
+	rps->max = max;
+
 	val = clamp(freq + adj, min, max);
 
 	err = __intel_set_rps(i915, val);
 	if (err)
 		return err;
 
-	if (val != rps->cur_freq) {
+	if (val != rps->freq) {
 		trace_intel_gpu_freq_change(intel_gpu_freq(i915, val));
-		rps->cur_freq = val;
 		rps->last_adj = val == freq + adj ? adj : 0;
+		rps->freq = val;
 	}
 
 	return 0;
@@ -602,7 +602,7 @@ static void intel_rps_work(struct work_struct *work)
 		goto unlock;
 
 	adj = rps->last_adj;
-	freq = rps->cur_freq;
+	freq = rps->freq;
 	if (client_boost && freq < rps->boost_freq) {
 		freq = rps->boost_freq;
 		adj = 0;
@@ -673,7 +673,7 @@ void intel_gt_pm_busy(struct drm_i915_private *dev_priv)
 	 * Use the user's desired frequency as a guide, but for better
 	 * performance, jump directly to RPe as our starting frequency.
 	 */
-	adjust_rps(dev_priv, max(rps->cur_freq, rps->efficient_freq), 0);
+	adjust_rps(dev_priv, max(rps->freq, rps->efficient_freq), 0);
 
 	if (INTEL_GEN(dev_priv) >= 6) {
 		memset(&rps->ei, 0, sizeof(rps->ei));
@@ -694,7 +694,7 @@ void intel_gt_pm_idle(struct drm_i915_private *dev_priv)
 
 	disable_rps_interrupts(dev_priv);
 
-	if (rps->cur_freq > rps->idle_freq) {
+	if (rps->freq > rps->idle_freq) {
 		/*
 		 * The punit delays the write of the frequency and voltage
 		 * until it determines the GPU is awake. During normal usage we
@@ -712,7 +712,7 @@ void intel_gt_pm_idle(struct drm_i915_private *dev_priv)
 		intel_uncore_forcewake_get(dev_priv, FORCEWAKE_MEDIA);
 		if (__intel_set_rps(dev_priv, rps->idle_freq))
 			DRM_DEBUG_DRIVER("Failed to set idle frequency\n");
-		rps->cur_freq = rps->idle_freq;
+		rps->freq = rps->idle_freq;
 		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_MEDIA);
 	}
 
@@ -758,7 +758,7 @@ void intel_rps_boost(struct i915_request *rq, struct intel_rps_client *client)
 	if (!boost)
 		return;
 
-	if (READ_ONCE(rps->cur_freq) < rps->boost_freq)
+	if (READ_ONCE(rps->freq) < rps->boost_freq)
 		schedule_work(&rps->work);
 
 	atomic_inc(client ? &client->boosts : &rps->boosts);
@@ -908,22 +908,22 @@ static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
 
 	/* All of these values are in units of 50MHz */
 
-	/* static values from HW: RP0 > RP1 > RPn (min_freq) */
+	/* static values from HW: RP0 > RP1 > RPn (min_freq_hw) */
 	if (IS_GEN9_LP(dev_priv)) {
 		u32 rp_state_cap = I915_READ(BXT_RP_STATE_CAP);
 
 		rps->rp0_freq = (rp_state_cap >> 16) & 0xff;
 		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
-		rps->min_freq = (rp_state_cap >>  0) & 0xff;
+		rps->min_freq_hw = (rp_state_cap >>  0) & 0xff;
 	} else {
 		u32 rp_state_cap = I915_READ(GEN6_RP_STATE_CAP);
 
 		rps->rp0_freq = (rp_state_cap >>  0) & 0xff;
 		rps->rp1_freq = (rp_state_cap >>  8) & 0xff;
-		rps->min_freq = (rp_state_cap >> 16) & 0xff;
+		rps->min_freq_hw = (rp_state_cap >> 16) & 0xff;
 	}
 	/* hw_max = RP0 until we check for overclocking */
-	rps->max_freq = rps->rp0_freq;
+	rps->max_freq_hw = rps->rp0_freq;
 
 	rps->efficient_freq = rps->rp1_freq;
 	if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv) ||
@@ -936,8 +936,8 @@ static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
 			rps->efficient_freq =
 				clamp_t(u8,
 					((ddcc_status >> 8) & 0xff),
-					rps->min_freq,
-					rps->max_freq);
+					rps->min_freq_hw,
+					rps->max_freq_hw);
 	}
 
 	if (IS_GEN9_BC(dev_priv) || IS_CANNONLAKE(dev_priv)) {
@@ -947,8 +947,8 @@ static void gen6_init_rps_frequencies(struct drm_i915_private *dev_priv)
 		 */
 		rps->rp0_freq *= GEN9_FREQ_SCALER;
 		rps->rp1_freq *= GEN9_FREQ_SCALER;
-		rps->min_freq *= GEN9_FREQ_SCALER;
-		rps->max_freq *= GEN9_FREQ_SCALER;
+		rps->min_freq_hw *= GEN9_FREQ_SCALER;
+		rps->max_freq_hw *= GEN9_FREQ_SCALER;
 		rps->efficient_freq *= GEN9_FREQ_SCALER;
 	}
 }
@@ -1124,8 +1124,8 @@ static void gen8_enable_rps(struct drm_i915_private *dev_priv)
 
 	/* Docs recommend 900MHz, and 300 MHz respectively */
 	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
-		   rps->max_freq_softlimit << 24 |
-		   rps->min_freq_softlimit << 16);
+		   rps->max_freq_hw << 24 |
+		   rps->min_freq_hw << 16);
 
 	I915_WRITE(GEN6_RP_UP_THRESHOLD, 7600000 / 128); /* 76ms busyness per EI, 90% */
 	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 31300000 / 128); /* 313ms busyness per EI, 70%*/
@@ -1257,7 +1257,7 @@ static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 
 	lockdep_assert_held(&rps->lock);
 
-	if (rps->max_freq <= rps->min_freq)
+	if (rps->max_freq_hw <= rps->min_freq_hw)
 		return;
 
 	policy = cpufreq_cpu_get(0);
@@ -1279,8 +1279,8 @@ static void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 	/* convert DDR frequency from units of 266.6MHz to bandwidth */
 	min_ring_freq = mult_frac(min_ring_freq, 8, 3);
 
-	min_gpu_freq = rps->min_freq;
-	max_gpu_freq = rps->max_freq;
+	min_gpu_freq = rps->min_freq_hw;
+	max_gpu_freq = rps->max_freq_hw;
 	if (IS_GEN9_BC(dev_priv) || INTEL_GEN(dev_priv) > 10) {
 		/* Convert GT frequency to 50 HZ units */
 		min_gpu_freq /= GEN9_FREQ_SCALER;
@@ -1575,11 +1575,11 @@ static void valleyview_init_gt_powersave(struct drm_i915_private *i915)
 	}
 	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", i915->mem_freq);
 
-	rps->max_freq = valleyview_rps_max_freq(i915);
-	rps->rp0_freq = rps->max_freq;
+	rps->max_freq_hw = valleyview_rps_max_freq(i915);
+	rps->rp0_freq = rps->max_freq_hw;
 	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(i915, rps->max_freq),
-			 rps->max_freq);
+			 intel_gpu_freq(i915, rps->max_freq_hw),
+			 rps->max_freq_hw);
 
 	rps->efficient_freq = valleyview_rps_rpe_freq(i915);
 	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
@@ -1591,10 +1591,10 @@ static void valleyview_init_gt_powersave(struct drm_i915_private *i915)
 			 intel_gpu_freq(i915, rps->rp1_freq),
 			 rps->rp1_freq);
 
-	rps->min_freq = valleyview_rps_min_freq(i915);
+	rps->min_freq_hw = valleyview_rps_min_freq(i915);
 	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(i915, rps->min_freq),
-			 rps->min_freq);
+			 intel_gpu_freq(i915, rps->min_freq_hw),
+			 rps->min_freq_hw);
 
 	vlv_iosf_sb_put(i915,
 			BIT(VLV_IOSF_SB_PUNIT) |
@@ -1628,11 +1628,11 @@ static void cherryview_init_gt_powersave(struct drm_i915_private *i915)
 	}
 	DRM_DEBUG_DRIVER("DDR speed: %d MHz\n", i915->mem_freq);
 
-	rps->max_freq = cherryview_rps_max_freq(i915);
-	rps->rp0_freq = rps->max_freq;
+	rps->max_freq_hw = cherryview_rps_max_freq(i915);
+	rps->rp0_freq = rps->max_freq_hw;
 	DRM_DEBUG_DRIVER("max GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(i915, rps->max_freq),
-			 rps->max_freq);
+			 intel_gpu_freq(i915, rps->max_freq_hw),
+			 rps->max_freq_hw);
 
 	rps->efficient_freq = cherryview_rps_rpe_freq(i915);
 	DRM_DEBUG_DRIVER("RPe GPU freq: %d MHz (%u)\n",
@@ -1644,18 +1644,18 @@ static void cherryview_init_gt_powersave(struct drm_i915_private *i915)
 			 intel_gpu_freq(i915, rps->rp1_freq),
 			 rps->rp1_freq);
 
-	rps->min_freq = cherryview_rps_min_freq(i915);
+	rps->min_freq_hw = cherryview_rps_min_freq(i915);
 	DRM_DEBUG_DRIVER("min GPU freq: %d MHz (%u)\n",
-			 intel_gpu_freq(i915, rps->min_freq),
-			 rps->min_freq);
+			 intel_gpu_freq(i915, rps->min_freq_hw),
+			 rps->min_freq_hw);
 
 	vlv_iosf_sb_put(i915,
 			BIT(VLV_IOSF_SB_PUNIT) |
 			BIT(VLV_IOSF_SB_NC) |
 			BIT(VLV_IOSF_SB_CCK));
 
-	WARN_ONCE((rps->max_freq | rps->efficient_freq | rps->rp1_freq |
-		   rps->min_freq) & 1,
+	WARN_ONCE((rps->max_freq_hw | rps->efficient_freq | rps->rp1_freq |
+		   rps->min_freq_hw) & 1,
 		  "Odd GPU freq values\n");
 }
 
@@ -2035,7 +2035,7 @@ static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
 
 	lockdep_assert_held(&mchdev_lock);
 
-	pxvid = I915_READ(PXVFREQ(dev_priv->gt_pm.rps.cur_freq));
+	pxvid = I915_READ(PXVFREQ(dev_priv->gt_pm.rps.freq));
 	pxvid = (pxvid >> 24) & 0x7f;
 	ext_v = pvid_to_extvid(dev_priv, pxvid);
 
@@ -2388,14 +2388,13 @@ void intel_gt_pm_init(struct drm_i915_private *i915)
 		gen6_init_rps_frequencies(i915);
 
 	/* Derive initial user preferences/limits from the hardware limits */
-	rps->idle_freq = rps->min_freq;
-	rps->cur_freq = rps->idle_freq;
+	rps->idle_freq = rps->min_freq_hw;
 
-	rps->max_freq_softlimit = rps->max_freq;
-	rps->min_freq_softlimit = rps->min_freq;
+	rps->max_freq_user = rps->max_freq_hw;
+	rps->min_freq_user = rps->min_freq_hw;
 
 	if (IS_HASWELL(i915) || IS_BROADWELL(i915))
-		rps->min_freq_softlimit =
+		rps->min_freq_user =
 			max_t(int,
 			      rps->efficient_freq,
 			      intel_freq_opcode(i915, 450));
@@ -2407,14 +2406,18 @@ void intel_gt_pm_init(struct drm_i915_private *i915)
 		sandybridge_pcode_read(i915, GEN6_READ_OC_PARAMS, &params);
 		if (params & BIT(31)) { /* OC supported */
 			DRM_DEBUG_DRIVER("Overclocking supported, max: %dMHz, overclock: %dMHz\n",
-					 (rps->max_freq & 0xff) * 50,
+					 (rps->max_freq_hw & 0xff) * 50,
 					 (params & 0xff) * 50);
-			rps->max_freq = params & 0xff;
+			rps->max_freq_hw = params & 0xff;
 		}
 	}
 
 	/* Finally allow us to boost to max by default */
-	rps->boost_freq = rps->max_freq;
+	rps->boost_freq = rps->max_freq_hw;
+
+	rps->freq = rps->idle_freq;
+	rps->min = rps->min_freq_hw;
+	rps->max = rps->max_freq_hw;
 
 	mutex_unlock(&rps->lock);
 }
@@ -2466,18 +2469,18 @@ static void __enable_rps(struct drm_i915_private *i915)
 		intel_init_emon(i915);
 	}
 
-	WARN_ON(rps->max_freq < rps->min_freq);
-	WARN_ON(rps->idle_freq > rps->max_freq);
+	WARN_ON(rps->max_freq_hw < rps->min_freq_hw);
+	WARN_ON(rps->idle_freq > rps->max_freq_hw);
 
-	WARN_ON(rps->efficient_freq < rps->min_freq);
-	WARN_ON(rps->efficient_freq > rps->max_freq);
+	WARN_ON(rps->efficient_freq < rps->min_freq_hw);
+	WARN_ON(rps->efficient_freq > rps->max_freq_hw);
 
 	/* Force a reset */
-	rps->cur_freq = rps->max_freq;
+	rps->freq = rps->max_freq_hw;
 	rps->power = -1;
 	__intel_set_rps(i915, rps->idle_freq);
 
-	rps->cur_freq = rps->idle_freq;
+	rps->freq = rps->idle_freq;
 }
 
 void intel_gt_pm_enable_rps(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index 66818828930d..db67d81ae51a 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -41,16 +41,22 @@ struct intel_rps {
 	 * default, and is considered to be above the hard limit if it's
 	 * possible at all.
 	 */
-	u8 cur_freq;		/* Current frequency (cached, may not == HW) */
-	u8 min_freq_softlimit;	/* Minimum frequency permitted by the driver */
-	u8 max_freq_softlimit;	/* Max frequency permitted by the driver */
-	u8 max_freq;		/* Maximum frequency, RP0 if not overclocking */
-	u8 min_freq;		/* AKA RPn. Minimum frequency */
-	u8 boost_freq;		/* Frequency to request when wait boosting */
+	u8 freq;		/* Current frequency (cached, may not == HW) */
+	u8 min;
+	u8 max;
+
+	u8 min_freq_hw;		/* AKA RPn. Minimum frequency */
+	u8 max_freq_hw;		/* Maximum frequency, RP0 if not overclocking */
+	u8 min_freq_user;	/* Minimum frequency permitted by the driver */
+	u8 max_freq_user;	/* Max frequency permitted by the driver */
+
 	u8 idle_freq;		/* Frequency to request when we are idle */
 	u8 efficient_freq;	/* AKA RPe. Pre-determined balanced frequency */
+	u8 boost_freq;		/* Frequency to request when wait boosting */
+
 	u8 rp1_freq;		/* "less than" RP0 power/freqency */
 	u8 rp0_freq;		/* Non-overclocked max frequency. */
+
 	u16 gpll_ref_freq;	/* vlv/chv GPLL reference frequency */
 
 	u8 up_threshold; /* Current %busy required to uplock */
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 51/71] drm/i915: Pull IPS into GT power management
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (48 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 50/71] drm/i915: Rename rps min/max frequencies Chris Wilson
@ 2018-05-03  6:37 ` Chris Wilson
  2018-05-03 10:13 ` [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Lionel Landwerlin
  2018-05-03 16:37 ` Tvrtko Ursulin
  51 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  6:37 UTC (permalink / raw)
  To: intel-gfx

IPS was the precursor to RPS on Ironlake. It serves the same function,
and so should be pulled under the intel_gt_pm umbrella.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h    | 27 ----------
 drivers/gpu/drm/i915/i915_irq.c    | 21 ++++----
 drivers/gpu/drm/i915/intel_gt_pm.c | 83 +++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_gt_pm.h | 27 ++++++++++
 drivers/gpu/drm/i915/intel_pm.c    |  8 +--
 5 files changed, 88 insertions(+), 78 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 21f56684028f..544ef802bc56 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -730,28 +730,6 @@ struct vlv_s0ix_state {
 	u32 clock_gate_dis2;
 };
 
-/* defined intel_pm.c */
-extern spinlock_t mchdev_lock;
-
-struct intel_ilk_power_mgmt {
-	u8 cur_delay;
-	u8 min_delay;
-	u8 max_delay;
-	u8 fmax;
-	u8 fstart;
-
-	u64 last_count1;
-	unsigned long last_time1;
-	unsigned long chipset_power;
-	u64 last_count2;
-	u64 last_time2;
-	unsigned long gfx_power;
-	u8 corr;
-
-	int c_m;
-	int r_t;
-};
-
 struct drm_i915_private;
 struct i915_power_well;
 
@@ -1715,13 +1693,8 @@ struct drm_i915_private {
 	/* Cannot be determined by PCIID. You must always read a register. */
 	u32 edram_cap;
 
-	/* gen6+ GT PM state */
 	struct intel_gt_pm gt_pm;
 
-	/* ilk-only ips/rps state. Everything in here is protected by the global
-	 * mchdev_lock in intel_pm.c */
-	struct intel_ilk_power_mgmt ips;
-
 	struct i915_power_domains power_domains;
 
 	struct i915_psr psr;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 155de01ea756..c3ddba3418b3 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -887,6 +887,7 @@ int intel_get_crtc_scanline(struct intel_crtc *crtc)
 
 static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 {
+	struct intel_ips *ips = &dev_priv->gt_pm.ips;
 	u32 busy_up, busy_down, max_avg, min_avg;
 	u8 new_delay;
 
@@ -894,7 +895,7 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 	I915_WRITE16(MEMINTRSTS, I915_READ(MEMINTRSTS));
 
-	new_delay = dev_priv->ips.cur_delay;
+	new_delay = ips->cur_delay;
 
 	I915_WRITE16(MEMINTRSTS, MEMINT_EVAL_CHG);
 	busy_up = I915_READ(RCPREVBSYTUPAVG);
@@ -904,19 +905,19 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
 
 	/* Handle RCS change request from hw */
 	if (busy_up > max_avg) {
-		if (dev_priv->ips.cur_delay != dev_priv->ips.max_delay)
-			new_delay = dev_priv->ips.cur_delay - 1;
-		if (new_delay < dev_priv->ips.max_delay)
-			new_delay = dev_priv->ips.max_delay;
+		if (ips->cur_delay != ips->max_delay)
+			new_delay = ips->cur_delay - 1;
+		if (new_delay < ips->max_delay)
+			new_delay = ips->max_delay;
 	} else if (busy_down < min_avg) {
-		if (dev_priv->ips.cur_delay != dev_priv->ips.min_delay)
-			new_delay = dev_priv->ips.cur_delay + 1;
-		if (new_delay > dev_priv->ips.min_delay)
-			new_delay = dev_priv->ips.min_delay;
+		if (ips->cur_delay != ips->min_delay)
+			new_delay = ips->cur_delay + 1;
+		if (new_delay > ips->min_delay)
+			new_delay = ips->min_delay;
 	}
 
 	if (ironlake_set_drps(dev_priv, new_delay))
-		dev_priv->ips.cur_delay = new_delay;
+		ips->cur_delay = new_delay;
 
 	spin_unlock(&mchdev_lock);
 
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.c b/drivers/gpu/drm/i915/intel_gt_pm.c
index f71c39e528cc..f15342889ed4 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/intel_gt_pm.c
@@ -66,6 +66,7 @@ bool ironlake_set_drps(struct drm_i915_private *dev_priv, u8 val)
 
 static void ironlake_enable_drps(struct drm_i915_private *dev_priv)
 {
+	struct intel_ips *ips = &dev_priv->gt_pm.ips;
 	u32 rgvmodectl;
 	u8 fmax, fmin, fstart, vstart;
 
@@ -96,12 +97,12 @@ static void ironlake_enable_drps(struct drm_i915_private *dev_priv)
 	vstart = (I915_READ(PXVFREQ(fstart)) & PXVFREQ_PX_MASK) >>
 		PXVFREQ_PX_SHIFT;
 
-	dev_priv->ips.fmax = fmax; /* IPS callback will increase this */
-	dev_priv->ips.fstart = fstart;
+	ips->fmax = fmax; /* IPS callback will increase this */
+	ips->fstart = fstart;
 
-	dev_priv->ips.max_delay = fstart;
-	dev_priv->ips.min_delay = fmin;
-	dev_priv->ips.cur_delay = fstart;
+	ips->max_delay = fstart;
+	ips->min_delay = fmin;
+	ips->cur_delay = fstart;
 
 	DRM_DEBUG_DRIVER("fmax: %d, fmin: %d, fstart: %d\n",
 			 fmax, fmin, fstart);
@@ -124,11 +125,11 @@ static void ironlake_enable_drps(struct drm_i915_private *dev_priv)
 
 	ironlake_set_drps(dev_priv, fstart);
 
-	dev_priv->ips.last_count1 = I915_READ(DMIEC) +
-		I915_READ(DDREC) + I915_READ(CSIEC);
-	dev_priv->ips.last_time1 = jiffies_to_msecs(jiffies);
-	dev_priv->ips.last_count2 = I915_READ(GFXEC);
-	dev_priv->ips.last_time2 = ktime_get_raw_ns();
+	ips->last_count1 =
+		I915_READ(DMIEC) + I915_READ(DDREC) + I915_READ(CSIEC);
+	ips->last_time1 = jiffies_to_msecs(jiffies);
+	ips->last_count2 = I915_READ(GFXEC);
+	ips->last_time2 = ktime_get_raw_ns();
 
 	spin_unlock_irq(&mchdev_lock);
 }
@@ -149,7 +150,7 @@ static void ironlake_disable_drps(struct drm_i915_private *dev_priv)
 	I915_WRITE(DEIMR, I915_READ(DEIMR) | DE_PCU_EVENT);
 
 	/* Go back to the starting frequency */
-	ironlake_set_drps(dev_priv, dev_priv->ips.fstart);
+	ironlake_set_drps(dev_priv, dev_priv->gt_pm.ips.fstart);
 	mdelay(1);
 	rgvswctl |= MEMCTL_CMD_STS;
 	I915_WRITE(MEMSWCTL, rgvswctl);
@@ -1873,6 +1874,7 @@ static const struct cparams {
 
 static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
 {
+	struct intel_ips *ips = &dev_priv->gt_pm.ips;
 	u64 total_count, diff, ret;
 	u32 count1, count2, count3, m = 0, c = 0;
 	unsigned long now = jiffies_to_msecs(jiffies), diff1;
@@ -1880,7 +1882,7 @@ static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
 
 	lockdep_assert_held(&mchdev_lock);
 
-	diff1 = now - dev_priv->ips.last_time1;
+	diff1 = now - ips->last_time1;
 
 	/*
 	 * Prevent division-by-zero if we are asking too fast.
@@ -1889,7 +1891,7 @@ static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
 	 * in such cases.
 	 */
 	if (diff1 <= 10)
-		return dev_priv->ips.chipset_power;
+		return ips->chipset_power;
 
 	count1 = I915_READ(DMIEC);
 	count2 = I915_READ(DDREC);
@@ -1898,16 +1900,15 @@ static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
 	total_count = count1 + count2 + count3;
 
 	/* FIXME: handle per-counter overflow */
-	if (total_count < dev_priv->ips.last_count1) {
-		diff = ~0UL - dev_priv->ips.last_count1;
+	if (total_count < ips->last_count1) {
+		diff = ~0UL - ips->last_count1;
 		diff += total_count;
 	} else {
-		diff = total_count - dev_priv->ips.last_count1;
+		diff = total_count - ips->last_count1;
 	}
 
 	for (i = 0; i < ARRAY_SIZE(cparams); i++) {
-		if (cparams[i].i == dev_priv->ips.c_m &&
-		    cparams[i].t == dev_priv->ips.r_t) {
+		if (cparams[i].i == ips->c_m && cparams[i].t == ips->r_t) {
 			m = cparams[i].m;
 			c = cparams[i].c;
 			break;
@@ -1918,10 +1919,10 @@ static unsigned long __i915_chipset_val(struct drm_i915_private *dev_priv)
 	ret = ((m * diff) + c);
 	ret = div_u64(ret, 10);
 
-	dev_priv->ips.last_count1 = total_count;
-	dev_priv->ips.last_time1 = now;
+	ips->last_count1 = total_count;
+	ips->last_time1 = now;
 
-	dev_priv->ips.chipset_power = ret;
+	ips->chipset_power = ret;
 
 	return ret;
 }
@@ -1983,13 +1984,14 @@ static u32 pvid_to_extvid(struct drm_i915_private *i915, u8 pxvid)
 
 static void __i915_update_gfx_val(struct drm_i915_private *dev_priv)
 {
+	struct intel_ips *ips = &dev_priv->gt_pm.ips;
 	u64 now, diff, diffms;
 	u32 count;
 
 	lockdep_assert_held(&mchdev_lock);
 
 	now = ktime_get_raw_ns();
-	diffms = now - dev_priv->ips.last_time2;
+	diffms = now - ips->last_time2;
 	do_div(diffms, NSEC_PER_MSEC);
 
 	/* Don't divide by 0 */
@@ -1998,20 +2000,20 @@ static void __i915_update_gfx_val(struct drm_i915_private *dev_priv)
 
 	count = I915_READ(GFXEC);
 
-	if (count < dev_priv->ips.last_count2) {
-		diff = ~0UL - dev_priv->ips.last_count2;
+	if (count < ips->last_count2) {
+		diff = ~0UL - ips->last_count2;
 		diff += count;
 	} else {
-		diff = count - dev_priv->ips.last_count2;
+		diff = count - ips->last_count2;
 	}
 
-	dev_priv->ips.last_count2 = count;
-	dev_priv->ips.last_time2 = now;
+	ips->last_count2 = count;
+	ips->last_time2 = now;
 
 	/* More magic constants... */
 	diff = diff * 1181;
 	diff = div_u64(diff, diffms * 10);
-	dev_priv->ips.gfx_power = diff;
+	ips->gfx_power = diff;
 }
 
 void i915_update_gfx_val(struct drm_i915_private *i915)
@@ -2030,6 +2032,7 @@ void i915_update_gfx_val(struct drm_i915_private *i915)
 
 static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
 {
+	struct intel_ips *ips = &dev_priv->gt_pm.ips;
 	unsigned long t, corr, state1, corr2, state2;
 	u32 pxvid, ext_v;
 
@@ -2055,14 +2058,14 @@ static unsigned long __i915_gfx_val(struct drm_i915_private *dev_priv)
 
 	corr = corr * ((150142 * state1) / 10000 - 78642);
 	corr /= 100000;
-	corr2 = (corr * dev_priv->ips.corr);
+	corr2 = (corr * ips->corr);
 
 	state2 = (corr2 * state1) / 10000;
 	state2 /= 100; /* convert to mW */
 
 	__i915_update_gfx_val(dev_priv);
 
-	return dev_priv->ips.gfx_power + state2;
+	return ips->gfx_power + state2;
 }
 
 unsigned long i915_gfx_val(struct drm_i915_private *i915)
@@ -2133,14 +2136,17 @@ EXPORT_SYMBOL_GPL(i915_read_mch_val);
 bool i915_gpu_raise(void)
 {
 	struct drm_i915_private *i915;
+	struct intel_ips *ips;
 
 	i915 = mchdev_get();
 	if (!i915)
 		return false;
 
+	ips = &i915->gt_pm.ips;
+
 	spin_lock_irq(&mchdev_lock);
-	if (i915->ips.max_delay > i915->ips.fmax)
-		i915->ips.max_delay--;
+	if (ips->max_delay > ips->fmax)
+		ips->max_delay--;
 	spin_unlock_irq(&mchdev_lock);
 
 	drm_dev_put(&i915->drm);
@@ -2157,14 +2163,17 @@ EXPORT_SYMBOL_GPL(i915_gpu_raise);
 bool i915_gpu_lower(void)
 {
 	struct drm_i915_private *i915;
+	struct intel_ips *ips;
 
 	i915 = mchdev_get();
 	if (!i915)
 		return false;
 
+	ips = &i915->gt_pm.ips;
+
 	spin_lock_irq(&mchdev_lock);
-	if (i915->ips.max_delay < i915->ips.min_delay)
-		i915->ips.max_delay++;
+	if (ips->max_delay < ips->min_delay)
+		ips->max_delay++;
 	spin_unlock_irq(&mchdev_lock);
 
 	drm_dev_put(&i915->drm);
@@ -2209,8 +2218,8 @@ bool i915_gpu_turbo_disable(void)
 		return false;
 
 	spin_lock_irq(&mchdev_lock);
-	i915->ips.max_delay = i915->ips.fstart;
-	ret = ironlake_set_drps(i915, i915->ips.fstart);
+	i915->gt_pm.ips.max_delay = i915->gt_pm.ips.fstart;
+	ret = ironlake_set_drps(i915, i915->gt_pm.ips.fstart);
 	spin_unlock_irq(&mchdev_lock);
 
 	drm_dev_put(&i915->drm);
@@ -2322,7 +2331,7 @@ static void intel_init_emon(struct drm_i915_private *dev_priv)
 
 	lcfuse = I915_READ(LCFUSE02);
 
-	dev_priv->ips.corr = (lcfuse & LCFUSE_HIV_MASK);
+	dev_priv->gt_pm.ips.corr = (lcfuse & LCFUSE_HIV_MASK);
 }
 
 void intel_gt_pm_sanitize(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/intel_gt_pm.h b/drivers/gpu/drm/i915/intel_gt_pm.h
index db67d81ae51a..23436bb213dd 100644
--- a/drivers/gpu/drm/i915/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/intel_gt_pm.h
@@ -72,6 +72,28 @@ struct intel_rps {
 	struct intel_rps_ei ei;
 };
 
+/* defined intel_gt_pm.c */
+extern spinlock_t mchdev_lock;
+
+struct intel_ips {
+	u8 cur_delay;
+	u8 min_delay;
+	u8 max_delay;
+	u8 fmax;
+	u8 fstart;
+
+	u64 last_count1;
+	unsigned long last_time1;
+	unsigned long chipset_power;
+	u64 last_count2;
+	u64 last_time2;
+	unsigned long gfx_power;
+	u8 corr;
+
+	int c_m;
+	int r_t;
+};
+
 struct intel_rc6 {
 	u64 prev_hw_residency[4];
 	u64 cur_residency[4];
@@ -80,6 +102,11 @@ struct intel_rc6 {
 struct intel_gt_pm {
 	struct intel_rc6 rc6;
 	struct intel_rps rps;
+	/*
+	 * ilk-only ips/rps state. Everything in here is protected by the
+	 * global mchdev_lock in intel_gt_pm.c
+	 */
+	struct intel_ips ips;
 
 	u32 imr;
 	u32 ier;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index b1f33c9c5f57..1cc9e312b1d0 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -186,7 +186,7 @@ static void i915_ironlake_get_mem_freq(struct drm_i915_private *dev_priv)
 		break;
 	}
 
-	dev_priv->ips.r_t = dev_priv->mem_freq;
+	dev_priv->gt_pm.ips.r_t = dev_priv->mem_freq;
 
 	switch (csipll & 0x3ff) {
 	case 0x00c:
@@ -218,11 +218,11 @@ static void i915_ironlake_get_mem_freq(struct drm_i915_private *dev_priv)
 	}
 
 	if (dev_priv->fsb_freq == 3200) {
-		dev_priv->ips.c_m = 0;
+		dev_priv->gt_pm.ips.c_m = 0;
 	} else if (dev_priv->fsb_freq > 3200 && dev_priv->fsb_freq <= 4800) {
-		dev_priv->ips.c_m = 1;
+		dev_priv->gt_pm.ips.c_m = 1;
 	} else {
-		dev_priv->ips.c_m = 2;
+		dev_priv->gt_pm.ips.c_m = 2;
 	}
 }
 
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5)
  2018-05-03  6:37 ` [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5) Chris Wilson
@ 2018-05-03  8:47   ` Chris Wilson
  0 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03  8:47 UTC (permalink / raw)
  To: intel-gfx; +Cc: Kenneth Graunke

Quoting Chris Wilson (2018-05-03 07:37:24)
> Ironlake does support saving and reloading context-specific
> registers between contexts, providing isolation of the basic GPU state
> (as programmable by userspace). This allows userspace to assume that the
> GPU retains their state from one batch to the next, minimising the
> amount of state it needs to reload, or manually save and restore.
> 
> v2: Fix off-by-one in reading CXT_SIZE, and add a comment that the
> CXT_SIZE and context-layout do not match in bspec, but the difference is
> irrelevant as we overallocate the full page anyway (Ville).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Kenneth has landed the mesa component of enabling logical contexts for
gen4/gen5, thanks! So this looks ready to attempt a landing...
-Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (49 preceding siblings ...)
  2018-05-03  6:37 ` [PATCH 51/71] drm/i915: Pull IPS into GT power management Chris Wilson
@ 2018-05-03 10:13 ` Lionel Landwerlin
  2018-05-03 10:18   ` Chris Wilson
  2018-05-03 16:37 ` Tvrtko Ursulin
  51 siblings, 1 reply; 65+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 10:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 03/05/18 07:36, Chris Wilson wrote:
> Limit the arbitration (where preemption may occur) to inside the batch,
> and prevent it from happening on the pipecontrols/flushes we use to
> write the breadcrumb seqno. Once the user batch is complete, we have
> nothing left to do but serialise and emit the breadcrumb; switching
> contexts at this point is futile so don't.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e04798e98db2..70b722c36e65 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>   		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>   	}
>   
> -	cs = intel_ring_begin(rq, 4);
> +	cs = intel_ring_begin(rq, 6);
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> @@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>   		(flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
>   	*cs++ = lower_32_bits(offset);
>   	*cs++ = upper_32_bits(offset);
> +
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> +	*cs++ = MI_NOOP;
>   	intel_ring_advance(rq, cs);
>   
>   	return 0;
> @@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
>   				  intel_hws_seqno_address(request->engine));
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   	request->tail = intel_ring_offset(request, cs);
>   	assert_ring_tail_valid(request->ring, request->tail);
>   
> @@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   	cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
>   				      intel_hws_seqno_address(request->engine));
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   	request->tail = intel_ring_offset(request, cs);
>   	assert_ring_tail_valid(request->ring, request->tail);
>   

Looks good to me. Just one question: you're adding a NOOP in one place
and dropping it in the other two. What's the rationale?


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring
  2018-05-03 10:13 ` [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Lionel Landwerlin
@ 2018-05-03 10:18   ` Chris Wilson
  2018-05-03 10:28     ` Lionel Landwerlin
  0 siblings, 1 reply; 65+ messages in thread
From: Chris Wilson @ 2018-05-03 10:18 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-05-03 11:13:05)
> On 03/05/18 07:36, Chris Wilson wrote:
> > Limit the arbitration (where preemption may occur) to inside the batch,
> > and prevent it from happening on the pipecontrols/flushes we use to
> > write the breadcrumb seqno. Once the user batch is complete, we have
> > nothing left to do but serialise and emit the breadcrumb; switching
> > contexts at this point is futile so don't.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michał Winiarski <michal.winiarski@intel.com>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e04798e98db2..70b722c36e65 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
> >               rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
> >       }
> >   
> > -     cs = intel_ring_begin(rq, 4);
> > +     cs = intel_ring_begin(rq, 6);
> >       if (IS_ERR(cs))
> >               return PTR_ERR(cs);
> >   
> > @@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
> >               (flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
> >       *cs++ = lower_32_bits(offset);
> >       *cs++ = upper_32_bits(offset);
> > +
> > +     *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> > +     *cs++ = MI_NOOP;
> >       intel_ring_advance(rq, cs);
> >   
> >       return 0;
> > @@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
> >       cs = gen8_emit_ggtt_write(cs, request->global_seqno,
> >                                 intel_hws_seqno_address(request->engine));
> >       *cs++ = MI_USER_INTERRUPT;
> > -     *cs++ = MI_NOOP;
> > +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >       request->tail = intel_ring_offset(request, cs);
> >       assert_ring_tail_valid(request->ring, request->tail);
> >   
> > @@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
> >       cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
> >                                     intel_hws_seqno_address(request->engine));
> >       *cs++ = MI_USER_INTERRUPT;
> > -     *cs++ = MI_NOOP;
> > +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >       request->tail = intel_ring_offset(request, cs);
> >       assert_ring_tail_valid(request->ring, request->tail);
> >   
> 
> Looks good to me. Just one question: you're adding a NOOP in one place
> and dropping it in the other two. What's the rationale?

Writes into the ring have to be in multiples of 2 dwords. Strictly
speaking they don't, as only the RING_TAIL has to be qword aligned and
we could fix up the odd dword at the end, but for now the code insists
on every packet being an even number of dwords, hence padding with NOPs.
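
Concretely, the invariant looks something like this (pad_to_qword is a
made-up name for illustration; only MI_NOOP is the real opcode):

	/* Pad an emitter that wrote an odd number of dwords so that
	 * the ring tail stays qword (8 byte) aligned.
	 */
	static u32 *pad_to_qword(u32 *start, u32 *cs)
	{
		if ((cs - start) & 1)
			*cs++ = MI_NOOP;
		return cs;
	}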
-Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring
  2018-05-03 10:18   ` Chris Wilson
@ 2018-05-03 10:28     ` Lionel Landwerlin
  2018-05-03 10:38       ` Chris Wilson
  0 siblings, 1 reply; 65+ messages in thread
From: Lionel Landwerlin @ 2018-05-03 10:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 03/05/18 11:18, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2018-05-03 11:13:05)
>> On 03/05/18 07:36, Chris Wilson wrote:
>>> Limit the arbitration (where preemption may occur) to inside the batch,
>>> and prevent it from happening on the pipecontrols/flushes we use to
>>> write the breadcrumb seqno. Once the user batch is complete, we have
>>> nothing left to do but serialise and emit the breadcrumb; switching
>>> contexts at this point is futile so don't.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Michał Winiarski <michal.winiarski@intel.com>
>>> Cc: Michel Thierry <michel.thierry@intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
>>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index e04798e98db2..70b722c36e65 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>>>                rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>>>        }
>>>    
>>> -     cs = intel_ring_begin(rq, 4);
>>> +     cs = intel_ring_begin(rq, 6);
>>>        if (IS_ERR(cs))
>>>                return PTR_ERR(cs);
>>>    
>>> @@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>>>                (flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
>>>        *cs++ = lower_32_bits(offset);
>>>        *cs++ = upper_32_bits(offset);
>>> +
>>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
>>> +     *cs++ = MI_NOOP;
>>>        intel_ring_advance(rq, cs);
>>>    
>>>        return 0;
>>> @@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
>>>        cs = gen8_emit_ggtt_write(cs, request->global_seqno,
>>>                                  intel_hws_seqno_address(request->engine));
>>>        *cs++ = MI_USER_INTERRUPT;
>>> -     *cs++ = MI_NOOP;
>>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>        request->tail = intel_ring_offset(request, cs);
>>>        assert_ring_tail_valid(request->ring, request->tail);
>>>    
>>> @@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>>>        cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
>>>                                      intel_hws_seqno_address(request->engine));
>>>        *cs++ = MI_USER_INTERRUPT;
>>> -     *cs++ = MI_NOOP;
>>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>>>        request->tail = intel_ring_offset(request, cs);
>>>        assert_ring_tail_valid(request->ring, request->tail);
>>>    
>> Looks good to me. Just one question: you're adding a NOOP in one place
>> and dropping it in the other two. What's the rationale?
> Writes into the ring have to be in multiples of 2 dwords. It doesn't
> strictly, as only the RING_TAIL has to be qword aligned and we could fix
> up the odd dword at the end, but for now the code insists on every
> packet being an even number of dwords, hence padding with NOPs.
> -Chris
>
Thanks,

Would it make sense to add a GEM_BUG_ON() in intel_ring_advance() maybe?

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring
  2018-05-03 10:28     ` Lionel Landwerlin
@ 2018-05-03 10:38       ` Chris Wilson
  0 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03 10:38 UTC (permalink / raw)
  To: Lionel Landwerlin, intel-gfx

Quoting Lionel Landwerlin (2018-05-03 11:28:42)
> On 03/05/18 11:18, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2018-05-03 11:13:05)
> >> On 03/05/18 07:36, Chris Wilson wrote:
> >>> Limit the arbitration (where preemption may occur) to inside the batch,
> >>> and prevent it from happening on the pipecontrols/flushes we use to
> >>> write the breadcrumb seqno. Once the user batch is complete, we have
> >>> nothing left to do but serialise and emit the breadcrumb; switching
> >>> contexts at this point is futile so don't.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Michał Winiarski <michal.winiarski@intel.com>
> >>> Cc: Michel Thierry <michel.thierry@intel.com>
> >>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
> >>>    1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>> index e04798e98db2..70b722c36e65 100644
> >>> --- a/drivers/gpu/drm/i915/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>> @@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
> >>>                rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
> >>>        }
> >>>    
> >>> -     cs = intel_ring_begin(rq, 4);
> >>> +     cs = intel_ring_begin(rq, 6);
> >>>        if (IS_ERR(cs))
> >>>                return PTR_ERR(cs);
> >>>    
> >>> @@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
> >>>                (flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
> >>>        *cs++ = lower_32_bits(offset);
> >>>        *cs++ = upper_32_bits(offset);
> >>> +
> >>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> >>> +     *cs++ = MI_NOOP;
> >>>        intel_ring_advance(rq, cs);
> >>>    
> >>>        return 0;
> >>> @@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
> >>>        cs = gen8_emit_ggtt_write(cs, request->global_seqno,
> >>>                                  intel_hws_seqno_address(request->engine));
> >>>        *cs++ = MI_USER_INTERRUPT;
> >>> -     *cs++ = MI_NOOP;
> >>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >>>        request->tail = intel_ring_offset(request, cs);
> >>>        assert_ring_tail_valid(request->ring, request->tail);
> >>>    
> >>> @@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
> >>>        cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
> >>>                                      intel_hws_seqno_address(request->engine));
> >>>        *cs++ = MI_USER_INTERRUPT;
> >>> -     *cs++ = MI_NOOP;
> >>> +     *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
> >>>        request->tail = intel_ring_offset(request, cs);
> >>>        assert_ring_tail_valid(request->ring, request->tail);
> >>>    
> >> Looks good to me. Just one question, you're adding a NOOP in one place,
> >> dropping it in the other 2. What the rational?
> Writes into the ring have to be in multiples of 2 dwords. Strictly
> speaking they don't, as only the RING_TAIL has to be qword aligned and
> we could fix up the odd dword at the end, but for now the code insists
> on every packet being an even number of dwords, hence padding with NOPs.
> > -Chris
> >
> Thanks,
> 
> Would it make sense to add a GEM_BUG_ON() in intel_ring_advance() maybe?

It's in intel_ring_begin(). intel_ring_advance() just makes sure you
wrote exactly the number of dwords you said you would.
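
(For reference, a sketch of that check, close to the existing assert:)

	static inline void intel_ring_advance(struct i915_request *rq, u32 *cs)
	{
		/* cs must have advanced exactly as far as promised to
		 * intel_ring_begin(), i.e. to ring->emit.
		 */
		GEM_BUG_ON((rq->ring->vaddr + rq->ring->emit) != cs);
	}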

What we can do is preallocate that extra padding in intel_ring_begin()
and do the alignment at intel_ring_get_tail(). It's a bit of a fiddle
for the odd dword saved, but easy enough, and we shouldn't lose any of
our sanity checks.
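
A sketch of that idea (hypothetical helper, assuming the ring's CPU
mapping at ring->vaddr):

	/* Align the sampled tail to a qword with a single MI_NOOP,
	 * instead of making every emitter pad to an even dword count.
	 */
	static u32 intel_ring_get_tail_aligned(struct intel_ring *ring, u32 tail)
	{
		if (tail & 7) {
			*(u32 *)(ring->vaddr + tail) = MI_NOOP;
			tail += sizeof(u32);
		}
		return tail;
	}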
-Chris

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/71] drm/i915: Detect if we missed kicking the execlists tasklet
  2018-05-03  6:36 ` [PATCH 06/71] drm/i915: Detect if we missed kicking the execlists tasklet Chris Wilson
@ 2018-05-03 13:08   ` Chris Wilson
  0 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-03 13:08 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-05-03 07:36:52)
> If inside hangcheck we see that the engine has paused, but there is an
> execlists interrupt still pending, we know that the tasklet did not
> fire. Dump the GEM trace along with the current engine state, and kick
> the tasklet to recover without having to go through a GPU reset.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Oh I thought I sent this earlier, but I appear not to have.
Please review as it can go in all by itself...

>  drivers/gpu/drm/i915/intel_hangcheck.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index 309e38b00e95..2d7f10492e35 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -267,6 +267,29 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>                 }
>         }
>  
> +       if (test_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted)) {
> +               struct intel_engine_execlists *execlists = &engine->execlists;
> +               enum intel_engine_hangcheck_action ret = ENGINE_WAIT;
> +
> +               if (GEM_SHOW_DEBUG()) {
> +                       struct drm_printer p = drm_debug_printer("hangcheck");
> +
> +                       GEM_TRACE_DUMP();
> +                       intel_engine_dump(engine, &p,
> +                                         "%s stuck\n", engine->name);
> +               }
> +
> +               if (tasklet_trylock(&execlists->tasklet)) {
> +                       execlists->tasklet.func(execlists->tasklet.data);
> +                       tasklet_unlock(&execlists->tasklet);
> +
> +                       ret = ENGINE_WAIT_KICK;
> +               }
> +
> +               tasklet_hi_schedule(&execlists->tasklet);
> +               return ret;
> +       }
> +
>         return ENGINE_DEAD;
>  }
>  
> -- 
> 2.17.0
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring
  2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
                   ` (50 preceding siblings ...)
  2018-05-03 10:13 ` [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Lionel Landwerlin
@ 2018-05-03 16:37 ` Tvrtko Ursulin
  51 siblings, 0 replies; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 16:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 07:36, Chris Wilson wrote:
> Limit the arbitration (where preemption may occur) to inside the batch,
> and prevent it from happening on the pipecontrols/flushes we use to
> write the breadcrumb seqno. Once the user batch is complete, we have
> nothing left to do but serialise and emit the breadcrumb; switching
> contexts at this point is futile so don't.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e04798e98db2..70b722c36e65 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1934,7 +1934,7 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>   		rq->ctx->ppgtt->pd_dirty_rings &= ~intel_engine_flag(rq->engine);
>   	}
>   
> -	cs = intel_ring_begin(rq, 4);
> +	cs = intel_ring_begin(rq, 6);
>   	if (IS_ERR(cs))
>   		return PTR_ERR(cs);
>   
> @@ -1963,6 +1963,9 @@ static int gen8_emit_bb_start(struct i915_request *rq,
>   		(flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
>   	*cs++ = lower_32_bits(offset);
>   	*cs++ = upper_32_bits(offset);
> +
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> +	*cs++ = MI_NOOP;
>   	intel_ring_advance(rq, cs);
>   
>   	return 0;
> @@ -2105,7 +2108,7 @@ static void gen8_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   	cs = gen8_emit_ggtt_write(cs, request->global_seqno,
>   				  intel_hws_seqno_address(request->engine));
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   	request->tail = intel_ring_offset(request, cs);
>   	assert_ring_tail_valid(request->ring, request->tail);
>   
> @@ -2121,7 +2124,7 @@ static void gen8_emit_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   	cs = gen8_emit_ggtt_write_rcs(cs, request->global_seqno,
>   				      intel_hws_seqno_address(request->engine));
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
> +	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   	request->tail = intel_ring_offset(request, cs);
>   	assert_ring_tail_valid(request->ring, request->tail);
>   
> 

Sounds to me like a completely sensible thing to do.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/71] drm/i915: Lazily unbind vma on close
  2018-05-03  6:36 ` [PATCH 03/71] drm/i915: Lazily unbind vma on close Chris Wilson
@ 2018-05-03 16:59   ` Tvrtko Ursulin
  0 siblings, 0 replies; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 16:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 07:36, Chris Wilson wrote:
> When userspace is passing around swapbuffers using DRI, we frequently
> have to open and close the same object in the foreign address space.
> This shows itself as the same object being rebound at roughly 30fps
> (with a second object also being rebound at 30fps), which involves us
> having to rewrite the page tables and maintain the drm_mm range manager
> every time.
> 
> However, since the object still exists and it is only the local handle
> that disappears, if we are lazy and do not unbind the VMA immediately
> when the local user closes the object but defer it until the GPU is
> idle, then we can reuse the same VMA binding. We still have to be
> careful to mark the handle and lookup tables as closed to maintain the
> uABI, just allowing the underlying VMA to be resurrected if the user is
> able to access the same object from the same context again.
> 
> If the object itself is destroyed (with no userspace handle left to
> keep it alive), the VMA will be reaped immediately as usual.
> 
> In the future, this will be even more useful as instantiating a new VMA
> for use on the GPU will become heavier. A nuisance indeed, so nip it in
> the bud.
> 
> v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
> v3: Leave a hint as to why we deferred the unbind on close.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h               |  1 +
>   drivers/gpu/drm/i915/i915_gem.c               |  4 +-
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  3 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c           | 14 ++--
>   drivers/gpu/drm/i915/i915_vma.c               | 73 ++++++++++++++-----
>   drivers/gpu/drm/i915/i915_vma.h               |  6 ++
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |  2 +-
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  1 +
>   8 files changed, 79 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 11ff84eef52a..04e27806e581 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2062,6 +2062,7 @@ struct drm_i915_private {
>   		struct list_head timelines;
>   
>   		struct list_head active_rings;
> +		struct list_head closed_vma;
>   		u32 active_requests;
>   		u32 request_serial;
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 484354f25f98..5ece6ae4bdff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -165,6 +165,7 @@ static u32 __i915_gem_park(struct drm_i915_private *i915)
>   	i915_timelines_park(i915);
>   
>   	i915_pmu_gt_parked(i915);
> +	i915_vma_parked(i915);
>   
>   	i915->gt.awake = false;
>   
> @@ -4795,7 +4796,7 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
>   					 &obj->vma_list, obj_link) {
>   			GEM_BUG_ON(i915_vma_is_active(vma));
>   			vma->flags &= ~I915_VMA_PIN_MASK;
> -			i915_vma_close(vma);
> +			i915_vma_destroy(vma);
>   		}
>   		GEM_BUG_ON(!list_empty(&obj->vma_list));
>   		GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
> @@ -5598,6 +5599,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   
>   	INIT_LIST_HEAD(&dev_priv->gt.timelines);
>   	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
> +	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>   
>   	i915_gem_init__mm(dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index c74f5df3fb5a..f627a8c47c58 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -762,7 +762,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
>   		}
>   
>   		/* transfer ref to ctx */
> -		vma->open_count++;
> +		if (!vma->open_count++)
> +			i915_vma_reopen(vma);
>   		list_add(&lut->obj_link, &obj->lut_list);
>   		list_add(&lut->ctx_link, &eb->ctx->handles_list);
>   		lut->ctx = eb->ctx;
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index e9d828324f67..272d6bb407cc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2218,6 +2218,12 @@ i915_ppgtt_create(struct drm_i915_private *dev_priv,
>   }
>   
>   void i915_ppgtt_close(struct i915_address_space *vm)
> +{
> +	GEM_BUG_ON(vm->closed);
> +	vm->closed = true;
> +}
> +
> +static void ppgtt_destroy_vma(struct i915_address_space *vm)
>   {
>   	struct list_head *phases[] = {
>   		&vm->active_list,
> @@ -2226,15 +2232,12 @@ void i915_ppgtt_close(struct i915_address_space *vm)
>   		NULL,
>   	}, **phase;
>   
> -	GEM_BUG_ON(vm->closed);
>   	vm->closed = true;
> -
>   	for (phase = phases; *phase; phase++) {
>   		struct i915_vma *vma, *vn;
>   
>   		list_for_each_entry_safe(vma, vn, *phase, vm_link)
> -			if (!i915_vma_is_closed(vma))
> -				i915_vma_close(vma);
> +			i915_vma_destroy(vma);
>   	}
>   }
>   
> @@ -2245,7 +2248,8 @@ void i915_ppgtt_release(struct kref *kref)
>   
>   	trace_i915_ppgtt_release(&ppgtt->base);
>   
> -	/* vmas should already be unbound and destroyed */
> +	ppgtt_destroy_vma(&ppgtt->base);
> +
>   	GEM_BUG_ON(!list_empty(&ppgtt->base.active_list));
>   	GEM_BUG_ON(!list_empty(&ppgtt->base.inactive_list));
>   	GEM_BUG_ON(!list_empty(&ppgtt->base.unbound_list));
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 4bda3bd29bf5..9324d476e0a7 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -46,8 +46,6 @@ i915_vma_retire(struct i915_gem_active *active, struct i915_request *rq)
>   
>   	GEM_BUG_ON(!drm_mm_node_allocated(&vma->node));
>   	list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
> -	if (unlikely(i915_vma_is_closed(vma) && !i915_vma_is_pinned(vma)))
> -		WARN_ON(i915_vma_unbind(vma));
>   
>   	GEM_BUG_ON(!i915_gem_object_is_active(obj));
>   	if (--obj->active_count)
> @@ -232,7 +230,6 @@ i915_vma_instance(struct drm_i915_gem_object *obj,
>   	if (!vma)
>   		vma = vma_create(obj, vm, view);
>   
> -	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_is_closed(vma));
>   	GEM_BUG_ON(!IS_ERR(vma) && i915_vma_compare(vma, vm, view));
>   	GEM_BUG_ON(!IS_ERR(vma) && vma_lookup(obj, vm, view) != vma);
>   	return vma;
> @@ -684,13 +681,43 @@ int __i915_vma_do_pin(struct i915_vma *vma,
>   	return ret;
>   }
>   
> -static void i915_vma_destroy(struct i915_vma *vma)
> +void i915_vma_close(struct i915_vma *vma)
> +{
> +	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
> +
> +	GEM_BUG_ON(i915_vma_is_closed(vma));
> +	vma->flags |= I915_VMA_CLOSED;
> +
> +	/*
> +	 * We defer actually closing, unbinding and destroying the VMA until
> +	 * the next idle point, or if the object is freed in the meantime. By
> +	 * postponing the unbind, we allow for it to be resurrected by the
> +	 * client, avoiding the work required to rebind the VMA. This is
> +	 * advantageous for DRI, where the client/server pass objects
> +	 * between themselves, temporarily opening a local VMA to the
> +	 * object, and then closing it again. The same object is then reused
> +	 * on the next frame (or two, depending on the depth of the swap queue)
> +	 * causing us to rebind the VMA once more. This ends up being a lot
> +	 * of wasted work for the steady state.
> +	 */
> +	list_add_tail(&vma->closed_link, &vma->vm->i915->gt.closed_vma);
> +}
> +
> +void i915_vma_reopen(struct i915_vma *vma)
> +{
> +	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
> +
> +	if (vma->flags & I915_VMA_CLOSED) {
> +		vma->flags &= ~I915_VMA_CLOSED;
> +		list_del(&vma->closed_link);
> +	}
> +}
> +
> +static void __i915_vma_destroy(struct i915_vma *vma)
>   {
>   	int i;
>   
>   	GEM_BUG_ON(vma->node.allocated);
> -	GEM_BUG_ON(i915_vma_is_active(vma));
> -	GEM_BUG_ON(!i915_vma_is_closed(vma));
>   	GEM_BUG_ON(vma->fence);
>   
>   	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
> @@ -699,6 +726,7 @@ static void i915_vma_destroy(struct i915_vma *vma)
>   
>   	list_del(&vma->obj_link);
>   	list_del(&vma->vm_link);
> +	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
>   
>   	if (!i915_vma_is_ggtt(vma))
>   		i915_ppgtt_put(i915_vm_to_ppgtt(vma->vm));
> @@ -706,15 +734,30 @@ static void i915_vma_destroy(struct i915_vma *vma)
>   	kmem_cache_free(to_i915(vma->obj->base.dev)->vmas, vma);
>   }
>   
> -void i915_vma_close(struct i915_vma *vma)
> +void i915_vma_destroy(struct i915_vma *vma)
>   {
> -	GEM_BUG_ON(i915_vma_is_closed(vma));
> -	vma->flags |= I915_VMA_CLOSED;
> +	lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
>   
> -	rb_erase(&vma->obj_node, &vma->obj->vma_tree);
> +	GEM_BUG_ON(i915_vma_is_active(vma));
> +	GEM_BUG_ON(i915_vma_is_pinned(vma));
> +
> +	if (i915_vma_is_closed(vma))
> +		list_del(&vma->closed_link);
> +
> +	WARN_ON(i915_vma_unbind(vma));
> +	__i915_vma_destroy(vma);
> +}
> +
> +void i915_vma_parked(struct drm_i915_private *i915)
> +{
> +	struct i915_vma *vma, *next;
>   
> -	if (!i915_vma_is_active(vma) && !i915_vma_is_pinned(vma))
> -		WARN_ON(i915_vma_unbind(vma));
> +	list_for_each_entry_safe(vma, next, &i915->gt.closed_vma, closed_link) {
> +		GEM_BUG_ON(!i915_vma_is_closed(vma));
> +		i915_vma_destroy(vma);
> +	}
> +
> +	GEM_BUG_ON(!list_empty(&i915->gt.closed_vma));
>   }
>   
>   static void __i915_vma_iounmap(struct i915_vma *vma)
> @@ -804,7 +847,7 @@ int i915_vma_unbind(struct i915_vma *vma)
>   		return -EBUSY;
>   
>   	if (!drm_mm_node_allocated(&vma->node))
> -		goto destroy;
> +		return 0;
>   
>   	GEM_BUG_ON(obj->bind_count == 0);
>   	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
> @@ -841,10 +884,6 @@ int i915_vma_unbind(struct i915_vma *vma)
>   
>   	i915_vma_remove(vma);
>   
> -destroy:
> -	if (unlikely(i915_vma_is_closed(vma)))
> -		i915_vma_destroy(vma);
> -
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 8c5022095418..fc4294cfaa91 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -119,6 +119,8 @@ struct i915_vma {
>   	/** This vma's place in the eviction list */
>   	struct list_head evict_link;
>   
> +	struct list_head closed_link;
> +
>   	/**
>   	 * Used for performing relocations during execbuffer insertion.
>   	 */
> @@ -285,6 +287,8 @@ void i915_vma_revoke_mmap(struct i915_vma *vma);
>   int __must_check i915_vma_unbind(struct i915_vma *vma);
>   void i915_vma_unlink_ctx(struct i915_vma *vma);
>   void i915_vma_close(struct i915_vma *vma);
> +void i915_vma_reopen(struct i915_vma *vma);
> +void i915_vma_destroy(struct i915_vma *vma);
>   
>   int __i915_vma_do_pin(struct i915_vma *vma,
>   		      u64 size, u64 alignment, u64 flags);
> @@ -408,6 +412,8 @@ i915_vma_unpin_fence(struct i915_vma *vma)
>   		__i915_vma_unpin_fence(vma);
>   }
>   
> +void i915_vma_parked(struct drm_i915_private *i915);
> +
>   #define for_each_until(cond) if (cond) break; else
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index 05bbef363fff..d7c8ef8e6764 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1091,7 +1091,7 @@ static int __igt_write_huge(struct i915_gem_context *ctx,
>   out_vma_unpin:
>   	i915_vma_unpin(vma);
>   out_vma_close:
> -	i915_vma_close(vma);
> +	i915_vma_destroy(vma);
>   
>   	return err;
>   }
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index a662c0450e77..4b6622c6986a 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -226,6 +226,7 @@ struct drm_i915_private *mock_gem_device(void)
>   
>   	INIT_LIST_HEAD(&i915->gt.timelines);
>   	INIT_LIST_HEAD(&i915->gt.active_rings);
> +	INIT_LIST_HEAD(&i915->gt.closed_vma);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	mock_init_ggtt(i915);
> 

Looks fine to me.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
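
The new lifecycle can be sketched with the helpers the patch introduces
(the flow is illustrative, not lifted verbatim from the driver):

	/* close: defer, keep the binding */
	i915_vma_close(vma);		/* sets I915_VMA_CLOSED and parks the
					 * vma on i915->gt.closed_vma; page
					 * tables and drm_mm node stay intact */

	/* the client reopens the object before the GPU idles: cheap */
	if (!vma->open_count++)
		i915_vma_reopen(vma);	/* clears I915_VMA_CLOSED, unlinks
					 * from closed_vma, binding reused */

	/* otherwise, at the next idle point (__i915_gem_park): */
	i915_vma_parked(i915);		/* i915_vma_destroy() every vma still
					 * on closed_vma: unbind and free */
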
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/71] drm/i915: Keep one request in our ring_list
  2018-05-03  6:36 ` [PATCH 04/71] drm/i915: Keep one request in our ring_list Chris Wilson
@ 2018-05-03 17:04   ` Tvrtko Ursulin
  0 siblings, 0 replies; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 17:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 07:36, Chris Wilson wrote:
> Don't pre-emptively retire the oldest request in our ring's list if it
> is the only request. We keep various bits of state alive using the
> active reference from the request and would rather transfer that state
> over to a new request than go through the more involved process of
> retiring and reacquiring it.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5acf869f3ca3..75061f9e48eb 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -694,9 +694,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		goto err_unreserve;
>   
>   	/* Move our oldest request to the slab-cache (if not in use!) */
> -	rq = list_first_entry_or_null(&ring->request_list,
> -				      typeof(*rq), ring_link);
> -	if (rq && i915_request_completed(rq))
> +	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
> +	if (!list_is_last(&rq->ring_link, &ring->request_list) &&
> +	    i915_request_completed(rq))
>   		i915_request_retire(rq);
>   
>   	/*
> 

Sounds believable.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
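
Annotated, the new guard reads (comments added, retire internals
paraphrased):

	/* Keep the last request: retiring it would drop the pins on the
	 * context/ring state that the request we are about to allocate
	 * would immediately have to re-take.
	 */
	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
	if (!list_is_last(&rq->ring_link, &ring->request_list) &&
	    i915_request_completed(rq))
		i915_request_retire(rq); /* safe: a newer request remains */
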

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling
  2018-05-03  6:36 ` [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling Chris Wilson
@ 2018-05-03 17:49   ` Tvrtko Ursulin
  2018-05-03 19:50     ` Chris Wilson
  0 siblings, 1 reply; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-03 17:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 07:36, Chris Wilson wrote:
> As we reschedule the requests, we do not want the submission tasklet
> running until we finish updating the priority chains. (We start
> rewriting priorities from the oldest, but the dequeue looks at the most
> recent in-flight, so there is a small race condition where dequeue may
> decide that preemption is falsely required.) Combine the tasklet kicking
> from adding a new request with the set-wedge protection so that we only
> have to adjust the preempt-counter once to achieve both goals.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
>   drivers/gpu/drm/i915/i915_request.c | 5 +----
>   2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ece6ae4bdff..03cd30001b5d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
>   	rq = to_request(fence);
>   	engine = rq->engine;
>   
> -	rcu_read_lock();
> +	local_bh_disable(); /* RCU serialisation for set-wedged protection */
>   	if (engine->schedule)
>   		engine->schedule(rq, attr);
> -	rcu_read_unlock();
> +	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>   }
>   
>   static void fence_set_priority(struct dma_fence *fence,
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 75061f9e48eb..0756fafa7f81 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>   	 * decide whether to preempt the entire chain so that it is ready to
>   	 * run at the earliest possible convenience.
>   	 */
> -	rcu_read_lock();
> +	local_bh_disable();
>   	if (engine->schedule)
>   		engine->schedule(request, &request->ctx->sched);
> -	rcu_read_unlock();
> -
> -	local_bh_disable();
>   	i915_sw_fence_commit(&request->submit);
>   	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
> 

AFAICS this doesn't disable tasklets running in parallel on other CPUs,
on different engines, so they may still see a non-atomic (wrt
schedule) snapshot of the submission queues. So I am not sure what it
achieves. It prevents a tasklet from interrupting the scheduling of this
request - but, as I said, I am not sure of the benefit.

Regards,

Tvrtko
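
To spell the concern out, a hypothetical interleaving (engine names are
invented for illustration):

	/*
	 *	CPU0				CPU1
	 *	local_bh_disable();
	 *	engine->schedule(rq, attr);	execlists tasklet for another
	 *	  ...rewriting priority		engine runs: dequeue sees a
	 *	  chains, oldest first...	half-updated chain snapshot
	 *	local_bh_enable();
	 *
	 * local_bh_disable() only holds off softirqs on CPU0; a tasklet on
	 * CPU1 is free to run concurrently.
	 */
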
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling
  2018-05-03 17:49   ` Tvrtko Ursulin
@ 2018-05-03 19:50     ` Chris Wilson
  2018-05-04  9:15       ` Tvrtko Ursulin
  0 siblings, 1 reply; 65+ messages in thread
From: Chris Wilson @ 2018-05-03 19:50 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
> 
> On 03/05/2018 07:36, Chris Wilson wrote:
> > As we reschedule the requests, we do not want the submission tasklet
> > running until we finish updating the priority chains. (We start
> > rewriting priorities from the oldest, but the dequeue looks at the most
> > recent in-flight, so there is a small race condition where dequeue may
> > decide that preemption is falsely required.) Combine the tasklet kicking
> > from adding a new request with the set-wedge protection so that we only
> > have to adjust the preempt-counter once to achieve both goals.
> > 
> > Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
> >   drivers/gpu/drm/i915/i915_request.c | 5 +----
> >   2 files changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 5ece6ae4bdff..03cd30001b5d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
> >       rq = to_request(fence);
> >       engine = rq->engine;
> >   
> > -     rcu_read_lock();
> > +     local_bh_disable(); /* RCU serialisation for set-wedged protection */
> >       if (engine->schedule)
> >               engine->schedule(rq, attr);
> > -     rcu_read_unlock();
> > +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
> >   }
> >   
> >   static void fence_set_priority(struct dma_fence *fence,
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 75061f9e48eb..0756fafa7f81 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> >        * decide whether to preempt the entire chain so that it is ready to
> >        * run at the earliest possible convenience.
> >        */
> > -     rcu_read_lock();
> > +     local_bh_disable();
> >       if (engine->schedule)
> >               engine->schedule(request, &request->ctx->sched);
> > -     rcu_read_unlock();
> > -
> > -     local_bh_disable();
> >       i915_sw_fence_commit(&request->submit);
> >       local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> >   
> > 
> 
> AFAICS this doesn't disable tasklets running in parallel on other CPUs,
> on different engines, so they may still see a non-atomic (wrt
> schedule) snapshot of the submission queues. So I am not sure what it
> achieves. It prevents a tasklet from interrupting the scheduling of this
> request - but, as I said, I am not sure of the benefit.

That was my "oh bother" comment as well. We don't realise the benefit of
ensuring that we always process the entire chain before a concurrent
tasklet starts processing the update; but we do coalesce the double
preempt-counter manipulation into one, and only **actually** kick the
tasklet when rescheduling a page flip, rather than being forced to wait
for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
context!
-Chris
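
In code terms, the pattern __i915_request_add() ends up with is roughly
(the patch's own hunk, with comments added):

	local_bh_disable();	/* softirqs - and so our tasklet - held off */
	if (engine->schedule)
		engine->schedule(request, &request->ctx->sched);
	i915_sw_fence_commit(&request->submit); /* may schedule the
						 * execlists tasklet */
	local_bh_enable();	/* softirq count drops to zero: a tasklet
				 * raised above runs right here, instead of
				 * waiting for ksoftirqd */
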
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling
  2018-05-03 19:50     ` Chris Wilson
@ 2018-05-04  9:15       ` Tvrtko Ursulin
  2018-05-04  9:31         ` Chris Wilson
  0 siblings, 1 reply; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-04  9:15 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 20:50, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
>>
>> On 03/05/2018 07:36, Chris Wilson wrote:
>>> As we reschedule the requests, we do not want the submission tasklet
>>> running until we finish updating the priority chains. (We start
>>> rewriting priorities from the oldest, but the dequeue looks at the most
>>> recent in-flight, so there is a small race condition where dequeue may
>>> decide that preemption is falsely required.) Combine the tasklet kicking
>>> from adding a new request with the set-wedge protection so that we only
>>> have to adjust the preempt-counter once to achieve both goals.
>>>
>>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
>>>    drivers/gpu/drm/i915/i915_request.c | 5 +----
>>>    2 files changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 5ece6ae4bdff..03cd30001b5d 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
>>>        rq = to_request(fence);
>>>        engine = rq->engine;
>>>    
>>> -     rcu_read_lock();
>>> +     local_bh_disable(); /* RCU serialisation for set-wedged protection */
>>>        if (engine->schedule)
>>>                engine->schedule(rq, attr);
>>> -     rcu_read_unlock();
>>> +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>>>    }
>>>    
>>>    static void fence_set_priority(struct dma_fence *fence,
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 75061f9e48eb..0756fafa7f81 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>>>         * decide whether to preempt the entire chain so that it is ready to
>>>         * run at the earliest possible convenience.
>>>         */
>>> -     rcu_read_lock();
>>> +     local_bh_disable();
>>>        if (engine->schedule)
>>>                engine->schedule(request, &request->ctx->sched);
>>> -     rcu_read_unlock();
>>> -
>>> -     local_bh_disable();
>>>        i915_sw_fence_commit(&request->submit);
>>>        local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>>>    
>>>
>>
>> AFAICS this doesn't disable tasklets running in parallel on other CPUs,
>> on different engines, so they may still see a non-atomic (wrt
>> schedule) snapshot of the submission queues. So I am not sure what it
>> achieves. It prevents a tasklet from interrupting the scheduling of this
>> request - but, as I said, I am not sure of the benefit.
> 
> That was my "oh bother" comment as well. We don't realise the benefit of
> ensuring that we always process the entire chain before a concurrent
> tasklet starts processing the update; but we do coalesce the double
> preempt-counter manipulation into one, and only **actually** kick the
> tasklet when rescheduling a page flip, rather than being forced to wait
> for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
> context!

As far as I understand it, the hunk in __fence_set_priority may help
trigger preemption quicker, rather than waiting for the next CSB or
submit activity - so that makes some sense.

I say some because it is a balancing question between the time taken to
re-schedule and the alternative of just kicking the tasklet, outside
the softirq-disabled section, once the reschedule is done.

The second hunk is a bit more difficult. It creates a longer softirq-off
section, which is a slight negative, and I am unsure how much it
actually closes the race with tasklets in practice. So it may be that
the only benefit is reducing the fiddling of both the preempt count and
local_bh_disable to a single fiddle.

Can I ask for a patch split? :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling
  2018-05-04  9:15       ` Tvrtko Ursulin
@ 2018-05-04  9:31         ` Chris Wilson
  0 siblings, 0 replies; 65+ messages in thread
From: Chris Wilson @ 2018-05-04  9:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-05-04 10:15:20)
> On 03/05/2018 20:50, Chris Wilson wrote:
> The second hunk is a bit more difficult. It creates a longer softirq-off
> section, which is a slight negative, and I am unsure how much it
> actually closes the race with tasklets in practice. So it may be that
> the only benefit is reducing the fiddling of both the preempt count and
> local_bh_disable to a single fiddle.

Remember that local_bh_disable() is just a fiddle with preempt-count and
that the local tasklet doesn't run during the !preemptible section
anyway. So it does remove one preemption point (in clearing the
preempt-count) but that is of dubious merit since we would then need to
kick the submission tasklet again immediately (from the preempt-count
pov) afterwards.
-Chris
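
For reference, a simplified sketch of what the helpers in kernel/softirq.c
boil down to (lockdep, accounting and resched details elided; the names
are prefixed to mark this as a paraphrase):

	static inline void sketch_local_bh_disable(void)
	{
		preempt_count_add(SOFTIRQ_DISABLE_OFFSET); /* counter bump */
	}

	static inline void sketch_local_bh_enable(void)
	{
		preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);
		if (!in_interrupt() && local_softirq_pending())
			do_softirq();	/* run pending tasklets right here */
		preempt_count_dec();	/* finally re-enable preemption */
	}
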
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 09/71] drm/i915: Store a pointer to intel_context in i915_request
  2018-05-03  6:36 ` [PATCH 09/71] drm/i915: Store a pointer to intel_context in i915_request Chris Wilson
@ 2018-05-04 10:31   ` Tvrtko Ursulin
  0 siblings, 0 replies; 65+ messages in thread
From: Tvrtko Ursulin @ 2018-05-04 10:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 03/05/2018 07:36, Chris Wilson wrote:
> To ease the frequent and ugly pointer dance of
> &request->gem_context->engine[request->engine->id] during request
> submission, store that pointer as request->hw_context. One major
> advantage that we will exploit later is that this decouples the logical
> context state from the engine itself.
> 
> v2: Set mock_context->ops so we don't crash and burn in selftests.
>      Cleanups from Tvrtko.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
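
To make the "pointer dance" concrete, a before/after sketch using names
from this patch:

	/* before: dig the per-engine state out of the GEM context */
	struct intel_context *ce =
		to_intel_context(rq->gem_context, rq->engine);

	/* after: resolved once at i915_request_alloc() and cached */
	struct intel_context *ce = rq->hw_context;
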
>   drivers/gpu/drm/i915/gvt/mmio_context.c       |   6 +-
>   drivers/gpu/drm/i915/gvt/mmio_context.h       |   2 +-
>   drivers/gpu/drm/i915/gvt/scheduler.c          | 141 +++++++-----------
>   drivers/gpu/drm/i915/gvt/scheduler.h          |   1 -
>   drivers/gpu/drm/i915/i915_drv.h               |   1 +
>   drivers/gpu/drm/i915/i915_gem.c               |  12 +-
>   drivers/gpu/drm/i915/i915_gem_context.c       |  17 ++-
>   drivers/gpu/drm/i915/i915_gem_context.h       |  21 ++-
>   drivers/gpu/drm/i915/i915_gpu_error.c         |   3 +-
>   drivers/gpu/drm/i915/i915_perf.c              |  25 ++--
>   drivers/gpu/drm/i915/i915_request.c           |  34 ++---
>   drivers/gpu/drm/i915/i915_request.h           |   1 +
>   drivers/gpu/drm/i915/intel_engine_cs.c        |  54 ++++---
>   drivers/gpu/drm/i915/intel_guc_submission.c   |  10 +-
>   drivers/gpu/drm/i915/intel_lrc.c              | 118 +++++++++------
>   drivers/gpu/drm/i915/intel_lrc.h              |   7 -
>   drivers/gpu/drm/i915/intel_ringbuffer.c       | 100 ++++++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |   9 +-
>   drivers/gpu/drm/i915/selftests/mock_context.c |   7 +
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  41 +++--
>   20 files changed, 320 insertions(+), 290 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> index 0f949554d118..708170e61625 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> @@ -446,9 +446,9 @@ static void switch_mocs(struct intel_vgpu *pre, struct intel_vgpu *next,
>   
>   #define CTX_CONTEXT_CONTROL_VAL	0x03
>   
> -bool is_inhibit_context(struct i915_gem_context *ctx, int ring_id)
> +bool is_inhibit_context(struct intel_context *ce)
>   {
> -	u32 *reg_state = ctx->__engine[ring_id].lrc_reg_state;
> +	const u32 *reg_state = ce->lrc_reg_state;
>   	u32 inhibit_mask =
>   		_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
>   
> @@ -501,7 +501,7 @@ static void switch_mmio(struct intel_vgpu *pre,
>   			 * itself.
>   			 */
>   			if (mmio->in_context &&
> -			    !is_inhibit_context(s->shadow_ctx, ring_id))
> +			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
>   				continue;
>   
>   			if (mmio->mask)
> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.h b/drivers/gpu/drm/i915/gvt/mmio_context.h
> index 0439eb8057a8..5c3b9ff9f96a 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio_context.h
> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.h
> @@ -49,7 +49,7 @@ void intel_gvt_switch_mmio(struct intel_vgpu *pre,
>   
>   void intel_gvt_init_engine_mmio_context(struct intel_gvt *gvt);
>   
> -bool is_inhibit_context(struct i915_gem_context *ctx, int ring_id);
> +bool is_inhibit_context(struct intel_context *ce);
>   
>   int intel_vgpu_restore_inhibit_context(struct intel_vgpu *vgpu,
>   				       struct i915_request *req);
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index f409a154491d..d9aa39d28584 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -54,11 +54,8 @@ static void set_context_pdp_root_pointer(
>   
>   static void update_shadow_pdps(struct intel_vgpu_workload *workload)
>   {
> -	struct intel_vgpu *vgpu = workload->vgpu;
> -	int ring_id = workload->ring_id;
> -	struct i915_gem_context *shadow_ctx = vgpu->submission.shadow_ctx;
>   	struct drm_i915_gem_object *ctx_obj =
> -		shadow_ctx->__engine[ring_id].state->obj;
> +		workload->req->hw_context->state->obj;
>   	struct execlist_ring_context *shadow_ring_context;
>   	struct page *page;
>   
> @@ -128,9 +125,8 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
>   	struct intel_vgpu *vgpu = workload->vgpu;
>   	struct intel_gvt *gvt = vgpu->gvt;
>   	int ring_id = workload->ring_id;
> -	struct i915_gem_context *shadow_ctx = vgpu->submission.shadow_ctx;
>   	struct drm_i915_gem_object *ctx_obj =
> -		shadow_ctx->__engine[ring_id].state->obj;
> +		workload->req->hw_context->state->obj;
>   	struct execlist_ring_context *shadow_ring_context;
>   	struct page *page;
>   	void *dst;
> @@ -280,10 +276,8 @@ static int shadow_context_status_change(struct notifier_block *nb,
>   	return NOTIFY_OK;
>   }
>   
> -static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
> -		struct intel_engine_cs *engine)
> +static void shadow_context_descriptor_update(struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
>   	u64 desc = 0;
>   
>   	desc = ce->lrc_desc;
> @@ -292,7 +286,7 @@ static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
>   	 * like GEN8_CTX_* cached in desc_template
>   	 */
>   	desc &= U64_MAX << 12;
> -	desc |= ctx->desc_template & ((1ULL << 12) - 1);
> +	desc |= ce->gem_context->desc_template & ((1ULL << 12) - 1);
>   
>   	ce->lrc_desc = desc;
>   }
> @@ -300,12 +294,11 @@ static void shadow_context_descriptor_update(struct i915_gem_context *ctx,
>   static int copy_workload_to_ring_buffer(struct intel_vgpu_workload *workload)
>   {
>   	struct intel_vgpu *vgpu = workload->vgpu;
> +	struct i915_request *req = workload->req;
>   	void *shadow_ring_buffer_va;
>   	u32 *cs;
> -	struct i915_request *req = workload->req;
>   
> -	if (IS_KABYLAKE(req->i915) &&
> -	    is_inhibit_context(req->gem_context, req->engine->id))
> +	if (IS_KABYLAKE(req->i915) && is_inhibit_context(req->hw_context))
>   		intel_vgpu_restore_inhibit_context(vgpu, req);
>   
>   	/* allocate shadow ring buffer */
> @@ -353,60 +346,56 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload)
>   	struct intel_vgpu_submission *s = &vgpu->submission;
>   	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
>   	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
> -	int ring_id = workload->ring_id;
> -	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
> -	struct intel_ring *ring;
> +	struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id];
> +	struct intel_context *ce;
>   	int ret;
>   
>   	lockdep_assert_held(&dev_priv->drm.struct_mutex);
>   
> -	if (workload->shadowed)
> +	if (workload->req)
>   		return 0;
>   
> +	/* pin shadow context by gvt even the shadow context will be pinned
> +	 * when i915 alloc request. That is because gvt will update the guest
> +	 * context from shadow context when workload is completed, and at that
> +	 * moment, i915 may already unpined the shadow context to make the
> +	 * shadow_ctx pages invalid. So gvt need to pin itself. After update
> +	 * the guest context, gvt can unpin the shadow_ctx safely.
> +	 */
> +	ce = intel_context_pin(shadow_ctx, engine);
> +	if (IS_ERR(ce)) {
> +		gvt_vgpu_err("fail to pin shadow context\n");
> +		return PTR_ERR(ce);
> +	}
> +
>   	shadow_ctx->desc_template &= ~(0x3 << GEN8_CTX_ADDRESSING_MODE_SHIFT);
>   	shadow_ctx->desc_template |= workload->ctx_desc.addressing_mode <<
>   				    GEN8_CTX_ADDRESSING_MODE_SHIFT;
>   
> -	if (!test_and_set_bit(ring_id, s->shadow_ctx_desc_updated))
> -		shadow_context_descriptor_update(shadow_ctx,
> -					dev_priv->engine[ring_id]);
> +	if (!test_and_set_bit(workload->ring_id, s->shadow_ctx_desc_updated))
> +		shadow_context_descriptor_update(ce);
>   
>   	ret = intel_gvt_scan_and_shadow_ringbuffer(workload);
>   	if (ret)
> -		goto err_scan;
> +		goto err_unpin;
>   
>   	if ((workload->ring_id == RCS) &&
>   	    (workload->wa_ctx.indirect_ctx.size != 0)) {
>   		ret = intel_gvt_scan_and_shadow_wa_ctx(&workload->wa_ctx);
>   		if (ret)
> -			goto err_scan;
> -	}
> -
> -	/* pin shadow context by gvt even the shadow context will be pinned
> -	 * when i915 alloc request. That is because gvt will update the guest
> -	 * context from shadow context when workload is completed, and at that
> -	 * moment, i915 may already unpined the shadow context to make the
> -	 * shadow_ctx pages invalid. So gvt need to pin itself. After update
> -	 * the guest context, gvt can unpin the shadow_ctx safely.
> -	 */
> -	ring = intel_context_pin(shadow_ctx, engine);
> -	if (IS_ERR(ring)) {
> -		ret = PTR_ERR(ring);
> -		gvt_vgpu_err("fail to pin shadow context\n");
> -		goto err_shadow;
> +			goto err_shadow;
>   	}
>   
>   	ret = populate_shadow_context(workload);
>   	if (ret)
> -		goto err_unpin;
> -	workload->shadowed = true;
> +		goto err_shadow;
> +
>   	return 0;
>   
> -err_unpin:
> -	intel_context_unpin(shadow_ctx, engine);
>   err_shadow:
>   	release_shadow_wa_ctx(&workload->wa_ctx);
> -err_scan:
> +err_unpin:
> +	intel_context_unpin(ce);
>   	return ret;
>   }
>   
> @@ -414,7 +403,6 @@ static int intel_gvt_generate_request(struct intel_vgpu_workload *workload)
>   {
>   	int ring_id = workload->ring_id;
>   	struct drm_i915_private *dev_priv = workload->vgpu->gvt->dev_priv;
> -	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
>   	struct i915_request *rq;
>   	struct intel_vgpu *vgpu = workload->vgpu;
>   	struct intel_vgpu_submission *s = &vgpu->submission;
> @@ -437,7 +425,6 @@ static int intel_gvt_generate_request(struct intel_vgpu_workload *workload)
>   	return 0;
>   
>   err_unpin:
> -	intel_context_unpin(shadow_ctx, engine);
>   	release_shadow_wa_ctx(&workload->wa_ctx);
>   	return ret;
>   }
> @@ -517,21 +504,13 @@ static int prepare_shadow_batch_buffer(struct intel_vgpu_workload *workload)
>   	return ret;
>   }
>   
> -static int update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
> +static void update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
>   {
> -	struct intel_vgpu_workload *workload = container_of(wa_ctx,
> -					struct intel_vgpu_workload,
> -					wa_ctx);
> -	int ring_id = workload->ring_id;
> -	struct intel_vgpu_submission *s = &workload->vgpu->submission;
> -	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
> -	struct drm_i915_gem_object *ctx_obj =
> -		shadow_ctx->__engine[ring_id].state->obj;
> -	struct execlist_ring_context *shadow_ring_context;
> -	struct page *page;
> -
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	shadow_ring_context = kmap_atomic(page);
> +	struct intel_vgpu_workload *workload =
> +		container_of(wa_ctx, struct intel_vgpu_workload, wa_ctx);
> +	struct i915_request *rq = workload->req;
> +	struct execlist_ring_context *shadow_ring_context =
> +		(struct execlist_ring_context *)rq->hw_context->lrc_reg_state;
>   
>   	shadow_ring_context->bb_per_ctx_ptr.val =
>   		(shadow_ring_context->bb_per_ctx_ptr.val &
> @@ -539,9 +518,6 @@ static int update_wa_ctx_2_shadow_ctx(struct intel_shadow_wa_ctx *wa_ctx)
>   	shadow_ring_context->rcs_indirect_ctx.val =
>   		(shadow_ring_context->rcs_indirect_ctx.val &
>   		(~INDIRECT_CTX_ADDR_MASK)) | wa_ctx->indirect_ctx.shadow_gma;
> -
> -	kunmap_atomic(shadow_ring_context);
> -	return 0;
>   }
>   
>   static int prepare_shadow_wa_ctx(struct intel_shadow_wa_ctx *wa_ctx)
> @@ -670,12 +646,9 @@ static int prepare_workload(struct intel_vgpu_workload *workload)
>   static int dispatch_workload(struct intel_vgpu_workload *workload)
>   {
>   	struct intel_vgpu *vgpu = workload->vgpu;
> -	struct intel_vgpu_submission *s = &vgpu->submission;
> -	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
>   	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
>   	int ring_id = workload->ring_id;
> -	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
> -	int ret = 0;
> +	int ret;
>   
>   	gvt_dbg_sched("ring id %d prepare to dispatch workload %p\n",
>   		ring_id, workload);
> @@ -687,10 +660,6 @@ static int dispatch_workload(struct intel_vgpu_workload *workload)
>   		goto out;
>   
>   	ret = prepare_workload(workload);
> -	if (ret) {
> -		intel_context_unpin(shadow_ctx, engine);
> -		goto out;
> -	}
>   
>   out:
>   	if (ret)
> @@ -765,27 +734,23 @@ static struct intel_vgpu_workload *pick_next_workload(
>   
>   static void update_guest_context(struct intel_vgpu_workload *workload)
>   {
> +	struct i915_request *rq = workload->req;
>   	struct intel_vgpu *vgpu = workload->vgpu;
>   	struct intel_gvt *gvt = vgpu->gvt;
> -	struct intel_vgpu_submission *s = &vgpu->submission;
> -	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
> -	int ring_id = workload->ring_id;
> -	struct drm_i915_gem_object *ctx_obj =
> -		shadow_ctx->__engine[ring_id].state->obj;
> +	struct drm_i915_gem_object *ctx_obj = rq->hw_context->state->obj;
>   	struct execlist_ring_context *shadow_ring_context;
>   	struct page *page;
>   	void *src;
>   	unsigned long context_gpa, context_page_num;
>   	int i;
>   
> -	gvt_dbg_sched("ring id %d workload lrca %x\n", ring_id,
> -			workload->ctx_desc.lrca);
> -
> -	context_page_num = gvt->dev_priv->engine[ring_id]->context_size;
> +	gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
> +		      workload->ctx_desc.lrca);
>   
> +	context_page_num = rq->engine->context_size;
>   	context_page_num = context_page_num >> PAGE_SHIFT;
>   
> -	if (IS_BROADWELL(gvt->dev_priv) && ring_id == RCS)
> +	if (IS_BROADWELL(gvt->dev_priv) && rq->engine->id == RCS)
>   		context_page_num = 19;
>   
>   	i = 2;
> @@ -858,6 +823,7 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
>   		scheduler->current_workload[ring_id];
>   	struct intel_vgpu *vgpu = workload->vgpu;
>   	struct intel_vgpu_submission *s = &vgpu->submission;
> +	struct i915_request *rq;
>   	int event;
>   
>   	mutex_lock(&gvt->lock);
> @@ -866,11 +832,8 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
>   	 * switch to make sure request is completed.
>   	 * For the workload w/o request, directly complete the workload.
>   	 */
> -	if (workload->req) {
> -		struct drm_i915_private *dev_priv =
> -			workload->vgpu->gvt->dev_priv;
> -		struct intel_engine_cs *engine =
> -			dev_priv->engine[workload->ring_id];
> +	rq = fetch_and_zero(&workload->req);
> +	if (rq) {
>   		wait_event(workload->shadow_ctx_status_wq,
>   			   !atomic_read(&workload->shadow_ctx_active));
>   
> @@ -886,8 +849,6 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
>   				workload->status = 0;
>   		}
>   
> -		i915_request_put(fetch_and_zero(&workload->req));
> -
>   		if (!workload->status && !(vgpu->resetting_eng &
>   					   ENGINE_MASK(ring_id))) {
>   			update_guest_context(workload);
> @@ -896,10 +857,13 @@ static void complete_current_workload(struct intel_gvt *gvt, int ring_id)
>   					 INTEL_GVT_EVENT_MAX)
>   				intel_vgpu_trigger_virtual_event(vgpu, event);
>   		}
> -		mutex_lock(&dev_priv->drm.struct_mutex);
> +
>   		/* unpin shadow ctx as the shadow_ctx update is done */
> -		intel_context_unpin(s->shadow_ctx, engine);
> -		mutex_unlock(&dev_priv->drm.struct_mutex);
> +		mutex_lock(&rq->i915->drm.struct_mutex);
> +		intel_context_unpin(rq->hw_context);
> +		mutex_unlock(&rq->i915->drm.struct_mutex);
> +
> +		i915_request_put(rq);
>   	}
>   
>   	gvt_dbg_sched("ring id %d complete workload %p status %d\n",
> @@ -1273,7 +1237,6 @@ alloc_workload(struct intel_vgpu *vgpu)
>   	atomic_set(&workload->shadow_ctx_active, 0);
>   
>   	workload->status = -EINPROGRESS;
> -	workload->shadowed = false;
>   	workload->vgpu = vgpu;
>   
>   	return workload;
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.h b/drivers/gpu/drm/i915/gvt/scheduler.h
> index 6c644782193e..21eddab4a9cd 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.h
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.h
> @@ -83,7 +83,6 @@ struct intel_vgpu_workload {
>   	struct i915_request *req;
>   	/* if this workload has been dispatched to i915? */
>   	bool dispatched;
> -	bool shadowed;
>   	int status;
>   
>   	struct intel_vgpu_mm *shadow_mm;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 04e27806e581..9341b725113b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1949,6 +1949,7 @@ struct drm_i915_private {
>   			 */
>   			struct i915_perf_stream *exclusive_stream;
>   
> +			struct intel_context *pinned_ctx;
>   			u32 specific_ctx_id;
>   
>   			struct hrtimer poll_check_timer;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ecef2e8e5e93..8a8a77c2ef5f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3229,14 +3229,14 @@ void i915_gem_reset(struct drm_i915_private *dev_priv,
>   	i915_retire_requests(dev_priv);
>   
>   	for_each_engine(engine, dev_priv, id) {
> -		struct i915_gem_context *ctx;
> +		struct intel_context *ce;
>   
>   		i915_gem_reset_engine(engine,
>   				      engine->hangcheck.active_request,
>   				      stalled_mask & ENGINE_MASK(id));
> -		ctx = fetch_and_zero(&engine->last_retired_context);
> -		if (ctx)
> -			intel_context_unpin(ctx, engine);
> +		ce = fetch_and_zero(&engine->last_retired_context);
> +		if (ce)
> +			intel_context_unpin(ce);
>   
>   		/*
>   		 * Ostensibily, we always want a context loaded for powersaving,
> @@ -4946,13 +4946,13 @@ void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
>   
>   static void assert_kernel_context_is_current(struct drm_i915_private *i915)
>   {
> -	struct i915_gem_context *kernel_context = i915->kernel_context;
> +	struct i915_gem_context *kctx = i915->kernel_context;
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   
>   	for_each_engine(engine, i915, id) {
>   		GEM_BUG_ON(__i915_gem_active_peek(&engine->timeline.last_request));
> -		GEM_BUG_ON(engine->last_retired_context != kernel_context);
> +		GEM_BUG_ON(engine->last_retired_context->gem_context != kctx);
>   	}
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 78dc4cb305c2..66aad55c5273 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -127,14 +127,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
>   		struct intel_context *ce = &ctx->__engine[n];
>   
> -		if (!ce->state)
> -			continue;
> -
> -		WARN_ON(ce->pin_count);
> -		if (ce->ring)
> -			intel_ring_free(ce->ring);
> -
> -		__i915_gem_object_release_unless_active(ce->state->obj);
> +		if (ce->ops)
> +			ce->ops->destroy(ce);
>   	}
>   
>   	kfree(ctx->name);
> @@ -266,6 +260,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   		    struct drm_i915_file_private *file_priv)
>   {
>   	struct i915_gem_context *ctx;
> +	unsigned int n;
>   	int ret;
>   
>   	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> @@ -283,6 +278,12 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	ctx->i915 = dev_priv;
>   	ctx->sched.priority = I915_PRIORITY_NORMAL;
>   
> +	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> +		struct intel_context *ce = &ctx->__engine[n];
> +
> +		ce->gem_context = ctx;
> +	}
> +
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index ace3b129c189..749a4ff566f5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -45,6 +45,11 @@ struct intel_ring;
>   
>   #define DEFAULT_CONTEXT_HANDLE 0
>   
> +struct intel_context_ops {
> +	void (*unpin)(struct intel_context *ce);
> +	void (*destroy)(struct intel_context *ce);
> +};
> +
>   /**
>    * struct i915_gem_context - client state
>    *
> @@ -144,11 +149,14 @@ struct i915_gem_context {
>   
>   	/** engine: per-engine logical HW state */
>   	struct intel_context {
> +		struct i915_gem_context *gem_context;
>   		struct i915_vma *state;
>   		struct intel_ring *ring;
>   		u32 *lrc_reg_state;
>   		u64 lrc_desc;
>   		int pin_count;
> +
> +		const struct intel_context_ops *ops;
>   	} __engine[I915_NUM_ENGINES];
>   
>   	/** ring_size: size for allocating the per-engine ring buffer */
> @@ -263,25 +271,22 @@ to_intel_context(struct i915_gem_context *ctx,
>   	return &ctx->__engine[engine->id];
>   }
>   
> -static inline struct intel_ring *
> +static inline struct intel_context *
>   intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
>   {
>   	return engine->context_pin(engine, ctx);
>   }
>   
> -static inline void __intel_context_pin(struct i915_gem_context *ctx,
> -				       const struct intel_engine_cs *engine)
> +static inline void __intel_context_pin(struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> -
>   	GEM_BUG_ON(!ce->pin_count);
>   	ce->pin_count++;
>   }
>   
> -static inline void intel_context_unpin(struct i915_gem_context *ctx,
> -				       struct intel_engine_cs *engine)
> +static inline void intel_context_unpin(struct intel_context *ce)
>   {
> -	engine->context_unpin(engine, ctx);
> +	GEM_BUG_ON(!ce->ops);
> +	ce->ops->unpin(ce);
>   }
>   
>   /* i915_gem_context.c */
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 7cc7d3bc731b..145823f0b48e 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1485,8 +1485,7 @@ static void gem_record_rings(struct i915_gpu_state *error)
>   
>   			ee->ctx =
>   				i915_error_object_create(i915,
> -							 to_intel_context(ctx,
> -									  engine)->state);
> +							 request->hw_context->state);
>   
>   			error->simulated |=
>   				i915_gem_context_no_error_capture(ctx);
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index d9341415df40..9b580aba7e25 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1221,7 +1221,7 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		dev_priv->perf.oa.specific_ctx_id = stream->ctx->hw_id;
>   	} else {
>   		struct intel_engine_cs *engine = dev_priv->engine[RCS];
> -		struct intel_ring *ring;
> +		struct intel_context *ce;
>   		int ret;
>   
>   		ret = i915_mutex_lock_interruptible(&dev_priv->drm);
> @@ -1234,19 +1234,19 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   		 *
>   		 * NB: implied RCS engine...
>   		 */
> -		ring = intel_context_pin(stream->ctx, engine);
> +		ce = intel_context_pin(stream->ctx, engine);
>   		mutex_unlock(&dev_priv->drm.struct_mutex);
> -		if (IS_ERR(ring))
> -			return PTR_ERR(ring);
> +		if (IS_ERR(ce))
> +			return PTR_ERR(ce);
>   
> +		dev_priv->perf.oa.pinned_ctx = ce;
>   
>   		/*
>   		 * Explicitly track the ID (instead of calling
>   		 * i915_ggtt_offset() on the fly) considering the difference
>   		 * with gen8+ and execlists
>   		 */
> -		dev_priv->perf.oa.specific_ctx_id =
> -			i915_ggtt_offset(to_intel_context(stream->ctx, engine)->state);
> +		dev_priv->perf.oa.specific_ctx_id = i915_ggtt_offset(ce->state);
>   	}
>   
>   	return 0;
> @@ -1262,17 +1262,14 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
>   static void oa_put_render_ctx_id(struct i915_perf_stream *stream)
>   {
>   	struct drm_i915_private *dev_priv = stream->dev_priv;
> +	struct intel_context *ce;
>   
> -	if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) {
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> -	} else {
> -		struct intel_engine_cs *engine = dev_priv->engine[RCS];
> +	dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
>   
> +	ce = fetch_and_zero(&dev_priv->perf.oa.pinned_ctx);
> +	if (ce) {
>   		mutex_lock(&dev_priv->drm.struct_mutex);
> -
> -		dev_priv->perf.oa.specific_ctx_id = INVALID_CTX_ID;
> -		intel_context_unpin(stream->ctx, engine);
> -
> +		intel_context_unpin(ce);
>   		mutex_unlock(&dev_priv->drm.struct_mutex);
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5205707fe03a..e5925fcf6004 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -382,8 +382,8 @@ static void __retire_engine_request(struct intel_engine_cs *engine,
>   	 * the subsequent request.
>   	 */
>   	if (engine->last_retired_context)
> -		intel_context_unpin(engine->last_retired_context, engine);
> -	engine->last_retired_context = rq->gem_context;
> +		intel_context_unpin(engine->last_retired_context);
> +	engine->last_retired_context = rq->hw_context;
>   }
>   
>   static void __retire_engine_upto(struct intel_engine_cs *engine,
> @@ -455,7 +455,7 @@ static void i915_request_retire(struct i915_request *request)
>   
>   	/* Retirement decays the ban score as it is a sign of ctx progress */
>   	atomic_dec_if_positive(&request->gem_context->ban_score);
> -	intel_context_unpin(request->gem_context, request->engine);
> +	intel_context_unpin(request->hw_context);
>   
>   	__retire_engine_upto(request->engine, request);
>   
> @@ -656,7 +656,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
>   	struct i915_request *rq;
> -	struct intel_ring *ring;
> +	struct intel_context *ce;
>   	int ret;
>   
>   	lockdep_assert_held(&i915->drm.struct_mutex);
> @@ -680,22 +680,21 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 * GGTT space, so do this first before we reserve a seqno for
>   	 * ourselves.
>   	 */
> -	ring = intel_context_pin(ctx, engine);
> -	if (IS_ERR(ring))
> -		return ERR_CAST(ring);
> -	GEM_BUG_ON(!ring);
> +	ce = intel_context_pin(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ERR_CAST(ce);
>   
>   	ret = reserve_gt(i915);
>   	if (ret)
>   		goto err_unpin;
>   
> -	ret = intel_ring_wait_for_space(ring, MIN_SPACE_FOR_ADD_REQUEST);
> +	ret = intel_ring_wait_for_space(ce->ring, MIN_SPACE_FOR_ADD_REQUEST);
>   	if (ret)
>   		goto err_unreserve;
>   
>   	/* Move our oldest request to the slab-cache (if not in use!) */
> -	rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link);
> -	if (!list_is_last(&rq->ring_link, &ring->request_list) &&
> +	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
> +	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
>   	    i915_request_completed(rq))
>   		i915_request_retire(rq);
>   
> @@ -760,8 +759,9 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq->i915 = i915;
>   	rq->engine = engine;
>   	rq->gem_context = ctx;
> -	rq->ring = ring;
> -	rq->timeline = ring->timeline;
> +	rq->hw_context = ce;
> +	rq->ring = ce->ring;
> +	rq->timeline = ce->ring->timeline;
>   	GEM_BUG_ON(rq->timeline == &engine->timeline);
>   
>   	spin_lock_init(&rq->lock);
> @@ -813,14 +813,14 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		goto err_unwind;
>   
>   	/* Keep a second pin for the dual retirement along engine and ring */
> -	__intel_context_pin(rq->gem_context, engine);
> +	__intel_context_pin(ce);
>   
>   	/* Check that we didn't interrupt ourselves with a new request */
>   	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
>   	return rq;
>   
>   err_unwind:
> -	rq->ring->emit = rq->head;
> +	ce->ring->emit = rq->head;
>   
>   	/* Make sure we didn't add ourselves to external state before freeing */
>   	GEM_BUG_ON(!list_empty(&rq->active_list));
> @@ -831,7 +831,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   err_unreserve:
>   	unreserve_gt(i915);
>   err_unpin:
> -	intel_context_unpin(ctx, engine);
> +	intel_context_unpin(ce);
>   	return ERR_PTR(ret);
>   }
>   
> @@ -1017,8 +1017,8 @@ i915_request_await_object(struct i915_request *to,
>   void __i915_request_add(struct i915_request *request, bool flush_caches)
>   {
>   	struct intel_engine_cs *engine = request->engine;
> -	struct intel_ring *ring = request->ring;
>   	struct i915_timeline *timeline = request->timeline;
> +	struct intel_ring *ring = request->ring;
>   	struct i915_request *prev;
>   	u32 *cs;
>   	int err;
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index dddecd9ffd0c..1bbbb7a9fa03 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -95,6 +95,7 @@ struct i915_request {
>   	 */
>   	struct i915_gem_context *gem_context;
>   	struct intel_engine_cs *engine;
> +	struct intel_context *hw_context;
>   	struct intel_ring *ring;
>   	struct i915_timeline *timeline;
>   	struct intel_signal_node signaling;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index a1b85440ce5a..bddc57ccfa4a 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -656,6 +656,12 @@ static int init_phys_status_page(struct intel_engine_cs *engine)
>   	return 0;
>   }
>   
> +static void __intel_context_unpin(struct i915_gem_context *ctx,
> +				  struct intel_engine_cs *engine)
> +{
> +	intel_context_unpin(to_intel_context(ctx, engine));
> +}
> +
>   /**
>    * intel_engine_init_common - initialize engine state which might require hw access
>    * @engine: Engine to initialize.
> @@ -669,7 +675,8 @@ static int init_phys_status_page(struct intel_engine_cs *engine)
>    */
>   int intel_engine_init_common(struct intel_engine_cs *engine)
>   {
> -	struct intel_ring *ring;
> +	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_context *ce;
>   	int ret;
>   
>   	engine->set_default_submission(engine);
> @@ -681,18 +688,18 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   	 * be available. To avoid this we always pin the default
>   	 * context.
>   	 */
> -	ring = intel_context_pin(engine->i915->kernel_context, engine);
> -	if (IS_ERR(ring))
> -		return PTR_ERR(ring);
> +	ce = intel_context_pin(i915->kernel_context, engine);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);
>   
>   	/*
>   	 * Similarly the preempt context must always be available so that
>   	 * we can interrupt the engine at any time.
>   	 */
> -	if (engine->i915->preempt_context) {
> -		ring = intel_context_pin(engine->i915->preempt_context, engine);
> -		if (IS_ERR(ring)) {
> -			ret = PTR_ERR(ring);
> +	if (i915->preempt_context) {
> +		ce = intel_context_pin(i915->preempt_context, engine);
> +		if (IS_ERR(ce)) {
> +			ret = PTR_ERR(ce);
>   			goto err_unpin_kernel;
>   		}
>   	}
> @@ -701,7 +708,7 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   	if (ret)
>   		goto err_unpin_preempt;
>   
> -	if (HWS_NEEDS_PHYSICAL(engine->i915))
> +	if (HWS_NEEDS_PHYSICAL(i915))
>   		ret = init_phys_status_page(engine);
>   	else
>   		ret = init_status_page(engine);
> @@ -713,10 +720,11 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   err_breadcrumbs:
>   	intel_engine_fini_breadcrumbs(engine);
>   err_unpin_preempt:
> -	if (engine->i915->preempt_context)
> -		intel_context_unpin(engine->i915->preempt_context, engine);
> +	if (i915->preempt_context)
> +		__intel_context_unpin(i915->preempt_context, engine);
> +
>   err_unpin_kernel:
> -	intel_context_unpin(engine->i915->kernel_context, engine);
> +	__intel_context_unpin(i915->kernel_context, engine);
>   	return ret;
>   }
>   
> @@ -729,6 +737,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>    */
>   void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   {
> +	struct drm_i915_private *i915 = engine->i915;
> +
>   	intel_engine_cleanup_scratch(engine);
>   
>   	if (HWS_NEEDS_PHYSICAL(engine->i915))
> @@ -743,9 +753,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   	if (engine->default_state)
>   		i915_gem_object_put(engine->default_state);
>   
> -	if (engine->i915->preempt_context)
> -		intel_context_unpin(engine->i915->preempt_context, engine);
> -	intel_context_unpin(engine->i915->kernel_context, engine);
> +	if (i915->preempt_context)
> +		__intel_context_unpin(i915->preempt_context, engine);
> +	__intel_context_unpin(i915->kernel_context, engine);
>   
>   	i915_timeline_fini(&engine->timeline);
>   }
> @@ -989,8 +999,8 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
>    */
>   bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
>   {
> -	const struct i915_gem_context * const kernel_context =
> -		engine->i915->kernel_context;
> +	const struct intel_context *kernel_context =
> +		to_intel_context(engine->i915->kernel_context, engine);
>   	struct i915_request *rq;
>   
>   	lockdep_assert_held(&engine->i915->drm.struct_mutex);
> @@ -1002,7 +1012,7 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
>   	 */
>   	rq = __i915_gem_active_peek(&engine->timeline.last_request);
>   	if (rq)
> -		return rq->gem_context == kernel_context;
> +		return rq->hw_context == kernel_context;
>   	else
>   		return engine->last_retired_context == kernel_context;
>   }
> @@ -1087,16 +1097,16 @@ void intel_engines_unpark(struct drm_i915_private *i915)
>    */
>   void intel_engine_lost_context(struct intel_engine_cs *engine)
>   {
> -	struct i915_gem_context *ctx;
> +	struct intel_context *ce;
>   
>   	lockdep_assert_held(&engine->i915->drm.struct_mutex);
>   
>   	engine->legacy_active_context = NULL;
>   	engine->legacy_active_ppgtt = NULL;
>   
> -	ctx = fetch_and_zero(&engine->last_retired_context);
> -	if (ctx)
> -		intel_context_unpin(ctx, engine);
> +	ce = fetch_and_zero(&engine->last_retired_context);
> +	if (ce)
> +		intel_context_unpin(ce);
>   }
>   
>   bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index cc7b0c1b5e8c..3d4aaaf74a84 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -513,9 +513,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	struct intel_guc_client *client = guc->execbuf_client;
>   	struct intel_engine_cs *engine = rq->engine;
> -	u32 ctx_desc =
> -		lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
> -							  engine));
> +	u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
>   	u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
>   
>   	spin_lock(&client->wq_lock);
> @@ -553,8 +551,8 @@ static void inject_preempt_context(struct work_struct *work)
>   					     preempt_work[engine->id]);
>   	struct intel_guc_client *client = guc->preempt_client;
>   	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
> -	u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(client->owner,
> -								 engine));
> +	u32 ctx_desc = lower_32_bits(to_intel_context(client->owner,
> +						      engine)->lrc_desc);
>   	u32 data[7];
>   
>   	/*
> @@ -710,7 +708,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
>   		struct i915_request *rq, *rn;
>   
>   		list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
> -			if (last && rq->gem_context != last->gem_context) {
> +			if (last && rq->hw_context != last->hw_context) {
>   				if (port == last_port) {
>   					__list_del_many(&p->requests,
>   							&rq->sched.link);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 578cb89b3af7..c29d5f5582c2 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -164,7 +164,8 @@
>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>   
>   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> -					    struct intel_engine_cs *engine);
> +					    struct intel_engine_cs *engine,
> +					    struct intel_context *ce);
>   static void execlists_init_reg_state(u32 *reg_state,
>   				     struct i915_gem_context *ctx,
>   				     struct intel_engine_cs *engine,
> @@ -222,9 +223,9 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>    */
>   static void
>   intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
> -				   struct intel_engine_cs *engine)
> +				   struct intel_engine_cs *engine,
> +				   struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
>   	u64 desc;
>   
>   	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (BIT(GEN8_CTX_ID_WIDTH)));
> @@ -416,8 +417,7 @@ execlists_update_context_pdps(struct i915_hw_ppgtt *ppgtt, u32 *reg_state)
>   
>   static u64 execlists_update_context(struct i915_request *rq)
>   {
> -	struct intel_context *ce =
> -		to_intel_context(rq->gem_context, rq->engine);
> +	struct intel_context *ce = rq->hw_context;
>   	struct i915_hw_ppgtt *ppgtt =
>   		rq->gem_context->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
>   	u32 *reg_state = ce->lrc_reg_state;
> @@ -494,14 +494,14 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>   	execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK);
>   }
>   
> -static bool ctx_single_port_submission(const struct i915_gem_context *ctx)
> +static bool ctx_single_port_submission(const struct intel_context *ce)
>   {
>   	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> -		i915_gem_context_force_single_submission(ctx));
> +		i915_gem_context_force_single_submission(ce->gem_context));
>   }
>   
> -static bool can_merge_ctx(const struct i915_gem_context *prev,
> -			  const struct i915_gem_context *next)
> +static bool can_merge_ctx(const struct intel_context *prev,
> +			  const struct intel_context *next)
>   {
>   	if (prev != next)
>   		return false;
> @@ -669,8 +669,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			 * second request, and so we never need to tell the
>   			 * hardware about the first.
>   			 */
> -			if (last && !can_merge_ctx(rq->gem_context,
> -						   last->gem_context)) {
> +			if (last &&
> +			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
>   				/*
>   				 * If we are on the second port and cannot
>   				 * combine this request with the last, then we
> @@ -689,14 +689,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   				 * the same context (even though a different
>   				 * request) to the second port.
>   				 */
> -				if (ctx_single_port_submission(last->gem_context) ||
> -				    ctx_single_port_submission(rq->gem_context)) {
> +				if (ctx_single_port_submission(last->hw_context) ||
> +				    ctx_single_port_submission(rq->hw_context)) {
>   					__list_del_many(&p->requests,
>   							&rq->sched.link);
>   					goto done;
>   				}
>   
> -				GEM_BUG_ON(last->gem_context == rq->gem_context);
> +				GEM_BUG_ON(last->hw_context == rq->hw_context);
>   
>   				if (submit)
>   					port_assign(port, last);
> @@ -1303,6 +1303,37 @@ static void execlists_schedule(struct i915_request *request,
>   	spin_unlock_irq(&engine->timeline.lock);
>   }
>   
> +static void execlists_context_destroy(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(!ce->state);
> +	GEM_BUG_ON(ce->pin_count);
> +
> +	intel_ring_free(ce->ring);
> +	__i915_gem_object_release_unless_active(ce->state->obj);
> +}
> +
> +static void __execlists_context_unpin(struct intel_context *ce)
> +{
> +	intel_ring_unpin(ce->ring);
> +
> +	ce->state->obj->pin_global--;
> +	i915_gem_object_unpin_map(ce->state->obj);
> +	i915_vma_unpin(ce->state);
> +
> +	i915_gem_context_put(ce->gem_context);
> +}
> +
> +static void execlists_context_unpin(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->gem_context->i915->drm.struct_mutex);
> +	GEM_BUG_ON(ce->pin_count == 0);
> +
> +	if (--ce->pin_count)
> +		return;
> +
> +	__execlists_context_unpin(ce);
> +}
> +
>   static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>   {
>   	unsigned int flags;
> @@ -1326,21 +1357,15 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>   	return i915_vma_pin(vma, 0, GEN8_LR_CONTEXT_ALIGN, flags);
>   }
>   
> -static struct intel_ring *
> -execlists_context_pin(struct intel_engine_cs *engine,
> -		      struct i915_gem_context *ctx)
> +static struct intel_context *
> +__execlists_context_pin(struct intel_engine_cs *engine,
> +			struct i915_gem_context *ctx,
> +			struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
>   	void *vaddr;
>   	int ret;
>   
> -	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -
> -	if (likely(ce->pin_count++))
> -		goto out;
> -	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> -
> -	ret = execlists_context_deferred_alloc(ctx, engine);
> +	ret = execlists_context_deferred_alloc(ctx, engine, ce);
>   	if (ret)
>   		goto err;
>   	GEM_BUG_ON(!ce->state);
> @@ -1359,7 +1384,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   	if (ret)
>   		goto unpin_map;
>   
> -	intel_lr_context_descriptor_update(ctx, engine);
> +	intel_lr_context_descriptor_update(ctx, engine, ce);
>   
>   	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>   	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
> @@ -1368,8 +1393,7 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   
>   	ce->state->obj->pin_global++;
>   	i915_gem_context_get(ctx);
> -out:
> -	return ce->ring;
> +	return ce;
>   
>   unpin_map:
>   	i915_gem_object_unpin_map(ce->state->obj);
> @@ -1380,33 +1404,33 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   	return ERR_PTR(ret);
>   }
>   
> -static void execlists_context_unpin(struct intel_engine_cs *engine,
> -				    struct i915_gem_context *ctx)
> +static const struct intel_context_ops execlists_context_ops = {
> +	.unpin = execlists_context_unpin,
> +	.destroy = execlists_context_destroy,
> +};
> +
> +static struct intel_context *
> +execlists_context_pin(struct intel_engine_cs *engine,
> +		      struct i915_gem_context *ctx)
>   {
>   	struct intel_context *ce = to_intel_context(ctx, engine);
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -	GEM_BUG_ON(ce->pin_count == 0);
>   
> -	if (--ce->pin_count)
> -		return;
> -
> -	intel_ring_unpin(ce->ring);
> +	if (likely(ce->pin_count++))
> +		return ce;
> +	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
>   
> -	ce->state->obj->pin_global--;
> -	i915_gem_object_unpin_map(ce->state->obj);
> -	i915_vma_unpin(ce->state);
> +	ce->ops = &execlists_context_ops;
>   
> -	i915_gem_context_put(ctx);
> +	return __execlists_context_pin(engine, ctx, ce);
>   }
>   
>   static int execlists_request_alloc(struct i915_request *request)
>   {
> -	struct intel_context *ce =
> -		to_intel_context(request->gem_context, request->engine);
>   	int ret;
>   
> -	GEM_BUG_ON(!ce->pin_count);
> +	GEM_BUG_ON(!request->hw_context->pin_count);
>   
>   	/* Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
> @@ -1857,7 +1881,7 @@ static void reset_common_ring(struct intel_engine_cs *engine,
>   	 * future request will be after userspace has had the opportunity
>   	 * to recreate its own state.
>   	 */
> -	regs = to_intel_context(request->gem_context, engine)->lrc_reg_state;
> +	regs = request->hw_context->lrc_reg_state;
>   	if (engine->default_state) {
>   		void *defaults;
>   
> @@ -2216,8 +2240,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->reset_hw = reset_common_ring;
>   
>   	engine->context_pin = execlists_context_pin;
> -	engine->context_unpin = execlists_context_unpin;
> -
>   	engine->request_alloc = execlists_request_alloc;
>   
>   	engine->emit_flush = gen8_emit_flush;
> @@ -2452,7 +2474,7 @@ static void execlists_init_reg_state(u32 *regs,
>   	struct drm_i915_private *dev_priv = engine->i915;
>   	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: dev_priv->mm.aliasing_ppgtt;
>   	u32 base = engine->mmio_base;
> -	bool rcs = engine->id == RCS;
> +	bool rcs = engine->class == RENDER_CLASS;
>   
>   	/* A context is actually a big batch buffer with several
>   	 * MI_LOAD_REGISTER_IMM commands followed by (reg, value) pairs. The
> @@ -2597,10 +2619,10 @@ populate_lr_context(struct i915_gem_context *ctx,
>   }
>   
>   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> -					    struct intel_engine_cs *engine)
> +					    struct intel_engine_cs *engine,
> +					    struct intel_context *ce)
>   {
>   	struct drm_i915_gem_object *ctx_obj;
> -	struct intel_context *ce = to_intel_context(ctx, engine);
>   	struct i915_vma *vma;
>   	uint32_t context_size;
>   	struct intel_ring *ring;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 4ec7d8dd13c8..1593194e930c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -104,11 +104,4 @@ struct i915_gem_context;
>   
>   void intel_lr_context_resume(struct drm_i915_private *dev_priv);
>   
> -static inline uint64_t
> -intel_lr_context_descriptor(struct i915_gem_context *ctx,
> -			    struct intel_engine_cs *engine)
> -{
> -	return to_intel_context(ctx, engine)->lrc_desc;
> -}
> -
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index fbd23127505d..526ee8302fce 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -558,8 +558,7 @@ static void reset_ring_common(struct intel_engine_cs *engine,
>   	 */
>   	if (request) {
>   		struct drm_i915_private *dev_priv = request->i915;
> -		struct intel_context *ce =
> -			to_intel_context(request->gem_context, engine);
> +		struct intel_context *ce = request->hw_context;
>   		struct i915_hw_ppgtt *ppgtt;
>   
>   		if (ce->state) {
> @@ -1169,7 +1168,31 @@ intel_ring_free(struct intel_ring *ring)
>   	kfree(ring);
>   }
>   
> -static int context_pin(struct intel_context *ce)
> +static void intel_ring_context_destroy(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(ce->pin_count);
> +
> +	if (ce->state)
> +		__i915_gem_object_release_unless_active(ce->state->obj);
> +}
> +
> +static void intel_ring_context_unpin(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->gem_context->i915->drm.struct_mutex);
> +	GEM_BUG_ON(ce->pin_count == 0);
> +
> +	if (--ce->pin_count)
> +		return;
> +
> +	if (ce->state) {
> +		ce->state->obj->pin_global--;
> +		i915_vma_unpin(ce->state);
> +	}
> +
> +	i915_gem_context_put(ce->gem_context);
> +}
> +
> +static int __context_pin(struct intel_context *ce)
>   {
>   	struct i915_vma *vma = ce->state;
>   	int ret;
> @@ -1258,25 +1281,19 @@ alloc_context_vma(struct intel_engine_cs *engine)
>   	return ERR_PTR(err);
>   }
>   
> -static struct intel_ring *
> -intel_ring_context_pin(struct intel_engine_cs *engine,
> -		       struct i915_gem_context *ctx)
> +static struct intel_context *
> +__ring_context_pin(struct intel_engine_cs *engine,
> +		   struct i915_gem_context *ctx,
> +		   struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> -	int ret;
> -
> -	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -
> -	if (likely(ce->pin_count++))
> -		goto out;
> -	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> +	int err;
>   
>   	if (!ce->state && engine->context_size) {
>   		struct i915_vma *vma;
>   
>   		vma = alloc_context_vma(engine);
>   		if (IS_ERR(vma)) {
> -			ret = PTR_ERR(vma);
> +			err = PTR_ERR(vma);
>   			goto err;
>   		}
>   
> @@ -1284,8 +1301,8 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
>   	}
>   
>   	if (ce->state) {
> -		ret = context_pin(ce);
> -		if (ret)
> +		err = __context_pin(ce);
> +		if (err)
>   			goto err;
>   
>   		ce->state->obj->pin_global++;
> @@ -1293,32 +1310,37 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
>   
>   	i915_gem_context_get(ctx);
>   
> -out:
>   	/* One ringbuffer to rule them all */
> -	return engine->buffer;
> +	GEM_BUG_ON(!engine->buffer);
> +	ce->ring = engine->buffer;
> +
> +	return ce;
>   
>   err:
>   	ce->pin_count = 0;
> -	return ERR_PTR(ret);
> +	return ERR_PTR(err);
>   }
>   
> -static void intel_ring_context_unpin(struct intel_engine_cs *engine,
> -				     struct i915_gem_context *ctx)
> +static const struct intel_context_ops ring_context_ops = {
> +	.unpin = intel_ring_context_unpin,
> +	.destroy = intel_ring_context_destroy,
> +};
> +
> +static struct intel_context *
> +intel_ring_context_pin(struct intel_engine_cs *engine,
> +		       struct i915_gem_context *ctx)
>   {
>   	struct intel_context *ce = to_intel_context(ctx, engine);
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -	GEM_BUG_ON(ce->pin_count == 0);
>   
> -	if (--ce->pin_count)
> -		return;
> +	if (likely(ce->pin_count++))
> +		return ce;
> +	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
>   
> -	if (ce->state) {
> -		ce->state->obj->pin_global--;
> -		i915_vma_unpin(ce->state);
> -	}
> +	ce->ops = &ring_context_ops;
>   
> -	i915_gem_context_put(ctx);
> +	return __ring_context_pin(engine, ctx, ce);
>   }
>   
>   static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> @@ -1329,10 +1351,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   
>   	intel_engine_setup_common(engine);
>   
> -	err = intel_engine_init_common(engine);
> -	if (err)
> -		goto err;
> -
>   	timeline = i915_timeline_create(engine->i915, engine->name);
>   	if (IS_ERR(timeline)) {
>   		err = PTR_ERR(timeline);
> @@ -1354,8 +1372,14 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(engine->buffer);
>   	engine->buffer = ring;
>   
> +	err = intel_engine_init_common(engine);
> +	if (err)
> +		goto err_unpin;
> +
>   	return 0;
>   
> +err_unpin:
> +	intel_ring_unpin(ring);
>   err_ring:
>   	intel_ring_free(ring);
>   err:
> @@ -1441,7 +1465,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   
>   	*cs++ = MI_NOOP;
>   	*cs++ = MI_SET_CONTEXT;
> -	*cs++ = i915_ggtt_offset(to_intel_context(rq->gem_context, engine)->state) | flags;
> +	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
>   	/*
>   	 * w/a: MI_SET_CONTEXT must always be followed by MI_NOOP
>   	 * WaMiSetContext_Hang:snb,ivb,vlv
> @@ -1532,7 +1556,7 @@ static int switch_context(struct i915_request *rq)
>   		hw_flags = MI_FORCE_RESTORE;
>   	}
>   
> -	if (to_intel_context(to_ctx, engine)->state &&
> +	if (rq->hw_context->state &&
>   	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
>   		GEM_BUG_ON(engine->id != RCS);
>   
> @@ -1580,7 +1604,7 @@ static int ring_request_alloc(struct i915_request *request)
>   {
>   	int ret;
>   
> -	GEM_BUG_ON(!to_intel_context(request->gem_context, request->engine)->pin_count);
> +	GEM_BUG_ON(!request->hw_context->pin_count);
>   
>   	/* Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
> @@ -2009,8 +2033,6 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   	engine->reset_hw = reset_ring_common;
>   
>   	engine->context_pin = intel_ring_context_pin;
> -	engine->context_unpin = intel_ring_context_unpin;
> -
>   	engine->request_alloc = ring_request_alloc;
>   
>   	engine->emit_breadcrumb = i9xx_emit_breadcrumb;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index c4e56044e34f..5e78ee3f5775 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -431,10 +431,9 @@ struct intel_engine_cs {
>   
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
> -	struct intel_ring *(*context_pin)(struct intel_engine_cs *engine,
> -					  struct i915_gem_context *ctx);
> -	void		(*context_unpin)(struct intel_engine_cs *engine,
> -					 struct i915_gem_context *ctx);
> +	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
> +					     struct i915_gem_context *ctx);
> +
>   	int		(*request_alloc)(struct i915_request *rq);
>   	int		(*init_context)(struct i915_request *rq);
>   
> @@ -550,7 +549,7 @@ struct intel_engine_cs {
>   	 * to the kernel context and trash it as the save may not happen
>   	 * before the hardware is powered down.
>   	 */
> -	struct i915_gem_context *last_retired_context;
> +	struct intel_context *last_retired_context;
>   
>   	/* We track the current MI_SET_CONTEXT in order to eliminate
>   	 * redundant context switches. This presumes that requests are not
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 501becc47c0c..8904f1ce64e3 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -30,6 +30,7 @@ mock_context(struct drm_i915_private *i915,
>   	     const char *name)
>   {
>   	struct i915_gem_context *ctx;
> +	unsigned int n;
>   	int ret;
>   
>   	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> @@ -43,6 +44,12 @@ mock_context(struct drm_i915_private *i915,
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   
> +	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> +		struct intel_context *ce = &ctx->__engine[n];
> +
> +		ce->gem_context = ctx;
> +	}
> +
>   	ret = ida_simple_get(&i915->contexts.hw_ida,
>   			     0, MAX_CONTEXT_HW_ID, GFP_KERNEL);
>   	if (ret < 0)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 26bf29d97007..33eddfc1f8ce 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -72,25 +72,37 @@ static void hw_delay_complete(struct timer_list *t)
>   	spin_unlock(&engine->hw_lock);
>   }
>   
> -static struct intel_ring *
> -mock_context_pin(struct intel_engine_cs *engine,
> -		 struct i915_gem_context *ctx)
> +static void mock_context_unpin(struct intel_context *ce)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> +	if (--ce->pin_count)
> +		return;
>   
> -	if (!ce->pin_count++)
> -		i915_gem_context_get(ctx);
> +	i915_gem_context_put(ce->gem_context);
> +}
>   
> -	return engine->buffer;
> +static void mock_context_destroy(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(ce->pin_count);
>   }
>   
> -static void mock_context_unpin(struct intel_engine_cs *engine,
> -			       struct i915_gem_context *ctx)
> +static const struct intel_context_ops mock_context_ops = {
> +	.unpin = mock_context_unpin,
> +	.destroy = mock_context_destroy,
> +};
> +
> +static struct intel_context *
> +mock_context_pin(struct intel_engine_cs *engine,
> +		 struct i915_gem_context *ctx)
>   {
>   	struct intel_context *ce = to_intel_context(ctx, engine);
>   
> -	if (!--ce->pin_count)
> -		i915_gem_context_put(ctx);
> +	if (!ce->pin_count++) {
> +		i915_gem_context_get(ctx);
> +		ce->ring = engine->buffer;
> +		ce->ops = &mock_context_ops;
> +	}
> +
> +	return ce;
>   }
>   
>   static int mock_request_alloc(struct i915_request *request)
> @@ -185,7 +197,6 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.status_page.page_addr = (void *)(engine + 1);
>   
>   	engine->base.context_pin = mock_context_pin;
> -	engine->base.context_unpin = mock_context_unpin;
>   	engine->base.request_alloc = mock_request_alloc;
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_breadcrumb = mock_emit_breadcrumb;
> @@ -238,11 +249,13 @@ void mock_engine_free(struct intel_engine_cs *engine)
>   {
>   	struct mock_engine *mock =
>   		container_of(engine, typeof(*mock), base);
> +	struct intel_context *ce;
>   
>   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>   
> -	if (engine->last_retired_context)
> -		intel_context_unpin(engine->last_retired_context, engine);
> +	ce = fetch_and_zero(&engine->last_retired_context);
> +	if (ce)
> +		intel_context_unpin(ce);
>   
>   	mock_ring_free(engine->buffer);
>   
> 
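
One note for anyone reading this hunk in isolation: the engine->context_unpin
vfunc can be deleted because unpin/destroy now dispatch through the ops table
installed on the intel_context at first pin. The generic entry points are not
part of this hunk; judging by the ce->ops assignments above, they are
presumably small inline helpers in i915_gem_context.h along these lines (a
sketch under that assumption, not the exact code from the patch):

	static inline void __intel_context_pin(struct intel_context *ce)
	{
		/* Take an extra reference on an already-pinned context */
		GEM_BUG_ON(!ce->pin_count);
		ce->pin_count++;
	}

	static inline void intel_context_unpin(struct intel_context *ce)
	{
		/*
		 * Backend-agnostic release: execlists, the legacy
		 * ringbuffer path and the mock engine each supply
		 * their own ops table when they first pin the context.
		 */
		GEM_BUG_ON(!ce->ops);
		ce->ops->unpin(ce);
	}

That indirection is what lets i915_request_retire(), intel_engine_lost_context()
and mock_engine_free() operate on the intel_context pointer alone, instead of
carrying the (ctx, engine) pair around just to look the backend up again.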

For all parts other than GVT:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2018-05-04 10:31 UTC | newest]

Thread overview: 65+ messages
2018-05-03  6:36 [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Chris Wilson
2018-05-03  6:36 ` [PATCH 02/71] drm/i915/execlists: Emit i915_trace_request_out for preemption Chris Wilson
2018-05-03  6:36 ` [PATCH 03/71] drm/i915: Lazily unbind vma on close Chris Wilson
2018-05-03 16:59   ` Tvrtko Ursulin
2018-05-03  6:36 ` [PATCH 04/71] drm/i915: Keep one request in our ring_list Chris Wilson
2018-05-03 17:04   ` Tvrtko Ursulin
2018-05-03  6:36 ` [PATCH 05/71] drm/i915/execlists: Disable submission tasklets when rescheduling Chris Wilson
2018-05-03 17:49   ` Tvrtko Ursulin
2018-05-03 19:50     ` Chris Wilson
2018-05-04  9:15       ` Tvrtko Ursulin
2018-05-04  9:31         ` Chris Wilson
2018-05-03  6:36 ` [PATCH 06/71] drm/i915: Detect if we missed kicking the execlists tasklet Chris Wilson
2018-05-03 13:08   ` Chris Wilson
2018-05-03  6:36 ` [PATCH 07/71] drm/i915: Move request->ctx aside Chris Wilson
2018-05-03  6:36 ` [PATCH 08/71] drm/i915: Move fiddling with engine->last_retired_context Chris Wilson
2018-05-03  6:36 ` [PATCH 09/71] drm/i915: Store a pointer to intel_context in i915_request Chris Wilson
2018-05-04 10:31   ` Tvrtko Ursulin
2018-05-03  6:36 ` [PATCH 10/71] drm/i915/execlists: Refactor out complete_preempt_context() Chris Wilson
2018-05-03  6:36 ` [PATCH 11/71] drm/i915: Move engine reset prepare/finish to backends Chris Wilson
2018-05-03  6:36 ` [PATCH 12/71] drm/i915: Split execlists/guc reset preparations Chris Wilson
2018-05-03  6:36 ` [PATCH 13/71] drm/i915/execlists: Flush pending preemption events during reset Chris Wilson
2018-05-03  6:37 ` [PATCH 14/71] drm/i915: Combine tasklet_kill and tasklet_disable Chris Wilson
2018-05-03  6:37 ` [PATCH 15/71] drm/i915: Stop parking the signaler around reset Chris Wilson
2018-05-03  6:37 ` [PATCH 16/71] drm/i915: Be irqsafe inside reset Chris Wilson
2018-05-03  6:37 ` [PATCH 17/71] drm/i915/execlists: Make submission tasklet hardirq safe Chris Wilson
2018-05-03  6:37 ` [PATCH 18/71] drm/i915/guc: " Chris Wilson
2018-05-03  6:37 ` [PATCH 19/71] drm/i915: Allow init_breadcrumbs to be used from irq context Chris Wilson
2018-05-03  6:37 ` [PATCH 20/71] drm/i915/execlists: Force preemption via reset on timeout Chris Wilson
2018-05-03  6:37 ` [PATCH 21/71] drm/i915/execlists: Try preempt-reset from hardirq timer context Chris Wilson
2018-05-03  6:37 ` [PATCH 22/71] drm/i915/preemption: Select timeout when scheduling Chris Wilson
2018-05-03  6:37 ` [PATCH 23/71] drm/i915: Use a preemption timeout to enforce interactivity Chris Wilson
2018-05-03  6:37 ` [PATCH 24/71] drm/i915: Allow user control over preempt timeout on their important context Chris Wilson
2018-05-03  6:37 ` [PATCH 25/71] drm/i915: Disable preemption and sleeping while using the punit sideband Chris Wilson
2018-05-03  6:37 ` [PATCH 26/71] drm/i915: Lift acquiring the vlv punit magic to a common sb-get Chris Wilson
2018-05-03  6:37 ` [PATCH 27/71] drm/i915: Lift sideband locking for vlv_punit_(read|write) Chris Wilson
2018-05-03  6:37 ` [PATCH 28/71] drm/i915: Reduce RPS update frequency on Valleyview/Cherryview Chris Wilson
2018-05-03  6:37 ` [PATCH 29/71] Revert "drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3" Chris Wilson
2018-05-03  6:37 ` [PATCH 30/71] drm/i915: Replace pcu_lock with sb_lock Chris Wilson
2018-05-03  6:37 ` [PATCH 31/71] drm/i915: Separate sideband declarations to intel_sideband.h Chris Wilson
2018-05-03  6:37 ` [PATCH 32/71] drm/i915: Merge sbi read/write into a single accessor Chris Wilson
2018-05-03  6:37 ` [PATCH 33/71] drm/i915: Merge sandybridge_pcode_(read|write) Chris Wilson
2018-05-03  6:37 ` [PATCH 34/71] drm/i915: Move sandybride pcode access to intel_sideband.c Chris Wilson
2018-05-03  6:37 ` [PATCH 35/71] drm/i915: Mark up Ironlake ips with rpm wakerefs Chris Wilson
2018-05-03  6:37 ` [PATCH 36/71] drm/i915: Record logical context support in driver caps Chris Wilson
2018-05-03  6:37 ` [PATCH 37/71] drm/i915: Generalize i915_gem_sanitize() to reset contexts Chris Wilson
2018-05-03  6:37 ` [PATCH 38/71] drm/i915: Enable render context support for Ironlake (gen5) Chris Wilson
2018-05-03  8:47   ` Chris Wilson
2018-05-03  6:37 ` [PATCH 39/71] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson
2018-05-03  6:37 ` [PATCH 40/71] drm/i915: Split GT powermanagement functions to intel_gt_pm.c Chris Wilson
2018-05-03  6:37 ` [PATCH 41/71] drm/i915: Move rps worker " Chris Wilson
2018-05-03  6:37 ` [PATCH 42/71] drm/i915: Move all the RPS irq handlers to intel_gt_pm Chris Wilson
2018-05-03  6:37 ` [PATCH 43/71] drm/i915: Track HAS_RPS alongside HAS_RC6 in the device info Chris Wilson
2018-05-03  6:37 ` [PATCH 44/71] drm/i915: Remove defunct intel_suspend_gt_powersave() Chris Wilson
2018-05-03  6:37 ` [PATCH 45/71] drm/i915: Reorder GT interface code Chris Wilson
2018-05-03  6:37 ` [PATCH 46/71] drm/i915: Split control of rps and rc6 Chris Wilson
2018-05-03  6:37 ` [PATCH 47/71] drm/i915: Enabling rc6 and rps have different requirements, so separate them Chris Wilson
2018-05-03  6:37 ` [PATCH 48/71] drm/i915: Simplify rc6/rps enabling Chris Wilson
2018-05-03  6:37 ` [PATCH 49/71] drm/i915: Refactor frequency bounds computation Chris Wilson
2018-05-03  6:37 ` [PATCH 50/71] drm/i915: Rename rps min/max frequencies Chris Wilson
2018-05-03  6:37 ` [PATCH 51/71] drm/i915: Pull IPS into GT power management Chris Wilson
2018-05-03 10:13 ` [PATCH 01/71] drm/i915/execlists: Drop preemption arbitrations points along the ring Lionel Landwerlin
2018-05-03 10:18   ` Chris Wilson
2018-05-03 10:28     ` Lionel Landwerlin
2018-05-03 10:38       ` Chris Wilson
2018-05-03 16:37 ` Tvrtko Ursulin
