* Haswell full-ppgtt, no really
@ 2018-06-08 12:55 Chris Wilson
  2018-06-08 12:55 ` [PATCH 01/18] drm/i915: Apply batch location restrictions before pinning Chris Wilson
                   ` (22 more replies)
  0 siblings, 23 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

The GPU hangs in mesa (piglit, at least) have been resolved, and GPU reset
should now be operational. So as far as CI goes, we should have a clean
bill of health. There is still one outstanding issue: Baytrail still
has the habit of writing to somewhere other than the intended mm.
-Chris



* [PATCH 01/18] drm/i915: Apply batch location restrictions before pinning
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore Chris Wilson
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

We special case the position of the batch within the GTT to prevent
negative self-relocation deltas from underflowing. However, that
restriction is being applied after a trial pin of the batch in its
current position. Thus we are not rejecting an invalid location if the
batch has been used before, leading to an assertion failure if we happen
to need to rearrange the entire payload. In the worst case, this may
cause a GPU hang on gen7, or perhaps missing state.
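
As a concrete illustration of the underflow (a standalone userspace
sketch, not driver code):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t batch = 0x1000; /* batch pinned very low in the GTT */
		int64_t delta = -0x2000; /* SNA-style negative self-relocation */

		/* wraps around to a huge bogus address, not a valid location */
		printf("relocated: %#llx\n",
		       (unsigned long long)(batch + delta));
		return 0;
	}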

References: https://bugs.freedesktop.org/show_bug.cgi?id=105720
Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Martin Peres <martin.peres@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 49 ++++++++++++----------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index eefd449502e2..2d2eb3075960 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -489,7 +489,9 @@ eb_validate_vma(struct i915_execbuffer *eb,
 }
 
 static int
-eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
+eb_add_vma(struct i915_execbuffer *eb,
+	   unsigned int i, unsigned batch_idx,
+	   struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
 	int err;
@@ -522,6 +524,24 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
 	eb->flags[i] = entry->flags;
 	vma->exec_flags = &eb->flags[i];
 
+	/*
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
+	 */
+	if (i == batch_idx) {
+		if (!(eb->flags[i] & EXEC_OBJECT_PINNED))
+			eb->flags[i] |= __EXEC_OBJECT_NEEDS_BIAS;
+		if (eb->reloc_cache.has_fence)
+			eb->flags[i] |= EXEC_OBJECT_NEEDS_FENCE;
+
+		eb->batch = vma;
+	}
+
 	err = 0;
 	if (eb_pin_vma(eb, entry, vma)) {
 		if (entry->offset != vma->node.start) {
@@ -716,7 +736,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->ctx->handles_vma;
 	struct drm_i915_gem_object *obj;
-	unsigned int i;
+	unsigned int i, batch;
 	int err;
 
 	if (unlikely(i915_gem_context_is_closed(eb->ctx)))
@@ -728,6 +748,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	INIT_LIST_HEAD(&eb->relocs);
 	INIT_LIST_HEAD(&eb->unbound);
 
+	batch = eb_batch_index(eb);
+
 	for (i = 0; i < eb->buffer_count; i++) {
 		u32 handle = eb->exec[i].handle;
 		struct i915_lut_handle *lut;
@@ -770,33 +792,16 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 		lut->handle = handle;
 
 add_vma:
-		err = eb_add_vma(eb, i, vma);
+		err = eb_add_vma(eb, i, batch, vma);
 		if (unlikely(err))
 			goto err_vma;
 
 		GEM_BUG_ON(vma != eb->vma[i]);
 		GEM_BUG_ON(vma->exec_flags != &eb->flags[i]);
+		GEM_BUG_ON(drm_mm_node_allocated(&vma->node) &&
+			   eb_vma_misplaced(&eb->exec[i], vma, eb->flags[i]));
 	}
 
-	/* take note of the batch buffer before we might reorder the lists */
-	i = eb_batch_index(eb);
-	eb->batch = eb->vma[i];
-	GEM_BUG_ON(eb->batch->exec_flags != &eb->flags[i]);
-
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if (!(eb->flags[i] & EXEC_OBJECT_PINNED))
-		eb->flags[i] |= __EXEC_OBJECT_NEEDS_BIAS;
-	if (eb->reloc_cache.has_fence)
-		eb->flags[i] |= EXEC_OBJECT_NEEDS_FENCE;
-
 	eb->args->flags |= __EXEC_VALIDATED;
 	return eb_reserve(eb);
 
-- 
2.17.1


* [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
  2018-06-08 12:55 ` [PATCH 01/18] drm/i915: Apply batch location restrictions before pinning Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 13:52   ` Mika Kuoppala
  2018-06-08 12:55 ` [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

An issue encountered with switching mm on gen7 is that the GPU likes to
hang (with the VS unit busy) when told to force restore the current
context. We can simply work around this by substituting the
MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
forcing the context to be saved and restored, thereby reloading the
PP_DIR registers and updating the modified page directory!
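
In outline, instead of a single forced restore of the current context
we emit a round trip via the kernel context (a sketch; kernel_ctx and
current_ctx stand for the respective context image offsets, flag names
as in the diff below):

	*cs++ = MI_SET_CONTEXT;
	*cs++ = kernel_ctx | MI_MM_SPACE_GTT | MI_RESTORE_INHIBIT; /* save, load nothing */
	*cs++ = MI_SET_CONTEXT;
	*cs++ = current_ctx | MI_MM_SPACE_GTT; /* full restore, reloads PP_DIR */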

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 30 ++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 65811e2fa7da..332d97bc5c27 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		(HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
 		INTEL_INFO(i915)->num_rings - 1 :
 		0;
+	bool force_restore = false;
 	int len;
 	u32 *cs;
 
@@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	len = 4;
 	if (IS_GEN7(i915))
 		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
+	if (flags & MI_FORCE_RESTORE) {
+		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
+		flags &= ~MI_FORCE_RESTORE;
+		force_restore = true;
+		len += 2;
+	}
 
 	cs = intel_ring_begin(rq, len);
 	if (IS_ERR(cs))
@@ -1496,6 +1503,20 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	}
 
 	*cs++ = MI_NOOP;
+	if (force_restore) {
+		/*
+		 * The HW doesn't handle being told to restore the current
+		 * context very well. Quite often it likes to go off and
+		 * sulk, especially when it is meant to be reloading PP_DIR.
+		 * A very simple fix to force the reload is to simply switch
+		 * away from the current context and back again.
+		 */
+		*cs++ = MI_SET_CONTEXT;
+		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
+							  engine)->state) |
+			MI_MM_SPACE_GTT |
+			MI_RESTORE_INHIBIT;
+	}
 	*cs++ = MI_SET_CONTEXT;
 	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
 	/*
@@ -1585,11 +1606,14 @@ static int switch_context(struct i915_request *rq)
 
 		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
 		engine->legacy_active_ppgtt = to_mm;
-		hw_flags = MI_FORCE_RESTORE;
+
+		if (to_ctx == from_ctx) {
+			hw_flags = MI_FORCE_RESTORE;
+			from_ctx = NULL;
+		}
 	}
 
-	if (rq->hw_context->state &&
-	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
+	if (rq->hw_context->state && to_ctx != from_ctx) {
 		GEM_BUG_ON(engine->id != RCS);
 
 		/*
-- 
2.17.1


* [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
  2018-06-08 12:55 ` [PATCH 01/18] drm/i915: Apply batch location restrictions before pinning Chris Wilson
  2018-06-08 12:55 ` [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 14:26   ` Chris Wilson
  2018-06-08 17:26   ` [PATCH v2] " Chris Wilson
  2018-06-08 12:55 ` [PATCH 04/18] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
                   ` (19 subsequent siblings)
  22 siblings, 2 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

The discovery with trying to enable full-ppgtt was that we were
completely failing to load both the mm and context following the
reset. Although we were performing mmio writes to set the PP_DIR
(per-process GTT) and CCID (context), these were taking no effect (the
assumption was that this would trigger reload of the context and
restore the page tables). It was not until we performed the LRI +
MI_SET_CONTEXT in a subsequent context switch that anything took effect.

Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.

Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previously executing context was on the GPU, and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.
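
For reference, rq->infix (added below) splits each request into a
header containing the context switch and the user payload:

	rq->head .. rq->infix   : MI_SET_CONTEXT + PP_DIR load (kept on reset)
	rq->infix .. rq->postfix: user payload (NOPed out if the request hung)

A standalone model of the new skip_request() wrap-around handling
(dword-indexed for brevity; the driver works in byte offsets):

	#include <stdint.h>
	#include <string.h>

	#define MI_NOOP 0 /* all-zeroes, so a plain memset suffices here */

	static void skip(uint32_t *ring, uint32_t size,
			 uint32_t infix, uint32_t postfix)
	{
		uint32_t head = infix;

		if (postfix < head) { /* payload wraps past the ring end */
			memset(&ring[head], 0, (size - head) * sizeof(*ring));
			head = 0;
		}
		memset(&ring[head], 0, (postfix - head) * sizeof(*ring));
	}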

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c     |  2 +
 drivers/gpu/drm/i915/i915_request.h     |  3 +
 drivers/gpu/drm/i915/intel_engine_cs.c  |  3 -
 drivers/gpu/drm/i915/intel_ringbuffer.c | 75 ++++++++-----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  9 ---
 5 files changed, 28 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..9092f5464c24 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -817,6 +817,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	/* Keep a second pin for the dual retirement along engine and ring */
 	__intel_context_pin(ce);
 
+	rq->infix = rq->ring->emit; /* end of header; start of user payload */
+
 	/* Check that we didn't interrupt ourselves with a new request */
 	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
 	return rq;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 491ff81d0fea..0e9aba53d0e4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -134,6 +134,9 @@ struct i915_request {
 	/** Position in the ring of the start of the request */
 	u32 head;
 
+	/** Position in the ring of the start of the user packets */
+	u32 infix;
+
 	/**
 	 * Position in the ring of the start of the postfix.
 	 * This is required to calculate the maximum available ring space
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2ec2e60dc670..d1cf8b4926ab 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1168,9 +1168,6 @@ void intel_engine_lost_context(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
-	engine->legacy_active_context = NULL;
-	engine->legacy_active_ppgtt = NULL;
-
 	ce = fetch_and_zero(&engine->last_retired_context);
 	if (ce)
 		intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 332d97bc5c27..1b3805adbd57 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -541,6 +541,21 @@ static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 	return i915_gem_find_active_request(engine);
 }
 
+static void skip_request(struct i915_request *request)
+{
+	void *vaddr = request->ring->vaddr;
+	u32 head;
+
+	head = request->infix;
+	if (request->postfix < head) {
+		memset32(vaddr + head, MI_NOOP,
+			 (request->ring->size - head) / sizeof(u32));
+		head = 0;
+	}
+	memset32(vaddr + head, MI_NOOP,
+		 (request->postfix - head) / sizeof(u32));
+}
+
 static void reset_ring(struct intel_engine_cs *engine,
 		       struct i915_request *request)
 {
@@ -570,42 +585,10 @@ static void reset_ring(struct intel_engine_cs *engine,
 	 * the restored context.
 	 */
 	if (request) {
-		struct drm_i915_private *dev_priv = request->i915;
-		struct intel_context *ce = request->hw_context;
-		struct i915_hw_ppgtt *ppgtt;
-
-		if (ce->state) {
-			I915_WRITE(CCID,
-				   i915_ggtt_offset(ce->state) |
-				   BIT(8) /* must be set! */ |
-				   CCID_EXTENDED_STATE_SAVE |
-				   CCID_EXTENDED_STATE_RESTORE |
-				   CCID_EN);
-		}
-
-		ppgtt = request->gem_context->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
-		if (ppgtt) {
-			u32 pd_offset = ppgtt->pd.base.ggtt_offset << 10;
-
-			I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
-			I915_WRITE(RING_PP_DIR_BASE(engine), pd_offset);
-
-			/* Wait for the PD reload to complete */
-			if (intel_wait_for_register(dev_priv,
-						    RING_PP_DIR_BASE(engine),
-						    BIT(0), 0,
-						    10))
-				DRM_ERROR("Wait for reload of ppgtt page-directory timed out\n");
-
-			ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
-		}
-
 		/* If the rq hung, jump to its breadcrumb and skip the batch */
+		request->ring->head = request->head;
 		if (request->fence.error == -EIO)
-			request->ring->head = request->postfix;
-	} else {
-		engine->legacy_active_context = NULL;
-		engine->legacy_active_ppgtt = NULL;
+			skip_request(request);
 	}
 }
 
@@ -1589,31 +1572,25 @@ static int switch_context(struct i915_request *rq)
 	struct i915_gem_context *to_ctx = rq->gem_context;
 	struct i915_hw_ppgtt *to_mm =
 		to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
-	struct i915_gem_context *from_ctx = engine->legacy_active_context;
-	struct i915_hw_ppgtt *from_mm = engine->legacy_active_ppgtt;
 	u32 hw_flags = 0;
 	int ret, i;
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
 
-	if (to_mm != from_mm ||
-	    (to_mm && intel_engine_flag(engine) & to_mm->pd_dirty_rings)) {
+	if (to_mm) {
 		trace_switch_mm(engine, to_ctx);
 		ret = to_mm->switch_mm(to_mm, rq);
 		if (ret)
 			goto err;
 
-		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
-		engine->legacy_active_ppgtt = to_mm;
-
-		if (to_ctx == from_ctx) {
+		if (intel_engine_flag(engine) & to_mm->pd_dirty_rings) {
 			hw_flags = MI_FORCE_RESTORE;
-			from_ctx = NULL;
+			to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
 		}
 	}
 
-	if (rq->hw_context->state && to_ctx != from_ctx) {
+	if (rq->hw_context->state) {
 		GEM_BUG_ON(engine->id != RCS);
 
 		/*
@@ -1628,9 +1605,7 @@ static int switch_context(struct i915_request *rq)
 
 		ret = mi_set_context(rq, hw_flags);
 		if (ret)
-			goto err_mm;
-
-		engine->legacy_active_context = to_ctx;
+			goto err;
 	}
 
 	if (to_ctx->remap_slice) {
@@ -1640,7 +1615,7 @@ static int switch_context(struct i915_request *rq)
 
 			ret = remap_l3(rq, i);
 			if (ret)
-				goto err_ctx;
+				goto err;
 		}
 
 		to_ctx->remap_slice = 0;
@@ -1648,10 +1623,6 @@ static int switch_context(struct i915_request *rq)
 
 	return 0;
 
-err_ctx:
-	engine->legacy_active_context = from_ctx;
-err_mm:
-	engine->legacy_active_ppgtt = from_mm;
 err:
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..b44c67849749 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -557,15 +557,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_context *last_retired_context;
 
-	/* We track the current MI_SET_CONTEXT in order to eliminate
-	 * redudant context switches. This presumes that requests are not
-	 * reordered! Or when they are the tracking is updated along with
-	 * the emission of individual requests into the legacy command
-	 * stream (ring).
-	 */
-	struct i915_gem_context *legacy_active_context;
-	struct i915_hw_ppgtt *legacy_active_ppgtt;
-
 	/* status_notifier: list of callbacks for context-switch changes */
 	struct atomic_notifier_head context_status_notifier;
 
-- 
2.17.1


* [PATCH 04/18] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (2 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 05/18] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

When we update the gen6 ppgtt page directories, we do so by writing the
new address into a reserved slot in the GGTT. It appears that when the
GPU reads that entry from the gsm, it uses its small cache and that we
need to invalidate that cache after writing. We don't see an issue
currently, as we prefill the ppgtt page directories on creation and only
create the single aliasing_ppgtt long before we start using the GGTT
(and so before the cache may have a conflicting entry).
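
For reference, the gen6_ggtt_invalidate() helper this patch switches to
is, approximately (its exact body lives in i915_gem_gtt.c):

	static void gen6_ggtt_invalidate(struct drm_i915_private *dev_priv)
	{
		/* uncached mmio write: flushes the WCB and invalidates */
		I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
	}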

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6ac6520b6e9c..091251c0e7fc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1693,8 +1693,8 @@ static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
 				  const struct i915_page_table *pt)
 {
 	/* Caller needs to make sure the write completes if necessary */
-	writel_relaxed(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
-		       ppgtt->pd_addr + pde);
+	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
+		  ppgtt->pd_addr + pde);
 }
 
 /* Write all the page tables found in the ppgtt structure to incrementing page
@@ -1709,7 +1709,7 @@ static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
 		gen6_write_pde(ppgtt, pde, pt);
 
 	mark_tlbs_dirty(ppgtt);
-	wmb();
+	gen6_ggtt_invalidate(ppgtt->vm.i915);
 }
 
 static inline u32 get_pd_offset(struct i915_hw_ppgtt *ppgtt)
@@ -1925,7 +1925,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 	if (flush) {
 		mark_tlbs_dirty(ppgtt);
-		wmb();
+		gen6_ggtt_invalidate(ppgtt->vm.i915);
 	}
 
 	return 0;
-- 
2.17.1


* [PATCH 05/18] drm/i915/gtt: Subclass gen6_hw_ppgtt
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (3 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 04/18] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 06/18] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

The legacy gen6 ppgtt needs a little more hand-holding than gen8+, and
so requires a larger structure. As I intend to make this slightly more
complicated in the future, separate the gen6 from the core gen8 hw
struct by subclassing. This patch moves the gen6-only features out to
gen6_hw_ppgtt and pipes the new type everywhere that needs it.
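
The conversion follows the usual container_of() subclassing idiom; a
standalone sketch with stand-in types (not the driver's own):

	#include <stddef.h>

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	struct base { int x; };

	struct derived {
		struct base base;	/* first member, so the downcast is free */
		int extra;
	};

	static struct derived *to_derived(struct base *b)
	{
		return container_of(b, struct derived, base);
	}

The BUILD_BUG_ON(offsetof(struct gen6_hw_ppgtt, base)) in the diff
guards exactly this layout assumption.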

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 106 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  21 +++-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |   9 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   4 -
 4 files changed, 75 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 091251c0e7fc..49e02dee07e0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1636,20 +1636,20 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 	return ERR_PTR(err);
 }
 
-static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
-	struct i915_address_space *vm = &ppgtt->vm;
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+	struct i915_address_space *vm = &base->vm;
 	struct i915_page_table *unused;
 	gen6_pte_t scratch_pte;
 	u32 pd_entry, pte, pde;
-	u32 start = 0, length = ppgtt->vm.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
 
-	gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde) {
+	gen6_for_all_pdes(unused, &base->pd, pde) {
 		u32 expected;
 		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(ppgtt->pd.page_table[pde]);
+		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
 		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -1660,7 +1660,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic_px(ppgtt->pd.page_table[pde]);
+		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
 
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
@@ -1688,7 +1688,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 }
 
 /* Write pde (index) from the page directory @pd to the page table @pt */
-static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
+static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 				  const unsigned int pde,
 				  const struct i915_page_table *pt)
 {
@@ -1699,26 +1699,27 @@ static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
 
 /* Write all the page tables found in the ppgtt structure to incrementing page
  * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
+static void gen6_write_page_range(struct i915_hw_ppgtt *base,
 				  u32 start, u32 length)
 {
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_page_table *pt;
 	unsigned int pde;
 
-	gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde)
+	gen6_for_each_pde(pt, &base->pd, start, length, pde)
 		gen6_write_pde(ppgtt, pde, pt);
 
-	mark_tlbs_dirty(ppgtt);
-	gen6_ggtt_invalidate(ppgtt->vm.i915);
+	mark_tlbs_dirty(base);
+	gen6_ggtt_invalidate(base->vm.i915);
 }
 
-static inline u32 get_pd_offset(struct i915_hw_ppgtt *ppgtt)
+static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 {
-	GEM_BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
-	return ppgtt->pd.base.ggtt_offset << 10;
+	GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
+	return ppgtt->base.pd.base.ggtt_offset << 10;
 }
 
-static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
+static int hsw_mm_switch(struct gen6_hw_ppgtt *ppgtt,
 			 struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
@@ -1740,7 +1741,7 @@ static int hsw_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
-static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
+static int gen7_mm_switch(struct gen6_hw_ppgtt *ppgtt,
 			  struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
@@ -1762,7 +1763,7 @@ static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
 	return 0;
 }
 
-static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
+static int gen6_mm_switch(struct gen6_hw_ppgtt *ppgtt,
 			  struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
@@ -1904,28 +1905,28 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       u64 start, u64 length)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u64 from = start;
 	unsigned int pde;
 	bool flush = false;
 
-	gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde) {
+	gen6_for_each_pde(pt, &ppgtt->base.pd, start, length, pde) {
 		if (pt == vm->scratch_pt) {
 			pt = alloc_pt(vm);
 			if (IS_ERR(pt))
 				goto unwind_out;
 
 			gen6_initialize_pt(vm, pt);
-			ppgtt->pd.page_table[pde] = pt;
+			ppgtt->base.pd.page_table[pde] = pt;
 			gen6_write_pde(ppgtt, pde, pt);
 			flush = true;
 		}
 	}
 
 	if (flush) {
-		mark_tlbs_dirty(ppgtt);
-		gen6_ggtt_invalidate(ppgtt->vm.i915);
+		mark_tlbs_dirty(&ppgtt->base);
+		gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 	}
 
 	return 0;
@@ -1962,24 +1963,23 @@ static void gen6_free_scratch(struct i915_address_space *vm)
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
-	struct i915_page_directory *pd = &ppgtt->pd;
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u32 pde;
 
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_for_all_pdes(pt, pd, pde)
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
 		if (pt != vm->scratch_pt)
 			free_pt(vm, pt);
 
 	gen6_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct i915_address_space *vm = &ppgtt->vm;
-	struct drm_i915_private *dev_priv = ppgtt->vm.i915;
+	struct i915_address_space *vm = &ppgtt->base.vm;
+	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	int ret;
 
@@ -2004,11 +2004,11 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	if (ppgtt->node.start < ggtt->mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->pd.base.ggtt_offset =
+	ppgtt->base.pd.base.ggtt_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
 
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
+		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	return 0;
 
@@ -2017,35 +2017,35 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct gen6_hw_ppgtt *ppgtt)
 {
 	return gen6_ppgtt_allocate_page_directories(ppgtt);
 }
 
-static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
 				  u64 start, u64 length)
 {
 	struct i915_page_table *unused;
 	u32 pde;
 
-	gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde)
-		ppgtt->pd.page_table[pde] = ppgtt->vm.scratch_pt;
+	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
+		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_ggtt * const ggtt = &i915->ggtt;
-	struct i915_hw_ppgtt *ppgtt;
+	struct gen6_hw_ppgtt *ppgtt;
 	int err;
 
 	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ppgtt->vm.i915 = i915;
-	ppgtt->vm.dma = &i915->drm.pdev->dev;
+	ppgtt->base.vm.i915 = i915;
+	ppgtt->base.vm.dma = &i915->drm.pdev->dev;
 
-	ppgtt->vm.pte_encode = ggtt->vm.pte_encode;
+	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 	if (intel_vgpu_active(i915) || IS_GEN6(i915))
 		ppgtt->switch_mm = gen6_mm_switch;
 	else if (IS_HASWELL(i915))
@@ -2059,36 +2059,36 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	if (err)
 		goto err_free;
 
-	ppgtt->vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
+	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->vm.total);
-	gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
+	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
+	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
 
-	err = gen6_alloc_va_range(&ppgtt->vm, 0, ppgtt->vm.total);
+	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
 	if (err)
 		goto err_cleanup;
 
-	ppgtt->vm.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->vm.insert_entries = gen6_ppgtt_insert_entries;
-	ppgtt->vm.cleanup = gen6_ppgtt_cleanup;
-	ppgtt->debug_dump = gen6_dump_ppgtt;
+	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
+	ppgtt->base.debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
-	ppgtt->vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
-	ppgtt->vm.vma_ops.set_pages   = ppgtt_set_pages;
-	ppgtt->vm.vma_ops.clear_pages = clear_pages;
+	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
+	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
+	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
+	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
-			 ppgtt->pd.base.ggtt_offset << 10);
+			 ppgtt->base.pd.base.ggtt_offset << 10);
 
-	return ppgtt;
+	return &ppgtt->base;
 
 err_cleanup:
-	gen6_ppgtt_cleanup(&ppgtt->vm);
+	gen6_ppgtt_cleanup(&ppgtt->base.vm);
 err_free:
 	kfree(ppgtt);
 	return ERR_PTR(err);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 16307ba7e303..199d6f47a557 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -396,7 +396,7 @@ struct i915_ggtt {
 struct i915_hw_ppgtt {
 	struct i915_address_space vm;
 	struct kref ref;
-	struct drm_mm_node node;
+
 	unsigned long pd_dirty_rings;
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
@@ -404,13 +404,26 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
 
+	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+};
+
+struct gen6_hw_ppgtt {
+	struct i915_hw_ppgtt base;
+
+	struct drm_mm_node node;
 	gen6_pte_t __iomem *pd_addr;
 
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct i915_request *rq);
-	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
+	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
 };
 
+#define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)
+
+static inline struct gen6_hw_ppgtt *to_gen6_ppgtt(struct i915_hw_ppgtt *base)
+{
+	BUILD_BUG_ON(offsetof(struct gen6_hw_ppgtt, base));
+	return __to_gen6_ppgtt(base);
+}
+
 /*
  * gen6_for_each_pde() iterates over every pde from start until start+length.
  * If start and start+length are not perfectly divisible, the macro will round
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1b3805adbd57..ce07ef9471d2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1570,8 +1570,8 @@ static int switch_context(struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
 	struct i915_gem_context *to_ctx = rq->gem_context;
-	struct i915_hw_ppgtt *to_mm =
-		to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
+	struct gen6_hw_ppgtt *to_mm =
+		to_gen6_ppgtt(to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt);
 	u32 hw_flags = 0;
 	int ret, i;
 
@@ -1584,9 +1584,10 @@ static int switch_context(struct i915_request *rq)
 		if (ret)
 			goto err;
 
-		if (intel_engine_flag(engine) & to_mm->pd_dirty_rings) {
+		if (intel_engine_flag(engine) & to_mm->base.pd_dirty_rings) {
 			hw_flags = MI_FORCE_RESTORE;
-			to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
+			to_mm->base.pd_dirty_rings &=
+				~intel_engine_flag(engine);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index f80cf7ce3fa9..538e658252f7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -142,10 +142,6 @@ static int igt_ppgtt_alloc(void *arg)
 	if (!USES_PPGTT(dev_priv))
 		return 0;
 
-	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
-	if (!ppgtt)
-		return -ENOMEM;
-
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	ppgtt = __hw_ppgtt_create(dev_priv);
 	if (IS_ERR(ppgtt)) {
-- 
2.17.1


* [PATCH 06/18] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (4 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 05/18] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 07/18] drm/i915/gtt: Reorder aliasing_ppgtt fini Chris Wilson
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

Pull the empty stubs together into the top level gen6_ppgtt_create, and
tear each one down on error in proper onion order (rather than use
Joonas' pet hate of calling the cleanup function on an indeterminate
state).
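
A minimal sketch of the onion pattern, with stub names rather than the
driver's own helpers:

	int init_scratch(void); int alloc_pd(void);
	void free_scratch(void); void free_object(void);

	int create(void)
	{
		int err;

		err = init_scratch();
		if (err)
			goto err_free;

		err = alloc_pd();
		if (err)
			goto err_scratch;

		return 0;

	err_scratch:
		free_scratch();	/* undo only what already succeeded */
	err_free:
		free_object();
		return err;
	}

Each label unwinds exactly one layer, in reverse order of construction,
so cleanup never runs against half-initialised state.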

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 81 ++++++++++++++---------------
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 49e02dee07e0..f9f0bffa727e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1936,7 +1936,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 {
 	int ret;
 
@@ -1955,33 +1955,37 @@ static int gen6_init_scratch(struct i915_address_space *vm)
 	return 0;
 }
 
-static void gen6_free_scratch(struct i915_address_space *vm)
+static void gen6_ppgtt_free_scratch(struct i915_address_space *vm)
 {
 	free_pt(vm, vm->scratch_pt);
 	cleanup_scratch_page(vm);
 }
 
-static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen6_ppgtt_free_pd(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u32 pde;
 
-	drm_mm_remove_node(&ppgtt->node);
-
 	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
-		if (pt != vm->scratch_pt)
-			free_pt(vm, pt);
+		if (pt != ppgtt->base.vm.scratch_pt)
+			free_pt(&ppgtt->base.vm, pt);
+}
 
-	gen6_free_scratch(vm);
+static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
+
+	drm_mm_remove_node(&ppgtt->node);
+
+	gen6_ppgtt_free_pd(ppgtt);
+	gen6_ppgtt_free_scratch(vm);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct i915_address_space *vm = &ppgtt->base.vm;
 	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int ret;
+	int err;
 
 	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -1989,17 +1993,13 @@ static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 	 */
 	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
 
-	ret = gen6_init_scratch(vm);
-	if (ret)
-		return ret;
-
-	ret = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
+	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
 				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
 				  I915_COLOR_UNEVICTABLE,
 				  0, ggtt->vm.total,
 				  PIN_HIGH);
-	if (ret)
-		goto err_out;
+	if (err)
+		return err;
 
 	if (ppgtt->node.start < ggtt->mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
@@ -2011,15 +2011,6 @@ static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	return 0;
-
-err_out:
-	gen6_free_scratch(vm);
-	return ret;
-}
-
-static int gen6_ppgtt_alloc(struct gen6_hw_ppgtt *ppgtt)
-{
-	return gen6_ppgtt_allocate_page_directories(ppgtt);
 }
 
 static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
@@ -2045,6 +2036,18 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->base.vm.i915 = i915;
 	ppgtt->base.vm.dma = &i915->drm.pdev->dev;
 
+	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
+
+	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
+	ppgtt->base.debug_dump = gen6_dump_ppgtt;
+
+	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
+	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
+	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
+	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
+
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 	if (intel_vgpu_active(i915) || IS_GEN6(i915))
 		ppgtt->switch_mm = gen6_mm_switch;
@@ -2055,28 +2058,20 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	else
 		BUG();
 
-	err = gen6_ppgtt_alloc(ppgtt);
+	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
 	if (err)
 		goto err_free;
 
-	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
+	err = gen6_ppgtt_allocate_page_directories(ppgtt);
+	if (err)
+		goto err_scratch;
 
 	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
 	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
 
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
 	if (err)
-		goto err_cleanup;
-
-	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
-	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
-	ppgtt->base.debug_dump = gen6_dump_ppgtt;
-
-	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
-	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
-	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
-	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
+		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
 			 ppgtt->node.size >> 20,
@@ -2087,8 +2082,10 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	return &ppgtt->base;
 
-err_cleanup:
-	gen6_ppgtt_cleanup(&ppgtt->base.vm);
+err_pd:
+	gen6_ppgtt_free_pd(ppgtt);
+err_scratch:
+	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
 	kfree(ppgtt);
 	return ERR_PTR(err);
-- 
2.17.1


* [PATCH 07/18] drm/i915/gtt: Reorder aliasing_ppgtt fini
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (5 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 06/18] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 08/18] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

To allow ourselves to use a first-class vma for the aliasing_ppgtt page
directory, we have to reorder the shutdown on module unload to remove
and unpin the aliasing_ppgtt before complaining about any objects left
in the GGTT.
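
Concretely, teardown now runs: fini the aliasing_ppgtt first (which
unbinds its page directory from the GGTT), then check for vma still
left bound in the GGTT, and only release stolen memory at the very end,
mirroring the reverse of initialisation order.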

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f9f0bffa727e..bd338bccf706 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2891,15 +2891,11 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 	ggtt->vm.closed = true;
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
+	i915_gem_fini_aliasing_ppgtt(dev_priv);
+
 	GEM_BUG_ON(!list_empty(&ggtt->vm.active_list));
 	list_for_each_entry_safe(vma, vn, &ggtt->vm.inactive_list, vm_link)
 		WARN_ON(i915_vma_unbind(vma));
-	mutex_unlock(&dev_priv->drm.struct_mutex);
-
-	i915_gem_cleanup_stolen(&dev_priv->drm);
-
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	i915_gem_fini_aliasing_ppgtt(dev_priv);
 
 	if (drm_mm_node_allocated(&ggtt->error_capture))
 		drm_mm_remove_node(&ggtt->error_capture);
@@ -2921,6 +2917,8 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv)
 
 	arch_phys_wc_del(ggtt->mtrr);
 	io_mapping_fini(&ggtt->iomap);
+
+	i915_gem_cleanup_stolen(&dev_priv->drm);
 }
 
 static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
-- 
2.17.1


* [PATCH 08/18] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (6 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 07/18] drm/i915/gtt: Reorder aliasing_ppgtt fini Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 13:10   ` [PATCH] " Chris Wilson
  2018-06-08 12:55 ` [PATCH 09/18] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

Currently all page directories are bound at creation using an
unevictable node in the GGTT. This severely limits us as we cannot
remove any inactive ppgtt for new contexts, or under aperture pressure.
To fix this we need to make the page directory into a first-class and
unbindable vma. Hence the creation of a custom vma to wrap the page
directory, as opposed to a GEM object.

In this patch, we leave the page directories pinned upon creation.
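
Condensed, the page directory becomes a GGTT vma with no backing GEM
object, just private state and its own bind hook (field assignments as
in pd_vma_create() below):

	vma->vm      = &ggtt->vm;	/* lives in the global GTT */
	vma->ops     = &pd_vma_ops;	/* pd_vma_bind() writes the PDEs */
	vma->private = ppgtt;		/* backpointer; note no vma->obj */

so it is binding the vma that publishes the PDEs into the GGTT, rather
than a fixed unevictable node reserved at creation.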

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 256 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_vma.h     |   7 +
 3 files changed, 154 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd338bccf706..96f0638ab0f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1640,50 +1640,55 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_address_space *vm = &base->vm;
-	struct i915_page_table *unused;
-	gen6_pte_t scratch_pte;
-	u32 pd_entry, pte, pde;
-
-	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
-
-	gen6_for_all_pdes(unused, &base->pd, pde) {
-		u32 expected;
-		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
-		pd_entry = readl(ppgtt->pd_addr + pde);
-		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
-
-		if (pd_entry != expected)
-			seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
-				   pde,
-				   pd_entry,
-				   expected);
-		seq_printf(m, "\tPDE: %x\n", pd_entry);
-
-		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
-
-		for (pte = 0; pte < GEN6_PTES; pte+=4) {
-			unsigned long va =
-				(pde * PAGE_SIZE * GEN6_PTES) +
-				(pte * PAGE_SIZE);
+	const gen6_pte_t scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	struct i915_page_table *pt;
+	u32 pte, pde;
+
+	gen6_for_all_pdes(pt, &base->pd, pde) {
+		gen6_pte_t *vaddr;
+
+		if (pt == base->vm.scratch_pt)
+			continue;
+
+		if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
+			u32 expected =
+				GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
+				GEN6_PDE_VALID;
+			u32 pd_entry = readl(ppgtt->pd_addr + pde);
+
+			if (pd_entry != expected)
+				seq_printf(m,
+					   "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
+					   pde,
+					   pd_entry,
+					   expected);
+
+			seq_printf(m, "\tPDE: %x\n", pd_entry);
+		}
+
+		vaddr = kmap_atomic_px(base->pd.page_table[pde]);
+		for (pte = 0; pte < GEN6_PTES; pte += 4) {
 			int i;
-			bool found = false;
+
 			for (i = 0; i < 4; i++)
-				if (pt_vaddr[pte + i] != scratch_pte)
-					found = true;
-			if (!found)
+				if (vaddr[pte + i] != scratch_pte)
+					break;
+			if (i == 4)
 				continue;
 
-			seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
+			seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
+				   pde, pte,
+				   (pde * GEN6_PTES + pte) * PAGE_SIZE);
 			for (i = 0; i < 4; i++) {
-				if (pt_vaddr[pte + i] != scratch_pte)
-					seq_printf(m, " %08x", pt_vaddr[pte + i]);
+				if (vaddr[pte + i] != scratch_pte)
+					seq_printf(m, " %08x", vaddr[pte + i]);
 				else
-					seq_puts(m, "  SCRATCH ");
+					seq_puts(m, "  SCRATCH");
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_atomic(vaddr);
 	}
 }
 
@@ -1697,22 +1702,6 @@ static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 		  ppgtt->pd_addr + pde);
 }
 
-/* Write all the page tables found in the ppgtt structure to incrementing page
- * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *base,
-				  u32 start, u32 length)
-{
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_page_table *pt;
-	unsigned int pde;
-
-	gen6_for_each_pde(pt, &base->pd, start, length, pde)
-		gen6_write_pde(ppgtt, pde, pt);
-
-	mark_tlbs_dirty(base);
-	gen6_ggtt_invalidate(base->vm.i915);
-}
-
 static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 {
 	GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
@@ -1919,8 +1908,12 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 			gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
-			gen6_write_pde(ppgtt, pde, pt);
-			flush = true;
+
+			if (i915_vma_is_bound(ppgtt->vma,
+					      I915_VMA_GLOBAL_BIND)) {
+				gen6_write_pde(ppgtt, pde, pt);
+				flush = true;
+			}
 		}
 	}
 
@@ -1936,8 +1929,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 {
+	struct i915_address_space * const vm = &ppgtt->base.vm;
+	struct i915_page_table *unused;
+	u32 pde;
 	int ret;
 
 	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
@@ -1951,6 +1947,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 	}
 
 	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
+		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
 	return 0;
 }
@@ -1975,52 +1973,103 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	drm_mm_remove_node(&ppgtt->node);
+	i915_vma_unpin(ppgtt->vma);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
+static int pd_vma_set_pages(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int err;
+	vma->pages = ERR_PTR(-ENODEV);
+	return 0;
+}
 
-	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
-	 * allocator works in address space sizes, so it's multiplied by page
-	 * size. We allocate at the top of the GTT to avoid fragmentation.
-	 */
-	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
+static void pd_vma_clear_pages(struct i915_vma *vma)
+{
+	GEM_BUG_ON(!vma->pages);
 
-	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
-				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
-				  I915_COLOR_UNEVICTABLE,
-				  0, ggtt->vm.total,
-				  PIN_HIGH);
-	if (err)
-		return err;
+	vma->pages = NULL;
+}
+
+static int pd_vma_bind(struct i915_vma *vma,
+		       enum i915_cache_level cache_level,
+		       u32 unused)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
+	struct i915_page_table *pt;
+	unsigned int pde;
 
-	if (ppgtt->node.start < ggtt->mappable_end)
-		DRM_DEBUG("Forced to use aperture for PDEs\n");
+	ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
-	ppgtt->base.pd.base.ggtt_offset =
-		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
+		gen6_write_pde(ppgtt, pde, pt);
 
-	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
+	mark_tlbs_dirty(&ppgtt->base);
+	gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 
 	return 0;
 }
 
-static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
-				  u64 start, u64 length)
+static void pd_vma_unbind(struct i915_vma *vma)
 {
-	struct i915_page_table *unused;
-	u32 pde;
+}
 
-	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
-		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
+static const struct i915_vma_ops pd_vma_ops = {
+	.set_pages = pd_vma_set_pages,
+	.clear_pages = pd_vma_clear_pages,
+	.bind_vma = pd_vma_bind,
+	.unbind_vma = pd_vma_unbind,
+};
+
+static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
+{
+	struct drm_i915_private *i915 = ppgtt->base.vm.i915;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	struct i915_vma *vma;
+	int i;
+
+	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(size > ggtt->vm.total);
+
+	vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], NULL);
+	init_request_active(&vma->last_fence, NULL);
+
+	vma->vm = &ggtt->vm;
+	vma->ops = &pd_vma_ops;
+	vma->private = ppgtt;
+
+	vma->size = size;
+	vma->flags = I915_VMA_GGTT;
+	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	list_add(&vma->vm_link, &vma->vm->unbound_list);
+
+	return vma;
+}
+
+static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	/*
+	 * PPGTT PDEs reside in the GGTT and consist of 512 entries. The
+	 * allocator works in address space sizes, so it's multiplied by page
+	 * size. We allocate at the top of the GTT to avoid fragmentation.
+	 */
+	return i915_vma_pin(ppgtt->vma,
+			    0, GEN6_PD_ALIGN,
+			    PIN_GLOBAL | PIN_HIGH);
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
@@ -2058,24 +2107,27 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	else
 		BUG();
 
-	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
 		goto err_free;
 
-	err = gen6_ppgtt_allocate_page_directories(ppgtt);
-	if (err)
+	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
+	if (IS_ERR(ppgtt->vma)) {
+		err = PTR_ERR(ppgtt->vma);
 		goto err_scratch;
-
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
+	}
 
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_vma;
+
+	err = gen6_ppgtt_pin(&ppgtt->base);
 	if (err)
 		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
+			 ppgtt->vma->node.size >> 20,
+			 ppgtt->vma->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
 			 ppgtt->base.pd.base.ggtt_offset << 10);
@@ -2084,6 +2136,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_pd:
 	gen6_ppgtt_free_pd(ppgtt);
+err_vma:
+	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -3602,6 +3656,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	}
 
 	ggtt->vm.closed = false;
+	i915_ggtt_invalidate(dev_priv);
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
@@ -3610,25 +3665,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
-
-	if (USES_PPGTT(dev_priv)) {
-		struct i915_address_space *vm;
-
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-			struct i915_hw_ppgtt *ppgtt;
-
-			if (i915_is_ggtt(vm))
-				ppgtt = dev_priv->mm.aliasing_ppgtt;
-			else
-				ppgtt = i915_vm_to_ppgtt(vm);
-			if (!ppgtt)
-				continue;
-
-			gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
-		}
-	}
-
-	i915_ggtt_invalidate(dev_priv);
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 199d6f47a557..c2f270c90bea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -410,7 +410,7 @@ struct i915_hw_ppgtt {
 struct gen6_hw_ppgtt {
 	struct i915_hw_ppgtt base;
 
-	struct drm_mm_node node;
+	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4321476a6a32..66a228931517 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -54,6 +54,7 @@ struct i915_vma {
 	struct reservation_object *resv; /** Alias of obj->resv */
 	struct sg_table *pages;
 	void __iomem *iomap;
+	void *private; /* owned by creator */
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -340,6 +341,12 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
+static inline bool i915_vma_is_bound(const struct i915_vma *vma,
+				     unsigned int where)
+{
+	return vma->flags & where;
+}
+
 /**
  * i915_vma_pin_iomap - calls ioremap_wc to map the GGTT VMA via the aperture
  * @vma: VMA to iomap
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/18] drm/i915/gtt: Only keep gen6 page directories pinned while active
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (7 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 08/18] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

In order to be able to evict the gen6 ppgtt, we have to unpin it at some
point. We can simply use our context activity tracking to know when the
ppgtt is no longer in use by hardware, and so only keep it pinned while
being used by a request.

For the kernel_context (and thus aliasing_ppgtt), it remains pinned at
all times, as the kernel_context itself is pinned at all times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
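A condensed sketch of the call pattern this introduces on the context pin
path (names taken from the diff below; the _sketch suffix and the
flattened control flow are illustrative only):

static int __ring_context_pin_sketch(struct intel_context *ce)
{
	int err;

	err = __context_pin(ce);	/* pin the context state itself */
	if (err)
		return err;

	/* The first pin of the ppgtt binds its page directory into the
	 * GGTT; subsequent pins just take a reference on pin_count.
	 */
	err = __context_pin_ppgtt(ce->gem_context);
	if (err) {
		__context_unpin(ce);	/* unwind in reverse order */
		return err;
	}

	return 0;
}

The matching gen6_ppgtt_unpin() runs from intel_ring_context_unpin(), so
the page directory only remains resident while some context using it is
pinned.
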
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 36 ++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  5 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 28 +++++++++++++++++++
 3 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 96f0638ab0f2..d5af099939f6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1973,7 +1973,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	i915_vma_unpin(ppgtt->vma);
 	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
@@ -2058,10 +2057,19 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	return vma;
 }
 
-static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 
+	/*
+	 * Work around the limited maximum vma->pin_count and the aliasing_ppgtt
+	 * which will be pinned into every active context.
+	 * (When vma->pin_count becomes atomic, I expect we will naturally
+	 * need a larger, unpacked, type and kill this redundancy.)
+	 */
+	if (ppgtt->pin_count++)
+		return 0;
+
 	/*
 	 * PPGTT PDEs reside in the GGTT and consist of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -2072,6 +2080,17 @@ static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
 			    PIN_GLOBAL | PIN_HIGH);
 }
 
+void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	GEM_BUG_ON(!ppgtt->pin_count);
+	if (--ppgtt->pin_count)
+		return;
+
+	i915_vma_unpin(ppgtt->vma);
+}
+
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_ggtt * const ggtt = &i915->ggtt;
@@ -2121,21 +2140,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	if (err)
 		goto err_vma;
 
-	err = gen6_ppgtt_pin(&ppgtt->base);
-	if (err)
-		goto err_pd;
-
-	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->vma->node.size >> 20,
-			 ppgtt->vma->node.start / PAGE_SIZE);
-
-	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
-			 ppgtt->base.pd.base.ggtt_offset << 10);
-
 	return &ppgtt->base;
 
-err_pd:
-	gen6_ppgtt_free_pd(ppgtt);
 err_vma:
 	i915_vma_destroy(ppgtt->vma);
 err_scratch:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c2f270c90bea..c20a4f06db37 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -413,6 +413,8 @@ struct gen6_hw_ppgtt {
 	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 
+	unsigned int pin_count;
+
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
 };
 
@@ -627,6 +629,9 @@ static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 		kref_put(&ppgtt->ref, i915_ppgtt_release);
 }
 
+int gen6_ppgtt_pin(struct i915_hw_ppgtt *base);
+void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base);
+
 void i915_check_and_clear_faults(struct drm_i915_private *dev_priv);
 void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv);
 void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index ce07ef9471d2..e89012b66e7e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1178,6 +1178,27 @@ static void intel_ring_context_destroy(struct intel_context *ce)
 		__i915_gem_object_release_unless_active(ce->state->obj);
 }
 
+static int __context_pin_ppgtt(struct i915_gem_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt;
+	int err = 0;
+
+	ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt;
+	if (ppgtt)
+		err = gen6_ppgtt_pin(ppgtt);
+
+	return err;
+}
+
+static void __context_unpin_ppgtt(struct i915_gem_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt;
+
+	ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt;
+	if (ppgtt)
+		gen6_ppgtt_unpin(ppgtt);
+}
+
 static int __context_pin(struct intel_context *ce)
 {
 	struct i915_vma *vma;
@@ -1226,6 +1247,7 @@ static void __context_unpin(struct intel_context *ce)
 
 static void intel_ring_context_unpin(struct intel_context *ce)
 {
+	__context_unpin_ppgtt(ce->gem_context);
 	__context_unpin(ce);
 
 	i915_gem_context_put(ce->gem_context);
@@ -1323,6 +1345,10 @@ __ring_context_pin(struct intel_engine_cs *engine,
 	if (err)
 		goto err;
 
+	err = __context_pin_ppgtt(ce->gem_context);
+	if (err)
+		goto err_unpin;
+
 	i915_gem_context_get(ctx);
 
 	/* One ringbuffer to rule them all */
@@ -1331,6 +1357,8 @@ __ring_context_pin(struct intel_engine_cs *engine,
 
 	return ce;
 
+err_unpin:
+	__context_unpin(ce);
 err:
 	ce->pin_count = 0;
 	return ERR_PTR(err);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (8 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 09/18] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 14:37   ` Matthew Auld
  2018-06-08 12:55 ` [PATCH 11/18] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

As we were only supporting aliasing_ppgtt on gen7 for some time, we
saved a few checks by preallocating the page directories on creation.
However, since we need 2MiB of page directories for each ppgtt,
supporting arbitrary numbers of user contexts requires us to be more
prudent in our allocations, deferring the page allocation until it is
first used. We
don't recover unused pages yet as we found that doing so on the fly
(i.e. altering TLB entries) would confuse the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
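The mechanism is the deferral gen8 already used: rather than populating
all 512 page directory entries at create time, allocation happens on the
first PPGTT bind of each vma. A minimal sketch of the now-unified bind
path (pte_flags handling elided; names as in the diff):

static int ppgtt_bind_vma_sketch(struct i915_vma *vma,
				 enum i915_cache_level cache_level,
				 u32 unused)
{
	int err;

	/* Allocate backing page tables only on the first bind */
	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
		err = vma->vm->allocate_va_range(vma->vm,
						 vma->node.start,
						 vma->size);
		if (err)
			return err;
	}

	/* Write PTEs for exactly the range this vma occupies */
	vma->vm->insert_entries(vma->vm, vma, cache_level, 0);
	return 0;
}
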
 drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +++++++++++------------------
 1 file changed, 26 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d5af099939f6..e611884596a6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -190,11 +190,19 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
 	return 1;
 }
 
-static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+static int ppgtt_bind_vma(struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 unused)
 {
 	u32 pte_flags;
+	int err;
+
+	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
+		err = vma->vm->allocate_va_range(vma->vm,
+						 vma->node.start, vma->size);
+		if (err)
+			return err;
+	}
 
 	/* Currently applicable only to VLV */
 	pte_flags = 0;
@@ -206,22 +214,6 @@ static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
 	return 0;
 }
 
-static int gen8_ppgtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
-{
-	int ret;
-
-	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
-		ret = vma->vm->allocate_va_range(vma->vm,
-						 vma->node.start, vma->size);
-		if (ret)
-			return ret;
-	}
-
-	return gen6_ppgtt_bind_vma(vma, cache_level, unused);
-}
-
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
 	vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
@@ -1622,7 +1614,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->vm.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->debug_dump = gen8_dump_ppgtt;
 
-	ppgtt->vm.vma_ops.bind_vma    = gen8_ppgtt_bind_vma;
+	ppgtt->vm.vma_ops.bind_vma    = ppgtt_bind_vma;
 	ppgtt->vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
 	ppgtt->vm.vma_ops.set_pages   = ppgtt_set_pages;
 	ppgtt->vm.vma_ops.clear_pages = clear_pages;
@@ -1837,7 +1829,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 		num_entries -= end - pte;
 
-		/* Note that the hw doesn't support removing PDE on the fly
+		/*
+		 * Note that the hw doesn't support removing PDE on the fly
 		 * (they are cached inside the context with no means to
 		 * invalidate the cache), so we can only reset the PTE
 		 * entries back to scratch.
@@ -2106,12 +2099,13 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 
+	ppgtt->base.vm.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
+	ppgtt->base.vm.vma_ops.bind_vma    = ppgtt_bind_vma;
 	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
 	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
 	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
@@ -2136,14 +2130,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 		goto err_scratch;
 	}
 
-	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
-	if (err)
-		goto err_vma;
-
 	return &ppgtt->base;
 
-err_vma:
-	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -2739,8 +2727,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 	if (flags & I915_VMA_LOCAL_BIND) {
 		struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
 
-		if (!(vma->flags & I915_VMA_LOCAL_BIND) &&
-		    appgtt->vm.allocate_va_range) {
+		if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
 			ret = appgtt->vm.allocate_va_range(&appgtt->vm,
 							   vma->node.start,
 							   vma->size);
@@ -2844,17 +2831,15 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
 		goto err_ppgtt;
 	}
 
-	if (ppgtt->vm.allocate_va_range) {
-		/* Note we only pre-allocate as far as the end of the global
-		 * GTT. On 48b / 4-level page-tables, the difference is very,
-		 * very significant! We have to preallocate as GVT/vgpu does
-		 * not like the page directory disappearing.
-		 */
-		err = ppgtt->vm.allocate_va_range(&ppgtt->vm,
-						  0, ggtt->vm.total);
-		if (err)
-			goto err_ppgtt;
-	}
+	/*
+	 * Note we only pre-allocate as far as the end of the global
+	 * GTT. On 48b / 4-level page-tables, the difference is very,
+	 * very significant! We have to preallocate as GVT/vgpu does
+	 * not like the page directory disappearing.
+	 */
+	err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, ggtt->vm.total);
+	if (err)
+		goto err_ppgtt;
 
 	i915->mm.aliasing_ppgtt = ppgtt;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/18] drm/i915/gtt: Free unused page tables on unbind the context
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (9 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 12/18] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

As we cannot reliably change used page tables while the context is
active, the earliest opportunity we have to recover excess pages is when
the context becomes idle. So whenever we unbind the context (it must be
idle, and indeed being evicted), free the unused PTEs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
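The bookkeeping behind this is a per-page-table use count; a minimal
sketch of the accounting in the clear path (the helper is illustrative,
the logic is open-coded in the diff below):

static void pt_account_clear(struct gen6_hw_ppgtt *ppgtt,
			     struct i915_page_table *pt,
			     unsigned int count)
{
	GEM_BUG_ON(count > pt->used_ptes);
	pt->used_ptes -= count;

	/* An empty PT cannot be freed immediately as the context may
	 * still have its PDE cached, so it is only flagged here;
	 * pd_vma_unbind() does the actual free once the context is
	 * idle.
	 */
	if (!pt->used_ptes)
		ppgtt->scan_for_unused_pt = true;
}
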
 drivers/gpu/drm/i915/i915_gem_gtt.c | 44 +++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e611884596a6..edb19648a85b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1814,20 +1814,28 @@ static void gen6_ppgtt_enable(struct drm_i915_private *dev_priv)
 static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 				   u64 start, u64 length)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	unsigned int first_entry = start >> PAGE_SHIFT;
 	unsigned int pde = first_entry / GEN6_PTES;
 	unsigned int pte = first_entry % GEN6_PTES;
 	unsigned int num_entries = length >> PAGE_SHIFT;
-	gen6_pte_t scratch_pte =
+	const gen6_pte_t scratch_pte =
 		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
 
 	while (num_entries) {
-		struct i915_page_table *pt = ppgtt->pd.page_table[pde++];
-		unsigned int end = min(pte + num_entries, GEN6_PTES);
+		struct i915_page_table *pt = ppgtt->base.pd.page_table[pde++];
+		const unsigned int end = min(pte + num_entries, GEN6_PTES);
+		const unsigned int count = end - pte;
 		gen6_pte_t *vaddr;
 
-		num_entries -= end - pte;
+		GEM_BUG_ON(pt == vm->scratch_pt);
+
+		num_entries -= count;
+
+		GEM_BUG_ON(count > pt->used_ptes);
+		pt->used_ptes -= count;
+		if (!pt->used_ptes)
+			ppgtt->scan_for_unused_pt = true;
 
 		/*
 		 * Note that the hw doesn't support removing PDE on the fly
@@ -1859,6 +1867,8 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma);
 	gen6_pte_t *vaddr;
 
+	GEM_BUG_ON(ppgtt->pd.page_table[act_pt] == vm->scratch_pt);
+
 	vaddr = kmap_atomic_px(ppgtt->pd.page_table[act_pt]);
 	do {
 		vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
@@ -1894,6 +1904,8 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	bool flush = false;
 
 	gen6_for_each_pde(pt, &ppgtt->base.pd, start, length, pde) {
+		const unsigned int count = gen6_pte_count(start, length);
+
 		if (pt == vm->scratch_pt) {
 			pt = alloc_pt(vm);
 			if (IS_ERR(pt))
@@ -1907,7 +1919,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				gen6_write_pde(ppgtt, pde, pt);
 				flush = true;
 			}
+
+			GEM_BUG_ON(pt->used_ptes);
 		}
+
+		pt->used_ptes += count;
 	}
 
 	if (flush) {
@@ -2009,6 +2025,24 @@ static int pd_vma_bind(struct i915_vma *vma,
 
 static void pd_vma_unbind(struct i915_vma *vma)
 {
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	struct i915_page_table * const scratch_pt = ppgtt->base.vm.scratch_pt;
+	struct i915_page_table *pt;
+	unsigned int pde;
+
+	if (!ppgtt->scan_for_unused_pt)
+		return;
+
+	/* Free all no longer used page tables */
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde) {
+		if (pt->used_ptes || pt == scratch_pt)
+			continue;
+
+		free_pt(&ppgtt->base.vm, pt);
+		ppgtt->base.pd.page_table[pde] = scratch_pt;
+	}
+
+	ppgtt->scan_for_unused_pt = false;
 }
 
 static const struct i915_vma_ops pd_vma_ops = {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c20a4f06db37..dc98830fae69 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -414,6 +414,7 @@ struct gen6_hw_ppgtt {
 	gen6_pte_t __iomem *pd_addr;
 
 	unsigned int pin_count;
+	bool scan_for_unused_pt;
 
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
 };
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/18] drm/i915/gtt: Skip initializing PT with scratch if full
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (10 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 11/18] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 13/18] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

If the PT will be completely overwritten with PTEs for the object, we
can forgo filling it with scratch entries first.

References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
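Whether the new PT will be completely filled is known up front from the
PTE count of the requested range inside that PT; a sketch of the test,
assuming the usual clamp-to-page-table-boundary semantics of
gen6_pte_count() (the helper below is a reconstruction, not the
driver's own code):

static u32 ptes_in_first_pt(u64 start, u64 length)
{
	const u64 pt_span = (u64)GEN6_PTES << PAGE_SHIFT;
	const u64 pt_end = round_up(start + 1, pt_span);

	/* A count of GEN6_PTES means every scratch entry would be
	 * immediately overwritten, so the scratch fill is skipped.
	 */
	return (min(start + length, pt_end) - start) >> PAGE_SHIFT;
}
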
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index edb19648a85b..79d63e16c2d4 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1911,7 +1911,8 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			if (IS_ERR(pt))
 				goto unwind_out;
 
-			gen6_initialize_pt(vm, pt);
+			if (count < GEN6_PTES)
+				gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
 
 			if (i915_vma_is_bound(ppgtt->vma,
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/18] drm/i915/gtt: Cache the PTE encoding of the scratch page
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (11 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 12/18] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 14/18] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

As the most frequent PTE encoding is for the scratch page, cache it upon
creation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
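pte_encode() is reached through a per-platform function pointer, so
hoisting the invariant scratch encoding to init time saves an indirect
call in the clear and dump paths. Note the diff also switches the
cached encoding from I915_CACHE_LLC to I915_CACHE_NONE with
PTE_READ_ONLY. A sketch of the init-time caching:

static void cache_scratch_pte(struct gen6_hw_ppgtt *ppgtt)
{
	struct i915_address_space *vm = &ppgtt->base.vm;

	/* The scratch page never changes for the lifetime of the
	 * ppgtt, so its encoding can be computed once and reused.
	 */
	ppgtt->scratch_pte =
		vm->pte_encode(vm->scratch_page.daddr,
			       I915_CACHE_NONE, PTE_READ_ONLY);
}
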
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 79d63e16c2d4..58fd2ea77d00 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -640,11 +640,10 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 		gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC));
 }
 
-static void gen6_initialize_pt(struct i915_address_space *vm,
+static void gen6_initialize_pt(struct gen6_hw_ppgtt *ppgtt,
 			       struct i915_page_table *pt)
 {
-	fill32_px(vm, pt,
-		  vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0));
+	fill32_px(&ppgtt->base.vm, pt, ppgtt->scratch_pte);
 }
 
 static struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
@@ -1631,9 +1630,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_address_space *vm = &base->vm;
-	const gen6_pte_t scratch_pte =
-		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	const gen6_pte_t scratch_pte = ppgtt->scratch_pte;
 	struct i915_page_table *pt;
 	u32 pte, pde;
 
@@ -1819,8 +1816,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned int pde = first_entry / GEN6_PTES;
 	unsigned int pte = first_entry % GEN6_PTES;
 	unsigned int num_entries = length >> PAGE_SHIFT;
-	const gen6_pte_t scratch_pte =
-		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	const gen6_pte_t scratch_pte = ppgtt->scratch_pte;
 
 	while (num_entries) {
 		struct i915_page_table *pt = ppgtt->base.pd.page_table[pde++];
@@ -1912,7 +1908,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				goto unwind_out;
 
 			if (count < GEN6_PTES)
-				gen6_initialize_pt(vm, pt);
+				gen6_initialize_pt(ppgtt, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
 
 			if (i915_vma_is_bound(ppgtt->vma,
@@ -1950,13 +1946,17 @@ static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	ppgtt->scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr,
+			       I915_CACHE_NONE, PTE_READ_ONLY);
+
 	vm->scratch_pt = alloc_pt(vm);
 	if (IS_ERR(vm->scratch_pt)) {
 		cleanup_scratch_page(vm);
 		return PTR_ERR(vm->scratch_pt);
 	}
 
-	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_initialize_pt(ppgtt, vm->scratch_pt);
 	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
 		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index dc98830fae69..c50bbde007f8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -412,6 +412,7 @@ struct gen6_hw_ppgtt {
 
 	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
+	gen6_pte_t scratch_pte;
 
 	unsigned int pin_count;
 	bool scan_for_unused_pt;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 14/18] drm/i915/gtt: Reduce a pair of runtime asserts
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (12 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 13/18] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:55 ` [PATCH 15/18] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

We can stop asserting using WARN_ON: given sufficient CI coverage, we
can rely on GEM_BUG_ON() to catch problems before merging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
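For reference, the practical difference between the two asserts,
paraphrased from i915_gem.h (the exact definitions vary by tree):

/* WARN_ON() always compiles to a runtime check and taxes every
 * kernel. GEM_BUG_ON() is only a runtime check on
 * CONFIG_DRM_I915_DEBUG_GEM builds (which CI runs); on production
 * builds it compiles away entirely, so once CI has proven the
 * invariant the check is free.
 */
#ifdef CONFIG_DRM_I915_DEBUG_GEM
#define GEM_BUG_ON(condition) BUG_ON(condition)
#else
#define GEM_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
#endif
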
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 58fd2ea77d00..0b434954f185 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2861,7 +2861,7 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
-	if (WARN_ON(ppgtt->vm.total < ggtt->vm.total)) {
+	if (GEM_WARN_ON(ppgtt->vm.total < ggtt->vm.total)) {
 		err = -ENODEV;
 		goto err_ppgtt;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index c50bbde007f8..37f565a38d3e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -466,8 +466,8 @@ static inline u32 i915_pte_count(u64 addr, u64 length, unsigned int pde_shift)
 	const u64 mask = ~((1ULL << pde_shift) - 1);
 	u64 end;
 
-	WARN_ON(length == 0);
-	WARN_ON(offset_in_page(addr|length));
+	GEM_BUG_ON(length == 0);
+	GEM_BUG_ON(offset_in_page(addr|length));
 
 	end = addr + length;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 15/18] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (13 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 14/18] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
@ 2018-06-08 12:55 ` Chris Wilson
  2018-06-08 12:56 ` [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch() Chris Wilson
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:55 UTC (permalink / raw)
  To: intel-gfx

If we know that the user cannot access the GGTT, by virtue of having a
segregated memory area of their own (full-ppgtt), we can skip clearing
the unused entries as they are unreachable.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
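nop_clear_range() is the same empty stub the gen8 path already uses,
roughly:

/* With full-ppgtt (and no VT-d scanout workaround in play), userspace
 * never addresses the GGTT directly, so PTEs left stale on unbind are
 * unreachable and scrubbing them is wasted effort.
 */
static void nop_clear_range(struct i915_address_space *vm,
			    u64 start, u64 length)
{
}
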
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 0b434954f185..60a8332a122e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3451,7 +3451,9 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
 	size = gen6_get_total_gtt_size(snb_gmch_ctl);
 	ggtt->vm.total = (size / sizeof(gen6_pte_t)) << PAGE_SHIFT;
 
-	ggtt->vm.clear_range = gen6_ggtt_clear_range;
+	ggtt->vm.clear_range = nop_clear_range;
+	if (!USES_FULL_PPGTT(dev_priv) || intel_scanout_needs_vtd_wa(dev_priv))
+		ggtt->vm.clear_range = gen6_ggtt_clear_range;
 	ggtt->vm.insert_page = gen6_ggtt_insert_page;
 	ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
 	ggtt->vm.cleanup = gen6_gmch_remove;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch()
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (14 preceding siblings ...)
  2018-06-08 12:55 ` [PATCH 15/18] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
@ 2018-06-08 12:56 ` Chris Wilson
  2018-06-08 14:03   ` Mika Kuoppala
  2018-06-08 12:56 ` [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6 Chris Wilson
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:56 UTC (permalink / raw)
  To: intel-gfx

hsw_mm_switch() and gen7_mm_switch() are identical, so let's remove the
redundant specialism.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 60a8332a122e..25ad94b1b67e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1697,28 +1697,6 @@ static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 	return ppgtt->base.pd.base.ggtt_offset << 10;
 }
 
-static int hsw_mm_switch(struct gen6_hw_ppgtt *ppgtt,
-			 struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = rq->engine;
-	u32 *cs;
-
-	/* NB: TLBs must be flushed and invalidated before a switch */
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_LOAD_REGISTER_IMM(2);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
-	*cs++ = PP_DIR_DCLV_2G;
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
-	*cs++ = get_pd_offset(ppgtt);
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
 static int gen7_mm_switch(struct gen6_hw_ppgtt *ppgtt,
 			  struct i915_request *rq)
 {
@@ -2148,8 +2126,6 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 	if (intel_vgpu_active(i915) || IS_GEN6(i915))
 		ppgtt->switch_mm = gen6_mm_switch;
-	else if (IS_HASWELL(i915))
-		ppgtt->switch_mm = hsw_mm_switch;
 	else if (IS_GEN7(i915))
 		ppgtt->switch_mm = gen7_mm_switch;
 	else
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (15 preceding siblings ...)
  2018-06-08 12:56 ` [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch() Chris Wilson
@ 2018-06-08 12:56 ` Chris Wilson
  2018-06-08 14:06   ` Mika Kuoppala
  2018-06-08 12:56 ` [PATCH 18/18] RFT drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:56 UTC (permalink / raw)
  To: intel-gfx

Since vgpu is not supported on Haswell or any other gen6/7, we do not
need to check and act upon its enablement.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 25ad94b1b67e..ca067d9adf54 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2124,7 +2124,7 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
 
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
-	if (intel_vgpu_active(i915) || IS_GEN6(i915))
+	if (IS_GEN6(i915))
 		ppgtt->switch_mm = gen6_mm_switch;
 	else if (IS_GEN7(i915))
 		ppgtt->switch_mm = gen7_mm_switch;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 18/18] RFT drm/i915/gtt: Enable full-ppgtt by default everywhere
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (16 preceding siblings ...)
  2018-06-08 12:56 ` [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6 Chris Wilson
@ 2018-06-08 12:56 ` Chris Wilson
  2018-06-08 13:25 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2) Patchwork
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 12:56 UTC (permalink / raw)
  To: intel-gfx

Let's see if we have all the kinks worked out and full-ppgtt now works
reliably on gen7 (Ivybridge, Valleyview/Baytrail and Haswell). If we can
let userspace have full control over their own ppgtt, it makes softpinning
far more effective, in turn making GPU dispatch far more efficient and
more secure (due to better mm segregation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
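For reference, the sanitized levels follow the i915.enable_ppgtt module
parameter convention (assumed from the parameter's description, worth
cross-checking against i915_params.c):

/*
 * 0 - ppgtt disabled, all objects live in the global GTT
 * 1 - aliasing ppgtt, a single ppgtt shadowing the GGTT
 * 2 - full ppgtt, a private 32b address space per context
 * 3 - full ppgtt with a 48b (4-level) address space
 *
 * Dropping the HAS_LOGICAL_RING_CONTEXTS() gate makes levels 2 and 3
 * selectable on ringbuffer (gen6/7) hardware as well.
 */
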
 drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ca067d9adf54..a181fe20964e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -179,13 +179,11 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
 		return 0;
 	}
 
-	if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) {
-		if (has_full_48bit_ppgtt)
-			return 3;
+	if (has_full_48bit_ppgtt)
+		return 3;
 
-		if (has_full_ppgtt)
-			return 2;
-	}
+	if (has_full_ppgtt)
+		return 2;
 
 	return 1;
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-08 12:55 ` [PATCH 08/18] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
@ 2018-06-08 13:10   ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 13:10 UTC (permalink / raw)
  To: intel-gfx

Currently all page directories are bound at creation using an
unevictable node in the GGTT. This severely limits us as we cannot
remove any inactive ppgtt for new contexts, or under aperture pressure.
To fix this we need to make the page directory into a first-class vma
that can be unbound and rebound on demand. Hence the creation of a
custom vma to wrap the page directory, as opposed to a GEM object.

In this patch, we leave the page directories pinned upon creation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
Put back vma->fence_size = size. I was overeager in trying to remove
unused members.
-Chris
---
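A condensed sketch of the resulting life cycle (names from the diff;
error handling as in gen6_ppgtt_create()):

/* Creation only builds an unbound vma around the page directory;
 * no GGTT space is consumed yet.
 */
ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);

/* Pinning reserves GGTT space and, via pd_vma_bind(), writes the
 * PDEs and invalidates the GGTT. After unpinning, the node becomes
 * evictable again under aperture pressure (patch 09 in this series
 * moves the unpin to context idling).
 */
err = i915_vma_pin(ppgtt->vma, 0, GEN6_PD_ALIGN,
		   PIN_GLOBAL | PIN_HIGH);
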
 drivers/gpu/drm/i915/i915_gem_gtt.c | 257 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_vma.h     |   7 +
 3 files changed, 155 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd338bccf706..2c739e21c085 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1640,50 +1640,55 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_address_space *vm = &base->vm;
-	struct i915_page_table *unused;
-	gen6_pte_t scratch_pte;
-	u32 pd_entry, pte, pde;
-
-	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
-
-	gen6_for_all_pdes(unused, &base->pd, pde) {
-		u32 expected;
-		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
-		pd_entry = readl(ppgtt->pd_addr + pde);
-		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
-
-		if (pd_entry != expected)
-			seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
-				   pde,
-				   pd_entry,
-				   expected);
-		seq_printf(m, "\tPDE: %x\n", pd_entry);
-
-		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
-
-		for (pte = 0; pte < GEN6_PTES; pte+=4) {
-			unsigned long va =
-				(pde * PAGE_SIZE * GEN6_PTES) +
-				(pte * PAGE_SIZE);
+	const gen6_pte_t scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	struct i915_page_table *pt;
+	u32 pte, pde;
+
+	gen6_for_all_pdes(pt, &base->pd, pde) {
+		gen6_pte_t *vaddr;
+
+		if (pt == base->vm.scratch_pt)
+			continue;
+
+		if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
+			u32 expected =
+				GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
+				GEN6_PDE_VALID;
+			u32 pd_entry = readl(ppgtt->pd_addr + pde);
+
+			if (pd_entry != expected)
+				seq_printf(m,
+					   "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
+					   pde,
+					   pd_entry,
+					   expected);
+
+			seq_printf(m, "\tPDE: %x\n", pd_entry);
+		}
+
+		vaddr = kmap_atomic_px(base->pd.page_table[pde]);
+		for (pte = 0; pte < GEN6_PTES; pte += 4) {
 			int i;
-			bool found = false;
+
 			for (i = 0; i < 4; i++)
-				if (pt_vaddr[pte + i] != scratch_pte)
-					found = true;
-			if (!found)
+				if (vaddr[pte + i] != scratch_pte)
+					break;
+			if (i == 4)
 				continue;
 
-			seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
+			seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
+				   pde, pte,
+				   (pde * GEN6_PTES + pte) * PAGE_SIZE);
 			for (i = 0; i < 4; i++) {
-				if (pt_vaddr[pte + i] != scratch_pte)
-					seq_printf(m, " %08x", pt_vaddr[pte + i]);
+				if (vaddr[pte + i] != scratch_pte)
+					seq_printf(m, " %08x", vaddr[pte + i]);
 				else
-					seq_puts(m, "  SCRATCH ");
+					seq_puts(m, "  SCRATCH");
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_atomic(vaddr);
 	}
 }
 
@@ -1697,22 +1702,6 @@ static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 		  ppgtt->pd_addr + pde);
 }
 
-/* Write all the page tables found in the ppgtt structure to incrementing page
- * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *base,
-				  u32 start, u32 length)
-{
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_page_table *pt;
-	unsigned int pde;
-
-	gen6_for_each_pde(pt, &base->pd, start, length, pde)
-		gen6_write_pde(ppgtt, pde, pt);
-
-	mark_tlbs_dirty(base);
-	gen6_ggtt_invalidate(base->vm.i915);
-}
-
 static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 {
 	GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
@@ -1919,8 +1908,12 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 			gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
-			gen6_write_pde(ppgtt, pde, pt);
-			flush = true;
+
+			if (i915_vma_is_bound(ppgtt->vma,
+					      I915_VMA_GLOBAL_BIND)) {
+				gen6_write_pde(ppgtt, pde, pt);
+				flush = true;
+			}
 		}
 	}
 
@@ -1936,8 +1929,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 {
+	struct i915_address_space * const vm = &ppgtt->base.vm;
+	struct i915_page_table *unused;
+	u32 pde;
 	int ret;
 
 	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
@@ -1951,6 +1947,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 	}
 
 	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
+		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
 	return 0;
 }
@@ -1975,52 +1973,104 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	drm_mm_remove_node(&ppgtt->node);
+	i915_vma_unpin(ppgtt->vma);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
+static int pd_vma_set_pages(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int err;
+	vma->pages = ERR_PTR(-ENODEV);
+	return 0;
+}
 
-	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
-	 * allocator works in address space sizes, so it's multiplied by page
-	 * size. We allocate at the top of the GTT to avoid fragmentation.
-	 */
-	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
+static void pd_vma_clear_pages(struct i915_vma *vma)
+{
+	GEM_BUG_ON(!vma->pages);
 
-	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
-				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
-				  I915_COLOR_UNEVICTABLE,
-				  0, ggtt->vm.total,
-				  PIN_HIGH);
-	if (err)
-		return err;
+	vma->pages = NULL;
+}
+
+static int pd_vma_bind(struct i915_vma *vma,
+		       enum i915_cache_level cache_level,
+		       u32 unused)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
+	struct i915_page_table *pt;
+	unsigned int pde;
 
-	if (ppgtt->node.start < ggtt->mappable_end)
-		DRM_DEBUG("Forced to use aperture for PDEs\n");
+	ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
-	ppgtt->base.pd.base.ggtt_offset =
-		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
+		gen6_write_pde(ppgtt, pde, pt);
 
-	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
+	mark_tlbs_dirty(&ppgtt->base);
+	gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 
 	return 0;
 }
 
-static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
-				  u64 start, u64 length)
+static void pd_vma_unbind(struct i915_vma *vma)
 {
-	struct i915_page_table *unused;
-	u32 pde;
+}
 
-	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
-		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
+static const struct i915_vma_ops pd_vma_ops = {
+	.set_pages = pd_vma_set_pages,
+	.clear_pages = pd_vma_clear_pages,
+	.bind_vma = pd_vma_bind,
+	.unbind_vma = pd_vma_unbind,
+};
+
+static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
+{
+	struct drm_i915_private *i915 = ppgtt->base.vm.i915;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	struct i915_vma *vma;
+	int i;
+
+	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(size > ggtt->vm.total);
+
+	vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], NULL);
+	init_request_active(&vma->last_fence, NULL);
+
+	vma->vm = &ggtt->vm;
+	vma->ops = &pd_vma_ops;
+	vma->private = ppgtt;
+
+	vma->size = size;
+	vma->fence_size = size;
+	vma->flags = I915_VMA_GGTT;
+	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	list_add(&vma->vm_link, &vma->vm->unbound_list);
+
+	return vma;
+}
+
+static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	/*
+	 * PPGTT PDEs reside in the GGTT and consist of 512 entries. The
+	 * allocator works in address space sizes, so it's multiplied by page
+	 * size. We allocate at the top of the GTT to avoid fragmentation.
+	 */
+	return i915_vma_pin(ppgtt->vma,
+			    0, GEN6_PD_ALIGN,
+			    PIN_GLOBAL | PIN_HIGH);
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
@@ -2058,24 +2108,27 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	else
 		BUG();
 
-	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
 		goto err_free;
 
-	err = gen6_ppgtt_allocate_page_directories(ppgtt);
-	if (err)
+	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
+	if (IS_ERR(ppgtt->vma)) {
+		err = PTR_ERR(ppgtt->vma);
 		goto err_scratch;
-
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
+	}
 
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_vma;
+
+	err = gen6_ppgtt_pin(&ppgtt->base);
 	if (err)
 		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
+			 ppgtt->vma->node.size >> 20,
+			 ppgtt->vma->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
 			 ppgtt->base.pd.base.ggtt_offset << 10);
@@ -2084,6 +2137,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_pd:
 	gen6_ppgtt_free_pd(ppgtt);
+err_vma:
+	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -3602,6 +3657,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	}
 
 	ggtt->vm.closed = false;
+	i915_ggtt_invalidate(dev_priv);
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
@@ -3610,25 +3666,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
-
-	if (USES_PPGTT(dev_priv)) {
-		struct i915_address_space *vm;
-
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-			struct i915_hw_ppgtt *ppgtt;
-
-			if (i915_is_ggtt(vm))
-				ppgtt = dev_priv->mm.aliasing_ppgtt;
-			else
-				ppgtt = i915_vm_to_ppgtt(vm);
-			if (!ppgtt)
-				continue;
-
-			gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
-		}
-	}
-
-	i915_ggtt_invalidate(dev_priv);
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 199d6f47a557..c2f270c90bea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -410,7 +410,7 @@ struct i915_hw_ppgtt {
 struct gen6_hw_ppgtt {
 	struct i915_hw_ppgtt base;
 
-	struct drm_mm_node node;
+	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4321476a6a32..66a228931517 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -54,6 +54,7 @@ struct i915_vma {
 	struct reservation_object *resv; /** Alias of obj->resv */
 	struct sg_table *pages;
 	void __iomem *iomap;
+	void *private; /* owned by creator */
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -340,6 +341,12 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
+static inline bool i915_vma_is_bound(const struct i915_vma *vma,
+				     unsigned int where)
+{
+	return vma->flags & where;
+}
+
 /**
  * i915_vma_pin_iomap - calls ioremap_wc to map the GGTT VMA via the aperture
  * @vma: VMA to iomap
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (17 preceding siblings ...)
  2018-06-08 12:56 ` [PATCH 18/18] RFT drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
@ 2018-06-08 13:25 ` Patchwork
  2018-06-08 13:30 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-06-08 13:25 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/44486/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
c1efb9574092 drm/i915: Apply batch location restrictions before pinning
-:30: WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
#30: FILE: drivers/gpu/drm/i915/i915_gem_execbuffer.c:493:
+	   unsigned int i, unsigned batch_idx,

total: 0 errors, 1 warnings, 0 checks, 86 lines checked
aa246150d1ae drm/i915/ringbuffer: Brute force context restore
276fa4d5d41c drm/i915/ringbuffer: Fix context restore upon reset
8096c4955335 drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
c048d4432be3 drm/i915/gtt: Subclass gen6_hw_ppgtt
-:326: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'base' - possible side-effects?
#326: FILE: drivers/gpu/drm/i915/i915_gem_gtt.h:419:
+#define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)

total: 0 errors, 0 warnings, 1 checks, 330 lines checked
f3abd9908e36 drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
51bf579d69f0 drm/i915/gtt: Reorder aliasing_ppgtt fini
69a521d88bfe drm/i915/gtt: Make gen6 page directories evictable
66674381f287 drm/i915/gtt: Only keep gen6 page directories pinned while active
75f47fae2367 drm/i915/gtt: Lazily allocate page directories for gen7
0d40e5eb8990 drm/i915/gtt: Free unused page tables on unbind the context
aefdc08285f0 drm/i915/gtt: Skip initializing PT with scratch if full
-:9: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#9: 
References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")

-:9: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 14826673247e ("drm/i915: Only initialize partially filled pagetables")'
#9: 
References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")

total: 1 errors, 1 warnings, 0 checks, 9 lines checked
b14c6c476216 drm/i915/gtt: Cache the PTE encoding of the scratch page
2943ce67094f drm/i915/gtt: Reduce a pair of runtime asserts
-:39: CHECK:SPACING: spaces preferred around that '|' (ctx:VxV)
#39: FILE: drivers/gpu/drm/i915/i915_gem_gtt.h:470:
+	GEM_BUG_ON(offset_in_page(addr|length));
 	                              ^

total: 0 errors, 0 warnings, 1 checks, 18 lines checked
45557e9cb79b drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
12ad9bea4b2a drm/i915/gtt: Remove redundant hsw_mm_switch()
a577338f5224 drm/i915/gtt: Remove vgpu check for gen6
bf7d463b083b RFT drm/i915/gtt: Enable full-ppgtt by default everywhere

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (18 preceding siblings ...)
  2018-06-08 13:25 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2) Patchwork
@ 2018-06-08 13:30 ` Patchwork
  2018-06-08 13:41 ` ✓ Fi.CI.BAT: success " Patchwork
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-06-08 13:30 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/44486/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Apply batch location restrictions before pinning
Okay!

Commit: drm/i915/ringbuffer: Brute force context restore
Okay!

Commit: drm/i915/ringbuffer: Fix context restore upon reset
Okay!

Commit: drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
Okay!

Commit: drm/i915/gtt: Subclass gen6_hw_ppgtt
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1649:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1649:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1708:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1708:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1913:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1913:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:2031:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:2031:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1914:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1914:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:2031:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:2031:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
Okay!

Commit: drm/i915/gtt: Reorder aliasing_ppgtt fini
Okay!

Commit: drm/i915/gtt: Make gen6 page directories evictable
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:2022:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:2022:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Only keep gen6 page directories pinned while active
Okay!

Commit: drm/i915/gtt: Lazily allocate page directories for gen7
Okay!

Commit: drm/i915/gtt: Free unused page tables on unbind the context
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1827:36: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1896:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1896:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1827:42: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1906:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1906:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Skip initializing PT with scratch if full
Okay!

Commit: drm/i915/gtt: Cache the PTE encoding of the scratch page
Okay!

Commit: drm/i915/gtt: Reduce a pair of runtime asserts
Okay!

Commit: drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
Okay!

Commit: drm/i915/gtt: Remove redundant hsw_mm_switch()
Okay!

Commit: drm/i915/gtt: Remove vgpu check for gen6
Okay!

Commit: RFT drm/i915/gtt: Enable full-ppgtt by default everywhere
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (19 preceding siblings ...)
  2018-06-08 13:30 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-06-08 13:41 ` Patchwork
  2018-06-08 16:23 ` ✗ Fi.CI.IGT: failure " Patchwork
  2018-06-08 18:03 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev3) Patchwork
  22 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-06-08 13:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/44486/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4294 -> Patchwork_9241 =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9241 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9241, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44486/revisions/2/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9241:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_gttfill@basic:
      fi-pnv-d510:        SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9241 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_module_reload@basic-reload-inject:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106725, fdo#106248)

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-c-frame-sequence:
      fi-bdw-5557u:       PASS -> FAIL (fdo#103481)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-bxt-dsi:         NOTRUN -> INCOMPLETE (fdo#103927)

    igt@prime_vgem@basic-fence-flip:
      fi-bdw-5557u:       PASS -> FAIL (fdo#104008)

    
    ==== Possible fixes ====

    igt@kms_chamelium@dp-crc-fast:
      fi-kbl-7500u:       DMESG-FAIL (fdo#103841) -> PASS

    igt@kms_flip@basic-flip-vs-dpms:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106248 https://bugs.freedesktop.org/show_bug.cgi?id=106248
  fdo#106725 https://bugs.freedesktop.org/show_bug.cgi?id=106725


== Participating hosts (41 -> 37) ==

  Additional (1): fi-bxt-dsi 
  Missing    (5): fi-byt-j1900 fi-byt-squawks fi-ilk-m540 fi-cnl-y3 fi-skl-6700hq 


== Build changes ==

    * Linux: CI_DRM_4294 -> Patchwork_9241

  CI_DRM_4294: af0889384edc6de2f91494325d571c66dffea83f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4512: 093fa482371795c3aa246509994eb21907f501b9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9241: bf7d463b083bc7108ce2c73d6669a72d997b5676 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

bf7d463b083b RFT drm/i915/gtt: Enable full-ppgtt by default everywhere
a577338f5224 drm/i915/gtt: Remove vgpu check for gen6
12ad9bea4b2a drm/i915/gtt: Remove redundant hsw_mm_switch()
45557e9cb79b drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
2943ce67094f drm/i915/gtt: Reduce a pair of runtime asserts
b14c6c476216 drm/i915/gtt: Cache the PTE encoding of the scratch page
aefdc08285f0 drm/i915/gtt: Skip initializing PT with scratch if full
0d40e5eb8990 drm/i915/gtt: Free unused page tables on unbind the context
75f47fae2367 drm/i915/gtt: Lazily allocate page directories for gen7
66674381f287 drm/i915/gtt: Only keep gen6 page directories pinned while active
69a521d88bfe drm/i915/gtt: Make gen6 page directories evictable
51bf579d69f0 drm/i915/gtt: Reorder aliasing_ppgtt fini
f3abd9908e36 drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
c048d4432be3 drm/i915/gtt: Subclass gen6_hw_ppgtt
8096c4955335 drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
276fa4d5d41c drm/i915/ringbuffer: Fix context restore upon reset
aa246150d1ae drm/i915/ringbuffer: Brute force context restore
c1efb9574092 drm/i915: Apply batch location restrictions before pinning

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9241/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore
  2018-06-08 12:55 ` [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore Chris Wilson
@ 2018-06-08 13:52   ` Mika Kuoppala
  2018-06-08 14:00     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Mika Kuoppala @ 2018-06-08 13:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> An issue encountered with switching mm on gen7 is that the GPU likes to
> hang (with the VS unit busy) when told to force restore the current
> context. We can simply work around this by substituting the
> MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> forcing the context to be saved and restored; thereby reloading the
> PP_DIR registers and updating the modified page directory!
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 30 ++++++++++++++++++++++---
>  1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 65811e2fa7da..332d97bc5c27 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  		(HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
>  		INTEL_INFO(i915)->num_rings - 1 :
>  		0;
> +	bool force_restore = false;
>  	int len;
>  	u32 *cs;
>  
> @@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	len = 4;
>  	if (IS_GEN7(i915))
>  		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> +	if (flags & MI_FORCE_RESTORE) {
> +		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> +		flags &= ~MI_FORCE_RESTORE;
> +		force_restore = true;
> +		len += 2;
> +	}
>  
>  	cs = intel_ring_begin(rq, len);
>  	if (IS_ERR(cs))
> @@ -1496,6 +1503,20 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	}
>  
>  	*cs++ = MI_NOOP;
> +	if (force_restore) {
> +		/*
> +		 * The HW doesn't handle being told to restore the current
> +		 * context very well. Quite often it likes to go off and
> +		 * sulk, especially when it is meant to be reloading PP_DIR.
> +		 * A very simple fix to force the reload is to switch
> +		 * away from the current context and back again.
> +		 */
> +		*cs++ = MI_SET_CONTEXT;
> +		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> +							  engine)->state) |
> +			MI_MM_SPACE_GTT |
> +			MI_RESTORE_INHIBIT;

Why inhibit? You don't really switch to kernel but rather overwrite
current with kernel ctx.

-Mika

> +	}
>  	*cs++ = MI_SET_CONTEXT;
>  	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
>  	/*
> @@ -1585,11 +1606,14 @@ static int switch_context(struct i915_request *rq)
>  
>  		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
>  		engine->legacy_active_ppgtt = to_mm;
> -		hw_flags = MI_FORCE_RESTORE;
> +
> +		if (to_ctx == from_ctx) {
> +			hw_flags = MI_FORCE_RESTORE;
> +			from_ctx = NULL;
> +		}
>  	}
>  
> -	if (rq->hw_context->state &&
> -	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
> +	if (rq->hw_context->state && to_ctx != from_ctx) {
>  		GEM_BUG_ON(engine->id != RCS);
>  
>  		/*
> -- 
> 2.17.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore
  2018-06-08 13:52   ` Mika Kuoppala
@ 2018-06-08 14:00     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 14:00 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-06-08 14:52:13)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > An issue encountered with switching mm on gen7 is that the GPU likes to
> > hang (with the VS unit busy) when told to force restore the current
> > context. We can simply work around this by substituting the
> > MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> > forcing the context to be saved and restored; thereby reloading the
> > PP_DIR registers and updating the modified page directory!
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Matthew Auld <matthew.william.auld@gmail.com>
> > ---
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 30 ++++++++++++++++++++++---
> >  1 file changed, 27 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 65811e2fa7da..332d97bc5c27 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >               (HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
> >               INTEL_INFO(i915)->num_rings - 1 :
> >               0;
> > +     bool force_restore = false;
> >       int len;
> >       u32 *cs;
> >  
> > @@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >       len = 4;
> >       if (IS_GEN7(i915))
> >               len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> > +     if (flags & MI_FORCE_RESTORE) {
> > +             GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> > +             flags &= ~MI_FORCE_RESTORE;
> > +             force_restore = true;
> > +             len += 2;
> > +     }
> >  
> >       cs = intel_ring_begin(rq, len);
> >       if (IS_ERR(cs))
> > @@ -1496,6 +1503,20 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
> >       }
> >  
> >       *cs++ = MI_NOOP;
> > +     if (force_restore) {
> > +             /*
> > +              * The HW doesn't handle being told to restore the current
> > +              * context very well. Quite often it likes to go off and
> > +              * sulk, especially when it is meant to be reloading PP_DIR.
> > +              * A very simple fix to force the reload is to switch
> > +              * away from the current context and back again.
> > +              */
> > +             *cs++ = MI_SET_CONTEXT;
> > +             *cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> > +                                                       engine)->state) |
> > +                     MI_MM_SPACE_GTT |
> > +                     MI_RESTORE_INHIBIT;
> 
> Why inhibit? You dont really switch to kernel but rather overwrite
> current with kernel ctx.

This is for the switch to kernel context. We don't want to load the
kernel context image, just trigger the load of our own context in the
next MI_SET_CONTEXT.

Ideally, we would tell it not to save the kernel context either. But we
don't have that option in MI_SET_CONTEXT. This does mean that the kernel
context then contains state from current, but we *never* use the kernel
context state.
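
To spell out the trick: with the hunk above applied, the force-restore
path emits two back-to-back switches, roughly (kernel_ce->state here is
shorthand for to_intel_context(i915->kernel_context, engine)->state):

	/* 1: bounce via the kernel context, inhibiting the load of its image */
	*cs++ = MI_SET_CONTEXT;
	*cs++ = i915_ggtt_offset(kernel_ce->state) |
		MI_MM_SPACE_GTT |
		MI_RESTORE_INHIBIT;

	/* 2: our context is no longer "current", so its state is reloaded */
	*cs++ = MI_SET_CONTEXT;
	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;

The first MI_SET_CONTEXT saves current state (into the kernel context
image, which we never restore from) and the second loads ours, with
MI_FORCE_RESTORE already cleared from flags.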
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch()
  2018-06-08 12:56 ` [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch() Chris Wilson
@ 2018-06-08 14:03   ` Mika Kuoppala
  0 siblings, 0 replies; 38+ messages in thread
From: Mika Kuoppala @ 2018-06-08 14:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> hsw_mm_switch() and gen7_mm_switch() are identical, so let's remove the
> redundant specialism.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 24 ------------------------
>  1 file changed, 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 60a8332a122e..25ad94b1b67e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1697,28 +1697,6 @@ static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
>  	return ppgtt->base.pd.base.ggtt_offset << 10;
>  }
>  
> -static int hsw_mm_switch(struct gen6_hw_ppgtt *ppgtt,
> -			 struct i915_request *rq)
> -{
> -	struct intel_engine_cs *engine = rq->engine;
> -	u32 *cs;
> -
> -	/* NB: TLBs must be flushed and invalidated before a switch */
> -	cs = intel_ring_begin(rq, 6);
> -	if (IS_ERR(cs))
> -		return PTR_ERR(cs);
> -
> -	*cs++ = MI_LOAD_REGISTER_IMM(2);
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
> -	*cs++ = PP_DIR_DCLV_2G;
> -	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
> -	*cs++ = get_pd_offset(ppgtt);
> -	*cs++ = MI_NOOP;
> -	intel_ring_advance(rq, cs);
> -
> -	return 0;
> -}
> -
>  static int gen7_mm_switch(struct gen6_hw_ppgtt *ppgtt,
>  			  struct i915_request *rq)
>  {
> @@ -2148,8 +2126,6 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
>  	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
>  	if (intel_vgpu_active(i915) || IS_GEN6(i915))
>  		ppgtt->switch_mm = gen6_mm_switch;
> -	else if (IS_HASWELL(i915))
> -		ppgtt->switch_mm = hsw_mm_switch;
>  	else if (IS_GEN7(i915))
>  		ppgtt->switch_mm = gen7_mm_switch;
>  	else
> -- 
> 2.17.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6
  2018-06-08 12:56 ` [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6 Chris Wilson
@ 2018-06-08 14:06   ` Mika Kuoppala
  0 siblings, 0 replies; 38+ messages in thread
From: Mika Kuoppala @ 2018-06-08 14:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Since vgpu is not supported on Haswell or any other gen6/7, we do not
> need to check and act upon its enablement.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 25ad94b1b67e..ca067d9adf54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2124,7 +2124,7 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
>  	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
>  
>  	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
> -	if (intel_vgpu_active(i915) || IS_GEN6(i915))
> +	if (IS_GEN6(i915))
>  		ppgtt->switch_mm = gen6_mm_switch;
>  	else if (IS_GEN7(i915))
>  		ppgtt->switch_mm = gen7_mm_switch;
> -- 
> 2.17.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset
  2018-06-08 12:55 ` [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
@ 2018-06-08 14:26   ` Chris Wilson
  2018-06-08 17:26   ` [PATCH v2] " Chris Wilson
  1 sibling, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 14:26 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-06-08 13:55:47)
> @@ -570,42 +585,10 @@ static void reset_ring(struct intel_engine_cs *engine,
>          * the restored context.
>          */
>         if (request) {
> -               struct drm_i915_private *dev_priv = request->i915;
> -               struct intel_context *ce = request->hw_context;
> -               struct i915_hw_ppgtt *ppgtt;
> -
> -               if (ce->state) {
> -                       I915_WRITE(CCID,
> -                                  i915_ggtt_offset(ce->state) |
> -                                  BIT(8) /* must be set! */ |
> -                                  CCID_EXTENDED_STATE_SAVE |
> -                                  CCID_EXTENDED_STATE_RESTORE |
> -                                  CCID_EN);
> -               }
> -
> -               ppgtt = request->gem_context->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
> -               if (ppgtt) {
> -                       u32 pd_offset = ppgtt->pd.base.ggtt_offset << 10;
> -
> -                       I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
> -                       I915_WRITE(RING_PP_DIR_BASE(engine), pd_offset);
> -
> -                       /* Wait for the PD reload to complete */
> -                       if (intel_wait_for_register(dev_priv,
> -                                                   RING_PP_DIR_BASE(engine),
> -                                                   BIT(0), 0,
> -                                                   10))
> -                               DRM_ERROR("Wait for reload of ppgtt page-directory timed out\n");
> -
> -                       ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);

So I forgot about Sandybridge here. Snb doesn't place an LRI in each
request, and uses mmio instead. Hence it has the problem of not setting
PP_DIR after reset until the next request is submitted.
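
For reference, the mmio path in question is gen6_mm_switch() (removed
later in this series), which writes the registers directly from the CPU
and emits nothing into the ring:

	static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
				  struct i915_request *rq)
	{
		struct intel_engine_cs *engine = rq->engine;
		struct drm_i915_private *dev_priv = rq->i915;

		/* Load the page directory directly via mmio, no LRI */
		I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
		I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(ppgtt));
		return 0;
	}

So when the ring contents are replayed after a reset, there is no LRI
left in the ring to restore PP_DIR for snb.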
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7
  2018-06-08 12:55 ` [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
@ 2018-06-08 14:37   ` Matthew Auld
  2018-06-08 14:43     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Matthew Auld @ 2018-06-08 14:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 8 June 2018 at 13:55, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> As we were only supporting aliasing_ppgtt on gen7 for some time, we
> saved a few checks by preallocating the page directories on creation.
> However, since we need 2MiB of page directories for each ppgtt, to
> support arbitrary numbers of user contexts, we need to be more prudent
> in our allocations, and defer the page allocation until it is used. We
> don't recover unused pages yet as we found that doing so on the fly
> (i.e. altering TLB entries) would confuse the GPU.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +++++++++++------------------
>  1 file changed, 26 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index d5af099939f6..e611884596a6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -190,11 +190,19 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
>         return 1;
>  }
>
> -static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
> -                              enum i915_cache_level cache_level,
> -                              u32 unused)
> +static int ppgtt_bind_vma(struct i915_vma *vma,
> +                         enum i915_cache_level cache_level,
> +                         u32 unused)
>  {
>         u32 pte_flags;
> +       int err;
> +
> +       if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
> +               err = vma->vm->allocate_va_range(vma->vm,
> +                                                vma->node.start, vma->size);
> +               if (err)
> +                       return err;
> +       }
>
>         /* Currently applicable only to VLV */
>         pte_flags = 0;
> @@ -206,22 +214,6 @@ static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
>         return 0;
>  }
>
> -static int gen8_ppgtt_bind_vma(struct i915_vma *vma,
> -                              enum i915_cache_level cache_level,
> -                              u32 unused)
> -{
> -       int ret;
> -
> -       if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
> -               ret = vma->vm->allocate_va_range(vma->vm,
> -                                                vma->node.start, vma->size);
> -               if (ret)
> -                       return ret;
> -       }
> -
> -       return gen6_ppgtt_bind_vma(vma, cache_level, unused);
> -}
> -
>  static void ppgtt_unbind_vma(struct i915_vma *vma)
>  {
>         vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
> @@ -1622,7 +1614,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
>         ppgtt->vm.cleanup = gen8_ppgtt_cleanup;
>         ppgtt->debug_dump = gen8_dump_ppgtt;
>
> -       ppgtt->vm.vma_ops.bind_vma    = gen8_ppgtt_bind_vma;
> +       ppgtt->vm.vma_ops.bind_vma    = ppgtt_bind_vma;
>         ppgtt->vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
>         ppgtt->vm.vma_ops.set_pages   = ppgtt_set_pages;
>         ppgtt->vm.vma_ops.clear_pages = clear_pages;
> @@ -1837,7 +1829,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
>
>                 num_entries -= end - pte;
>
> -               /* Note that the hw doesn't support removing PDE on the fly
> +               /*
> +                * Note that the hw doesn't support removing PDE on the fly
>                  * (they are cached inside the context with no means to
>                  * invalidate the cache), so we can only reset the PTE
>                  * entries back to scratch.
> @@ -2106,12 +2099,13 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
>
>         ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
>
> +       ppgtt->base.vm.allocate_va_range = gen6_alloc_va_range;

Ah, in gen6_alloc_va_range() I think we now need:

unwind_out:
-       gen6_ppgtt_clear_range(vm, from, start);
+       gen6_ppgtt_clear_range(vm, from, start - from);
        return -ENOMEM;
 }

?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7
  2018-06-08 14:37   ` Matthew Auld
@ 2018-06-08 14:43     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 14:43 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2018-06-08 15:37:43)
> Ah, in gen6_alloc_va_range() I think we now need:
> 
> unwind_out:
> -       gen6_ppgtt_clear_range(vm, from, start);
> +       gen6_ppgtt_clear_range(vm, from, start - from);

You are very right.
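
For the record: allocate_va_range is now reached via ppgtt_bind_vma(),
and gen6_ppgtt_clear_range() takes (vm, start, length) — the same
convention ppgtt_unbind_vma() uses — so the unwind must pass the length
of the partially allocated range, not its end. Sketch of the corrected
tail of gen6_alloc_va_range():

	unwind_out:
		/* reset the partially allocated range back to scratch */
		gen6_ppgtt_clear_range(vm, from, start - from);
		return -ENOMEM;
	}

(from is the start of the requested range; start is how far we got
before the allocation failed.)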
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (20 preceding siblings ...)
  2018-06-08 13:41 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-06-08 16:23 ` Patchwork
  2018-06-08 16:36   ` Chris Wilson
  2018-06-08 18:03 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev3) Patchwork
  22 siblings, 1 reply; 38+ messages in thread
From: Patchwork @ 2018-06-08 16:23 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
URL   : https://patchwork.freedesktop.org/series/44486/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4294_full -> Patchwork_9241_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9241_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9241_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9241_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_hangcheck:
      shard-snb:          PASS -> DMESG-FAIL
      shard-hsw:          PASS -> DMESG-FAIL

    
    ==== Warnings ====

    igt@gem_mocs_settings@mocs-rc6-render:
      shard-kbl:          PASS -> SKIP +1

    igt@gem_ppgtt@flink-and-close-vma-leak:
      shard-hsw:          SKIP -> PASS +2

    igt@perf_pmu@rc6:
      shard-kbl:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9241_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL (fdo#106560)

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking:
      shard-glk:          PASS -> FAIL (fdo#105703)

    igt@kms_flip@dpms-vs-vblank-race:
      shard-glk:          PASS -> FAIL (fdo#103060)

    igt@kms_flip@dpms-vs-vblank-race-interruptible:
      shard-hsw:          PASS -> FAIL (fdo#103060)

    igt@kms_flip_tiling@flip-to-x-tiled:
      shard-glk:          PASS -> FAIL (fdo#104724)

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gtt:
      shard-apl:          FAIL (fdo#105347) -> PASS

    igt@gem_eio@suspend:
      shard-snb:          INCOMPLETE (fdo#105411) -> PASS +1

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-glk:          FAIL (fdo#105703) -> PASS

    igt@kms_flip@2x-plain-flip-fb-recreate:
      shard-hsw:          FAIL (fdo#100368) -> PASS +1

    igt@kms_flip@modeset-vs-vblank-race-interruptible:
      shard-hsw:          FAIL (fdo#103060) -> PASS +1

    igt@kms_flip@plain-flip-fb-recreate-interruptible:
      shard-glk:          FAIL (fdo#100368) -> PASS +1

    igt@kms_flip_tiling@flip-to-y-tiled:
      shard-glk:          FAIL (fdo#104724) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-cpu:
      shard-snb:          FAIL (fdo#103167, fdo#104724) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105347 https://bugs.freedesktop.org/show_bug.cgi?id=105347
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4294 -> Patchwork_9241

  CI_DRM_4294: af0889384edc6de2f91494325d571c66dffea83f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4512: 093fa482371795c3aa246509994eb21907f501b9 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9241: bf7d463b083bc7108ce2c73d6669a72d997b5676 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9241/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ✗ Fi.CI.IGT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 16:23 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-06-08 16:36   ` Chris Wilson
  2018-06-08 17:09     ` Chris Wilson
  0 siblings, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 16:36 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Quoting Patchwork (2018-06-08 17:23:38)
> == Series Details ==
> 
> Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
> URL   : https://patchwork.freedesktop.org/series/44486/
> State : failure
> 
> == Summary ==
> 
> = CI Bug Log - changes from CI_DRM_4294_full -> Patchwork_9241_full =
> 
> == Summary - FAILURE ==
> 
>   Serious unknown changes coming with Patchwork_9241_full absolutely need to be
>   verified manually.
>   
>   If you think the reported changes have nothing to do with the changes
>   introduced in Patchwork_9241_full, please notify your bug team to allow them
>   to document this new failure mode, which will reduce false positives in CI.
> 
>   
> 
> == Possible new issues ==
> 
>   Here are the unknown changes that may have been introduced in Patchwork_9241_full:
> 
>   === IGT changes ===
> 
>     ==== Possible regressions ====
> 
>     igt@drv_selftest@live_hangcheck:
>       shard-snb:          PASS -> DMESG-FAIL
>       shard-hsw:          PASS -> DMESG-FAIL

Darn it! I expected the SNB fail after realising the mistake over its
missing mmio. But Haswell? You were the chosen one!

Other than that, it looked good and piglit was happy too.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: ✗ Fi.CI.IGT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
  2018-06-08 16:36   ` Chris Wilson
@ 2018-06-08 17:09     ` Chris Wilson
  0 siblings, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 17:09 UTC (permalink / raw)
  To: Patchwork; +Cc: intel-gfx

Quoting Chris Wilson (2018-06-08 17:36:28)
> Quoting Patchwork (2018-06-08 17:23:38)
> > == Series Details ==
> > 
> > Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2)
> > URL   : https://patchwork.freedesktop.org/series/44486/
> > State : failure
> > 
> > == Summary ==
> > 
> > = CI Bug Log - changes from CI_DRM_4294_full -> Patchwork_9241_full =
> > 
> > == Summary - FAILURE ==
> > 
> >   Serious unknown changes coming with Patchwork_9241_full absolutely need to be
> >   verified manually.
> >   
> >   If you think the reported changes have nothing to do with the changes
> >   introduced in Patchwork_9241_full, please notify your bug team to allow them
> >   to document this new failure mode, which will reduce false positives in CI.
> > 
> >   
> > 
> > == Possible new issues ==
> > 
> >   Here are the unknown changes that may have been introduced in Patchwork_9241_full:
> > 
> >   === IGT changes ===
> > 
> >     ==== Possible regressions ====
> > 
> >     igt@drv_selftest@live_hangcheck:
> >       shard-snb:          PASS -> DMESG-FAIL
> >       shard-hsw:          PASS -> DMESG-FAIL
> 
> Darn it! I expected the SNB fail after realising the mistake over its
> missing mmio. But Haswell? You were the chosen one!

Ok, spotted it; a missing intel_ring_wrap(). Phew.
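
(For anyone following along: the legacy rings are power-of-two sized,
so the wrap is just a mask. Assuming the usual helper, roughly:

	static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
	{
		/* ring->size is a power of two, so masking wraps the offset */
		return pos & (ring->size - 1);
	}

which v2 below uses to fold rq->head back into the ring before
replaying from the reset point.)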
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2] drm/i915/ringbuffer: Fix context restore upon reset
  2018-06-08 12:55 ` [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
  2018-06-08 14:26   ` Chris Wilson
@ 2018-06-08 17:26   ` Chris Wilson
  1 sibling, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-08 17:26 UTC (permalink / raw)
  To: intel-gfx

The discovery while trying to enable full-ppgtt was that we were
completely failing to load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were having no effect (the assumption was
that they would trigger a reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch that anything actually occurred.

Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.

Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.

v2: Sandybridge has to agree to use LRI as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  45 ---------
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   2 -
 drivers/gpu/drm/i915/i915_request.c     |   2 +
 drivers/gpu/drm/i915/i915_request.h     |   3 +
 drivers/gpu/drm/i915/i915_trace.h       |  33 -------
 drivers/gpu/drm/i915/intel_engine_cs.c  |   3 -
 drivers/gpu/drm/i915/intel_ringbuffer.c | 124 +++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 --
 8 files changed, 64 insertions(+), 157 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ca747a82a00c..6b93bac911b5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1712,45 +1712,6 @@ static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
 	wmb();
 }
 
-static inline u32 get_pd_offset(struct i915_hw_ppgtt *ppgtt)
-{
-	GEM_BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
-	return ppgtt->pd.base.ggtt_offset << 10;
-}
-
-static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = rq->engine;
-	u32 *cs;
-
-	/* NB: TLBs must be flushed and invalidated before a switch */
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_LOAD_REGISTER_IMM(2);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
-	*cs++ = PP_DIR_DCLV_2G;
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
-	*cs++ = get_pd_offset(ppgtt);
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = rq->engine;
-	struct drm_i915_private *dev_priv = rq->i915;
-
-	I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
-	I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(ppgtt));
-	return 0;
-}
-
 static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
@@ -2024,12 +1985,6 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->vm.dma = &i915->drm.pdev->dev;
 
 	ppgtt->vm.pte_encode = ggtt->vm.pte_encode;
-	if (IS_GEN6(i915))
-		ppgtt->switch_mm = gen6_mm_switch;
-	else if (IS_GEN7(i915))
-		ppgtt->switch_mm = gen7_mm_switch;
-	else
-		BUG();
 
 	err = gen6_ppgtt_alloc(ppgtt);
 	if (err)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 16307ba7e303..e70f6abcd0f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -406,8 +406,6 @@ struct i915_hw_ppgtt {
 
 	gen6_pte_t __iomem *pd_addr;
 
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct i915_request *rq);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..9092f5464c24 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -817,6 +817,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	/* Keep a second pin for the dual retirement along engine and ring */
 	__intel_context_pin(ce);
 
+	rq->infix = rq->ring->emit; /* end of header; start of user payload */
+
 	/* Check that we didn't interrupt ourselves with a new request */
 	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
 	return rq;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 491ff81d0fea..0e9aba53d0e4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -134,6 +134,9 @@ struct i915_request {
 	/** Position in the ring of the start of the request */
 	u32 head;
 
+	/** Position in the ring of the start of the user packets */
+	u32 infix;
+
 	/**
 	 * Position in the ring of the start of the postfix.
 	 * This is required to calculate the maximum available ring space
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 1472f48ab2e8..b50c6b829715 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -973,39 +973,6 @@ DEFINE_EVENT(i915_context, i915_context_free,
 	TP_ARGS(ctx)
 );
 
-/**
- * DOC: switch_mm tracepoint
- *
- * This tracepoint allows tracking of the mm switch, which is an important point
- * in the lifetime of the vm in the legacy submission path. This tracepoint is
- * called only if full ppgtt is enabled.
- */
-TRACE_EVENT(switch_mm,
-	TP_PROTO(struct intel_engine_cs *engine, struct i915_gem_context *to),
-
-	TP_ARGS(engine, to),
-
-	TP_STRUCT__entry(
-			__field(u16, class)
-			__field(u16, instance)
-			__field(struct i915_gem_context *, to)
-			__field(struct i915_address_space *, vm)
-			__field(u32, dev)
-	),
-
-	TP_fast_assign(
-			__entry->class = engine->uabi_class;
-			__entry->instance = engine->instance;
-			__entry->to = to;
-			__entry->vm = to->ppgtt ? &to->ppgtt->vm : NULL;
-			__entry->dev = engine->i915->drm.primary->index;
-	),
-
-	TP_printk("dev=%u, engine=%u:%u, ctx=%p, ctx_vm=%p",
-		  __entry->dev, __entry->class, __entry->instance, __entry->to,
-		  __entry->vm)
-);
-
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2ec2e60dc670..d1cf8b4926ab 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1168,9 +1168,6 @@ void intel_engine_lost_context(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
-	engine->legacy_active_context = NULL;
-	engine->legacy_active_ppgtt = NULL;
-
 	ce = fetch_and_zero(&engine->last_retired_context);
 	if (ce)
 		intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 332d97bc5c27..6ac3b65373fe 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -541,11 +541,23 @@ static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 	return i915_gem_find_active_request(engine);
 }
 
-static void reset_ring(struct intel_engine_cs *engine,
-		       struct i915_request *request)
+static void skip_request(struct i915_request *rq)
 {
-	GEM_TRACE("%s seqno=%x\n",
-		  engine->name, request ? request->global_seqno : 0);
+	void *vaddr = rq->ring->vaddr;
+	u32 head;
+
+	head = rq->infix;
+	if (rq->postfix < head) {
+		memset32(vaddr + head, MI_NOOP,
+			 (rq->ring->size - head) / sizeof(u32));
+		head = 0;
+	}
+	memset32(vaddr + head, MI_NOOP, (rq->postfix - head) / sizeof(u32));
+}
+
+static void reset_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	GEM_TRACE("%s seqno=%x\n", engine->name, rq ? rq->global_seqno : 0);
 
 	/*
 	 * RC6 must be prevented until the reset is complete and the engine
@@ -569,43 +581,11 @@ static void reset_ring(struct intel_engine_cs *engine,
 	 * If the request was innocent, we try to replay the request with
 	 * the restored context.
 	 */
-	if (request) {
-		struct drm_i915_private *dev_priv = request->i915;
-		struct intel_context *ce = request->hw_context;
-		struct i915_hw_ppgtt *ppgtt;
-
-		if (ce->state) {
-			I915_WRITE(CCID,
-				   i915_ggtt_offset(ce->state) |
-				   BIT(8) /* must be set! */ |
-				   CCID_EXTENDED_STATE_SAVE |
-				   CCID_EXTENDED_STATE_RESTORE |
-				   CCID_EN);
-		}
-
-		ppgtt = request->gem_context->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
-		if (ppgtt) {
-			u32 pd_offset = ppgtt->pd.base.ggtt_offset << 10;
-
-			I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
-			I915_WRITE(RING_PP_DIR_BASE(engine), pd_offset);
-
-			/* Wait for the PD reload to complete */
-			if (intel_wait_for_register(dev_priv,
-						    RING_PP_DIR_BASE(engine),
-						    BIT(0), 0,
-						    10))
-				DRM_ERROR("Wait for reload of ppgtt page-directory timed out\n");
-
-			ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
-		}
-
+	if (rq) {
 		/* If the rq hung, jump to its breadcrumb and skip the batch */
-		if (request->fence.error == -EIO)
-			request->ring->head = request->postfix;
-	} else {
-		engine->legacy_active_context = NULL;
-		engine->legacy_active_ppgtt = NULL;
+		rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
+		if (rq->fence.error == -EIO)
+			skip_request(rq);
 	}
 }
 
@@ -1448,6 +1428,29 @@ void intel_legacy_submission_resume(struct drm_i915_private *dev_priv)
 		intel_ring_reset(engine->buffer, 0);
 }
 
+static int load_pd_dir(struct i915_request *rq,
+		       const struct i915_hw_ppgtt *ppgtt)
+{
+	const struct intel_engine_cs * const engine = rq->engine;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
+	*cs++ = PP_DIR_DCLV_2G;
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
+	*cs++ = ppgtt->pd.base.ggtt_offset << 10;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static inline int mi_set_context(struct i915_request *rq, u32 flags)
 {
 	struct drm_i915_private *i915 = rq->i915;
@@ -1586,34 +1589,28 @@ static int remap_l3(struct i915_request *rq, int slice)
 static int switch_context(struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
-	struct i915_gem_context *to_ctx = rq->gem_context;
-	struct i915_hw_ppgtt *to_mm =
-		to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
-	struct i915_gem_context *from_ctx = engine->legacy_active_context;
-	struct i915_hw_ppgtt *from_mm = engine->legacy_active_ppgtt;
+	struct i915_gem_context *ctx = rq->gem_context;
+	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
+	unsigned int unwind_mm = 0;
 	u32 hw_flags = 0;
 	int ret, i;
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
 
-	if (to_mm != from_mm ||
-	    (to_mm && intel_engine_flag(engine) & to_mm->pd_dirty_rings)) {
-		trace_switch_mm(engine, to_ctx);
-		ret = to_mm->switch_mm(to_mm, rq);
+	if (ppgtt) {
+		ret = load_pd_dir(rq, ppgtt);
 		if (ret)
 			goto err;
 
-		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
-		engine->legacy_active_ppgtt = to_mm;
-
-		if (to_ctx == from_ctx) {
+		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
+			unwind_mm = intel_engine_flag(engine);
+			ppgtt->pd_dirty_rings &= ~unwind_mm;
 			hw_flags = MI_FORCE_RESTORE;
-			from_ctx = NULL;
 		}
 	}
 
-	if (rq->hw_context->state && to_ctx != from_ctx) {
+	if (rq->hw_context->state) {
 		GEM_BUG_ON(engine->id != RCS);
 
 		/*
@@ -1623,35 +1620,32 @@ static int switch_context(struct i915_request *rq)
 		 * as nothing actually executes using the kernel context; it
 		 * is purely used for flushing user contexts.
 		 */
-		if (i915_gem_context_is_kernel(to_ctx))
+		if (i915_gem_context_is_kernel(ctx))
 			hw_flags = MI_RESTORE_INHIBIT;
 
 		ret = mi_set_context(rq, hw_flags);
 		if (ret)
 			goto err_mm;
-
-		engine->legacy_active_context = to_ctx;
 	}
 
-	if (to_ctx->remap_slice) {
+	if (ctx->remap_slice) {
 		for (i = 0; i < MAX_L3_SLICES; i++) {
-			if (!(to_ctx->remap_slice & BIT(i)))
+			if (!(ctx->remap_slice & BIT(i)))
 				continue;
 
 			ret = remap_l3(rq, i);
 			if (ret)
-				goto err_ctx;
+				goto err_mm;
 		}
 
-		to_ctx->remap_slice = 0;
+		ctx->remap_slice = 0;
 	}
 
 	return 0;
 
-err_ctx:
-	engine->legacy_active_context = from_ctx;
 err_mm:
-	engine->legacy_active_ppgtt = from_mm;
+	if (unwind_mm)
+		ppgtt->pd_dirty_rings |= unwind_mm;
 err:
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..b44c67849749 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -557,15 +557,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_context *last_retired_context;
 
-	/* We track the current MI_SET_CONTEXT in order to eliminate
-	 * redudant context switches. This presumes that requests are not
-	 * reordered! Or when they are the tracking is updated along with
-	 * the emission of individual requests into the legacy command
-	 * stream (ring).
-	 */
-	struct i915_gem_context *legacy_active_context;
-	struct i915_hw_ppgtt *legacy_active_ppgtt;
-
 	/* status_notifier: list of callbacks for context-switch changes */
 	struct atomic_notifier_head context_status_notifier;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev3)
  2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
                   ` (21 preceding siblings ...)
  2018-06-08 16:23 ` ✗ Fi.CI.IGT: failure " Patchwork
@ 2018-06-08 18:03 ` Patchwork
  22 siblings, 0 replies; 38+ messages in thread
From: Patchwork @ 2018-06-08 18:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev3)
URL   : https://patchwork.freedesktop.org/series/44486/
State : failure

== Summary ==

Applying: drm/i915: Apply batch location restrictions before pinning
Applying: drm/i915/ringbuffer: Brute force context restore
Applying: drm/i915/ringbuffer: Fix context restore upon reset
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_gem_gtt.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/i915_gem_gtt.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/i915_gem_gtt.c
Patch failed at 0003 drm/i915/ringbuffer: Fix context restore upon reset
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-06  6:51 ` Chris Wilson
@ 2018-06-06 23:07   ` Matthew Auld
  0 siblings, 0 replies; 38+ messages in thread
From: Matthew Auld @ 2018-06-06 23:07 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 6 June 2018 at 07:51, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Currently all page directories are bound at creation using an
> unevictable node in the GGTT. This severely limits us as we cannot
> remove any inactive ppgtt for new contexts, or under aperture pressure.
> To fix this we need to make the page directory into a first class and
> unbindable vma. Hence, the creation of a custom vma to wrap the page
> directory as opposed to a GEM object.
>
> In this patch, we leave the page directories pinned upon creation.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 261 ++++++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
>  drivers/gpu/drm/i915/i915_vma.h     |   7 +
>  3 files changed, 159 insertions(+), 111 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ea27f799101f..60b5966360b6 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1640,50 +1640,55 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
>  {
>         struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
>         struct i915_address_space *vm = &base->vm;
> -       struct i915_page_table *unused;
> -       gen6_pte_t scratch_pte;
> -       u32 pd_entry, pte, pde;
> -
> -       scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
> -
> -       gen6_for_all_pdes(unused, &base->pd, pde) {
> -               u32 expected;
> -               gen6_pte_t *pt_vaddr;
> -               const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
> -               pd_entry = readl(ppgtt->pd_addr + pde);
> -               expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
> -
> -               if (pd_entry != expected)
> -                       seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
> -                                  pde,
> -                                  pd_entry,
> -                                  expected);
> -               seq_printf(m, "\tPDE: %x\n", pd_entry);
> -
> -               pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
> -
> -               for (pte = 0; pte < GEN6_PTES; pte+=4) {
> -                       unsigned long va =
> -                               (pde * PAGE_SIZE * GEN6_PTES) +
> -                               (pte * PAGE_SIZE);
> +       const gen6_pte_t scratch_pte =
> +               vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
> +       struct i915_page_table *pt;
> +       u32 pte, pde;
> +
> +       gen6_for_all_pdes(pt, &base->pd, pde) {
> +               gen6_pte_t *vaddr;
> +
> +               if (pt == base->vm.scratch_pt)
> +                       continue;
> +
> +               if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
> +                       u32 expected =
> +                               GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
> +                               GEN6_PDE_VALID;
> +                       u32 pd_entry = readl(ppgtt->pd_addr + pde);
> +
> +                       if (pd_entry != expected)
> +                               seq_printf(m,
> +                                          "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
> +                                          pde,
> +                                          pd_entry,
> +                                          expected);
> +
> +                       seq_printf(m, "\tPDE: %x\n", pd_entry);
> +               }
> +
> +               vaddr = kmap_atomic_px(base->pd.page_table[pde]);
> +               for (pte = 0; pte < GEN6_PTES; pte += 4) {
>                         int i;
> -                       bool found = false;
> +
>                         for (i = 0; i < 4; i++)
> -                               if (pt_vaddr[pte + i] != scratch_pte)
> -                                       found = true;
> -                       if (!found)
> +                               if (vaddr[pte + i] != scratch_pte)
> +                                       break;
> +                       if (i == 4)
>                                 continue;
>
> -                       seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
> +                       seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
> +                                  pde, pte,
> +                                  (pde * GEN6_PTES + pte) * PAGE_SIZE);
>                         for (i = 0; i < 4; i++) {
> -                               if (pt_vaddr[pte + i] != scratch_pte)
> -                                       seq_printf(m, " %08x", pt_vaddr[pte + i]);
> +                               if (vaddr[pte + i] != scratch_pte)
> +                                       seq_printf(m, " %08x", vaddr[pte + i]);
>                                 else
> -                                       seq_puts(m, "  SCRATCH ");
> +                                       seq_puts(m, "  SCRATCH");
>                         }
>                         seq_puts(m, "\n");
>                 }
> -               kunmap_atomic(pt_vaddr);
> +               kunmap_atomic(vaddr);
>         }
>  }
>
> @@ -1697,22 +1702,6 @@ static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
>                   ppgtt->pd_addr + pde);
>  }
>
> -/* Write all the page tables found in the ppgtt structure to incrementing page
> - * directories. */
> -static void gen6_write_page_range(struct i915_hw_ppgtt *base,
> -                                 u32 start, u32 length)
> -{
> -       struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
> -       struct i915_page_table *pt;
> -       unsigned int pde;
> -
> -       gen6_for_each_pde(pt, &base->pd, start, length, pde)
> -               gen6_write_pde(ppgtt, pde, pt);
> -
> -       mark_tlbs_dirty(base);
> -       gen6_ggtt_invalidate(base->vm.i915);
> -}
> -
>  static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
>  {
>         GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
> @@ -1919,8 +1908,12 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>
>                         gen6_initialize_pt(vm, pt);
>                         ppgtt->base.pd.page_table[pde] = pt;
> -                       gen6_write_pde(ppgtt, pde, pt);
> -                       flush = true;
> +
> +                       if (i915_vma_is_bound(ppgtt->vma,
> +                                             I915_VMA_GLOBAL_BIND)) {
> +                               gen6_write_pde(ppgtt, pde, pt);
> +                               flush = true;
> +                       }
>                 }
>         }
>
> @@ -1936,8 +1929,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
>         return -ENOMEM;
>  }
>
> -static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
> +static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
>  {
> +       struct i915_address_space * const vm = &ppgtt->base.vm;
> +       struct i915_page_table *unused;
> +       u32 pde;
>         int ret;
>
>         ret = setup_scratch_page(vm, __GFP_HIGHMEM);
> @@ -1951,6 +1947,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
>         }
>
>         gen6_initialize_pt(vm, vm->scratch_pt);
> +       gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
> +               ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
>
>         return 0;
>  }
> @@ -1975,52 +1973,110 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
>  {
>         struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
>
> -       drm_mm_remove_node(&ppgtt->node);
> +       i915_vma_unpin(ppgtt->vma);
> +       i915_vma_destroy(ppgtt->vma);
>
>         gen6_ppgtt_free_pd(ppgtt);
>         gen6_ppgtt_free_scratch(vm);
>  }
>
> -static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
> +static int pd_vma_set_pages(struct i915_vma *vma)
>  {
> -       struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
> -       struct i915_ggtt *ggtt = &dev_priv->ggtt;
> -       int err;
> +       vma->pages = ERR_PTR(-ENODEV);
> +       return 0;
> +}
>
> -       /* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
> -        * allocator works in address space sizes, so it's multiplied by page
> -        * size. We allocate at the top of the GTT to avoid fragmentation.
> -        */
> -       BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
> +static void pd_vma_clear_pages(struct i915_vma *vma)
> +{
> +       GEM_BUG_ON(!vma->pages);
>
> -       err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
> -                                 GEN6_PD_SIZE, GEN6_PD_ALIGN,
> -                                 I915_COLOR_UNEVICTABLE,
> -                                 0, ggtt->vm.total,
> -                                 PIN_HIGH);
> -       if (err)
> -               return err;
> +       vma->pages = NULL;
> +}
>
> -       if (ppgtt->node.start < ggtt->mappable_end)
> -               DRM_DEBUG("Forced to use aperture for PDEs\n");
> +static int pd_vma_bind(struct i915_vma *vma,
> +                      enum i915_cache_level cache_level,
> +                      u32 unused)
> +{
> +       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
> +       struct gen6_hw_ppgtt *ppgtt = vma->private;
> +       u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
> +       struct i915_page_table *pt;
> +       unsigned int pde;
>
> -       ppgtt->base.pd.base.ggtt_offset =
> -               ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
> +       ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
> +       ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
>
> -       ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
> -               ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
> +       gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
> +               gen6_write_pde(ppgtt, pde, pt);
> +
> +       mark_tlbs_dirty(&ppgtt->base);
> +       gen6_ggtt_invalidate(ppgtt->base.vm.i915);
>
>         return 0;
>  }
>
> -static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
> -                                 u64 start, u64 length)
> +static void pd_vma_unbind(struct i915_vma *vma)
>  {
> -       struct i915_page_table *unused;
> -       u32 pde;
> +}
> +
> +static const struct i915_vma_ops pd_vma_ops = {
> +       .set_pages = pd_vma_set_pages,
> +       .clear_pages = pd_vma_clear_pages,
> +       .bind_vma = pd_vma_bind,
> +       .unbind_vma = pd_vma_unbind,
> +};
> +
> +static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
> +{
> +       struct drm_i915_private *i915 = ppgtt->base.vm.i915;
> +       struct i915_ggtt *ggtt = &i915->ggtt;
> +       struct i915_vma *vma;
> +       int i;
>
> -       gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
> -               ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
> +       GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
> +       GEM_BUG_ON(size > ggtt->vm.total);
> +
> +       vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
> +       if (!vma)
> +               return ERR_PTR(-ENOMEM);
> +
> +       for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
> +               init_request_active(&vma->last_read[i], NULL);
> +       init_request_active(&vma->last_fence, NULL);
> +
> +       vma->vm = &ggtt->vm;
> +       vma->ops = &pd_vma_ops;
> +       vma->private = ppgtt;
> +
> +       vma->obj = NULL;
> +       vma->resv = NULL;
> +       vma->size = size;
> +       vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
> +
> +       vma->fence_size = size;
> +       vma->fence_alignment = I915_GTT_MIN_ALIGNMENT;
> +
> +       vma->flags |= I915_VMA_GGTT;
> +       vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
> +
> +       INIT_LIST_HEAD(&vma->obj_link);
> +       list_add(&vma->vm_link, &vma->vm->unbound_list);
> +
> +       return vma;
> +}
> +
> +static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
> +{
> +       struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
> +
> +       /*
> +        * PPGTT PDEs reside in the GGTT and consists of 512 entries. The
> +        * allocator works in address space sizes, so it's multiplied by page
> +        * size. We allocate at the top of the GTT to avoid fragmentation.
> +        */
> +       return i915_vma_pin(ppgtt->vma,
> +                           0, GEN6_PD_ALIGN,
> +                           PIN_GLOBAL | PIN_HIGH);
>  }
>
>  static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
> @@ -2058,24 +2114,25 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
>         else
>                 BUG();
>
> -       err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
> +       err = gen6_ppgtt_init_scratch(ppgtt);
>         if (err)
>                 goto err_free;
>
> -       err = gen6_ppgtt_allocate_page_directories(ppgtt);
> -       if (err)
> +       ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
> +       if (IS_ERR(ppgtt->vma))

err = PTR_ERR(ppgtt->vma) is needed here before the goto; otherwise we
reach err_scratch with err still zero from gen6_ppgtt_init_scratch().
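
Something like this (a sketch of the requested fix, not necessarily what
gets committed):

	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
	if (IS_ERR(ppgtt->vma)) {
		err = PTR_ERR(ppgtt->vma);
		goto err_scratch;
	}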

Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-06  6:27 [PATCH 08/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
  2018-06-06  6:45 ` [PATCH] " Chris Wilson
@ 2018-06-06  6:51 ` Chris Wilson
  2018-06-06 23:07   ` Matthew Auld
  1 sibling, 1 reply; 38+ messages in thread
From: Chris Wilson @ 2018-06-06  6:51 UTC (permalink / raw)
  To: intel-gfx

Currently all page directories are bound at creation using an
unevictable node in the GGTT. This severely limits us, as we cannot
evict the page directory of an inactive ppgtt to make room for new
contexts or to relieve aperture pressure. To fix this, we need to make
the page directory a first-class vma that can be bound and unbound on
demand. Hence the creation of a custom vma to wrap the page directory,
as opposed to a GEM object.

In this patch, we leave the page directories pinned upon creation.
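
In outline (abridged from the diff below; comments added here for
orientation, only the core ops are shown), the page directory is wrapped
in a vma whose bind hook rewrites the PDEs wherever the vma lands in the
GGTT:

	static int pd_vma_bind(struct i915_vma *vma,
			       enum i915_cache_level cache_level,
			       u32 unused)
	{
		struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
		struct gen6_hw_ppgtt *ppgtt = vma->private;
		u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
		struct i915_page_table *pt;
		unsigned int pde;

		/* Point pd_addr at the slot the vma was just bound into. */
		ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
		ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;

		/* (Re)write every PDE into the new location. */
		gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
			gen6_write_pde(ppgtt, pde, pt);

		mark_tlbs_dirty(&ppgtt->base);
		gen6_ggtt_invalidate(ppgtt->base.vm.i915);

		return 0;
	}

	static const struct i915_vma_ops pd_vma_ops = {
		.set_pages = pd_vma_set_pages,
		.clear_pages = pd_vma_clear_pages,
		.bind_vma = pd_vma_bind,
		.unbind_vma = pd_vma_unbind,
	};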

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 261 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_vma.h     |   7 +
 3 files changed, 159 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ea27f799101f..60b5966360b6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1640,50 +1640,55 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_address_space *vm = &base->vm;
-	struct i915_page_table *unused;
-	gen6_pte_t scratch_pte;
-	u32 pd_entry, pte, pde;
-
-	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
-
-	gen6_for_all_pdes(unused, &base->pd, pde) {
-		u32 expected;
-		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
-		pd_entry = readl(ppgtt->pd_addr + pde);
-		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
-
-		if (pd_entry != expected)
-			seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
-				   pde,
-				   pd_entry,
-				   expected);
-		seq_printf(m, "\tPDE: %x\n", pd_entry);
-
-		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
-
-		for (pte = 0; pte < GEN6_PTES; pte+=4) {
-			unsigned long va =
-				(pde * PAGE_SIZE * GEN6_PTES) +
-				(pte * PAGE_SIZE);
+	const gen6_pte_t scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	struct i915_page_table *pt;
+	u32 pte, pde;
+
+	gen6_for_all_pdes(pt, &base->pd, pde) {
+		gen6_pte_t *vaddr;
+
+		if (pt == base->vm.scratch_pt)
+			continue;
+
+		if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
+			u32 expected =
+				GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
+				GEN6_PDE_VALID;
+			u32 pd_entry = readl(ppgtt->pd_addr + pde);
+
+			if (pd_entry != expected)
+				seq_printf(m,
+					   "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
+					   pde,
+					   pd_entry,
+					   expected);
+
+			seq_printf(m, "\tPDE: %x\n", pd_entry);
+		}
+
+		vaddr = kmap_atomic_px(base->pd.page_table[pde]);
+		for (pte = 0; pte < GEN6_PTES; pte += 4) {
 			int i;
-			bool found = false;
+
 			for (i = 0; i < 4; i++)
-				if (pt_vaddr[pte + i] != scratch_pte)
-					found = true;
-			if (!found)
+				if (vaddr[pte + i] != scratch_pte)
+					break;
+			if (i == 4)
 				continue;
 
-			seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
+			seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
+				   pde, pte,
+				   (pde * GEN6_PTES + pte) * PAGE_SIZE);
 			for (i = 0; i < 4; i++) {
-				if (pt_vaddr[pte + i] != scratch_pte)
-					seq_printf(m, " %08x", pt_vaddr[pte + i]);
+				if (vaddr[pte + i] != scratch_pte)
+					seq_printf(m, " %08x", vaddr[pte + i]);
 				else
-					seq_puts(m, "  SCRATCH ");
+					seq_puts(m, "  SCRATCH");
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_atomic(vaddr);
 	}
 }
 
@@ -1697,22 +1702,6 @@ static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 		  ppgtt->pd_addr + pde);
 }
 
-/* Write all the page tables found in the ppgtt structure to incrementing page
- * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *base,
-				  u32 start, u32 length)
-{
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_page_table *pt;
-	unsigned int pde;
-
-	gen6_for_each_pde(pt, &base->pd, start, length, pde)
-		gen6_write_pde(ppgtt, pde, pt);
-
-	mark_tlbs_dirty(base);
-	gen6_ggtt_invalidate(base->vm.i915);
-}
-
 static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 {
 	GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
@@ -1919,8 +1908,12 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 			gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
-			gen6_write_pde(ppgtt, pde, pt);
-			flush = true;
+
+			if (i915_vma_is_bound(ppgtt->vma,
+					      I915_VMA_GLOBAL_BIND)) {
+				gen6_write_pde(ppgtt, pde, pt);
+				flush = true;
+			}
 		}
 	}
 
@@ -1936,8 +1929,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 {
+	struct i915_address_space * const vm = &ppgtt->base.vm;
+	struct i915_page_table *unused;
+	u32 pde;
 	int ret;
 
 	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
@@ -1951,6 +1947,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 	}
 
 	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
+		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
 	return 0;
 }
@@ -1975,52 +1973,110 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	drm_mm_remove_node(&ppgtt->node);
+	i915_vma_unpin(ppgtt->vma);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
+static int pd_vma_set_pages(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int err;
+	vma->pages = ERR_PTR(-ENODEV);
+	return 0;
+}
 
-	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
-	 * allocator works in address space sizes, so it's multiplied by page
-	 * size. We allocate at the top of the GTT to avoid fragmentation.
-	 */
-	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
+static void pd_vma_clear_pages(struct i915_vma *vma)
+{
+	GEM_BUG_ON(!vma->pages);
 
-	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
-				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
-				  I915_COLOR_UNEVICTABLE,
-				  0, ggtt->vm.total,
-				  PIN_HIGH);
-	if (err)
-		return err;
+	vma->pages = NULL;
+}
 
-	if (ppgtt->node.start < ggtt->mappable_end)
-		DRM_DEBUG("Forced to use aperture for PDEs\n");
+static int pd_vma_bind(struct i915_vma *vma,
+		       enum i915_cache_level cache_level,
+		       u32 unused)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
+	struct i915_page_table *pt;
+	unsigned int pde;
 
-	ppgtt->base.pd.base.ggtt_offset =
-		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
+	ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
-	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
+		gen6_write_pde(ppgtt, pde, pt);
+
+	mark_tlbs_dirty(&ppgtt->base);
+	gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 
 	return 0;
 }
 
-static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
-				  u64 start, u64 length)
+static void pd_vma_unbind(struct i915_vma *vma)
 {
-	struct i915_page_table *unused;
-	u32 pde;
+}
+
+static const struct i915_vma_ops pd_vma_ops = {
+	.set_pages = pd_vma_set_pages,
+	.clear_pages = pd_vma_clear_pages,
+	.bind_vma = pd_vma_bind,
+	.unbind_vma = pd_vma_unbind,
+};
+
+static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
+{
+	struct drm_i915_private *i915 = ppgtt->base.vm.i915;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	struct i915_vma *vma;
+	int i;
 
-	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
-		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
+	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(size > ggtt->vm.total);
+
+	vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], NULL);
+	init_request_active(&vma->last_fence, NULL);
+
+	vma->vm = &ggtt->vm;
+	vma->ops = &pd_vma_ops;
+	vma->private = ppgtt;
+
+	vma->obj = NULL;
+	vma->resv = NULL;
+	vma->size = size;
+	vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
+
+	vma->fence_size = size;
+	vma->fence_alignment = I915_GTT_MIN_ALIGNMENT;
+
+	vma->flags |= I915_VMA_GGTT;
+	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	list_add(&vma->vm_link, &vma->vm->unbound_list);
+
+	return vma;
+}
+
+static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	/*
+	 * PPGTT PDEs reside in the GGTT and consists of 512 entries. The
+	 * allocator works in address space sizes, so it's multiplied by page
+	 * size. We allocate at the top of the GTT to avoid fragmentation.
+	 */
+	return i915_vma_pin(ppgtt->vma,
+			    0, GEN6_PD_ALIGN,
+			    PIN_GLOBAL | PIN_HIGH);
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
@@ -2058,24 +2114,25 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	else
 		BUG();
 
-	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
 		goto err_free;
 
-	err = gen6_ppgtt_allocate_page_directories(ppgtt);
-	if (err)
+	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
+	if (IS_ERR(ppgtt->vma))
 		goto err_scratch;
 
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
-
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_vma;
+
+	err = gen6_ppgtt_pin(&ppgtt->base);
 	if (err)
 		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
+			 ppgtt->vma->node.size >> 20,
+			 ppgtt->vma->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
 			 ppgtt->base.pd.base.ggtt_offset << 10);
@@ -2084,6 +2141,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_pd:
 	gen6_ppgtt_free_pd(ppgtt);
+err_vma:
+	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -3602,6 +3661,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	}
 
 	ggtt->vm.closed = false;
+	i915_ggtt_invalidate(dev_priv);
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
@@ -3610,25 +3670,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
-
-	if (USES_PPGTT(dev_priv)) {
-		struct i915_address_space *vm;
-
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-			struct i915_hw_ppgtt *ppgtt;
-
-			if (i915_is_ggtt(vm))
-				ppgtt = dev_priv->mm.aliasing_ppgtt;
-			else
-				ppgtt = i915_vm_to_ppgtt(vm);
-			if (!ppgtt)
-				continue;
-
-			gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
-		}
-	}
-
-	i915_ggtt_invalidate(dev_priv);
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 199d6f47a557..c2f270c90bea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -410,7 +410,7 @@ struct i915_hw_ppgtt {
 struct gen6_hw_ppgtt {
 	struct i915_hw_ppgtt base;
 
-	struct drm_mm_node node;
+	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4321476a6a32..66a228931517 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -54,6 +54,7 @@ struct i915_vma {
 	struct reservation_object *resv; /** Alias of obj->resv */
 	struct sg_table *pages;
 	void __iomem *iomap;
+	void *private; /* owned by creator */
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -340,6 +341,12 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
+static inline bool i915_vma_is_bound(const struct i915_vma *vma,
+				     unsigned int where)
+{
+	return vma->flags & where;
+}
+
 /**
  * i915_vma_pin_iomap - calls ioremap_wc to map the GGTT VMA via the aperture
  * @vma: VMA to iomap
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-06  6:27 [PATCH 08/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
@ 2018-06-06  6:45 ` Chris Wilson
  2018-06-06  6:51 ` Chris Wilson
  1 sibling, 0 replies; 38+ messages in thread
From: Chris Wilson @ 2018-06-06  6:45 UTC (permalink / raw)
  To: intel-gfx

Currently all page directories are bound at creation using an
unevictable node in the GGTT. This severely limits us, as we cannot
evict the page directory of an inactive ppgtt to make room for new
contexts or to relieve aperture pressure. To fix this, we need to make
the page directory a first-class vma that can be bound and unbound on
demand. Hence the creation of a custom vma to wrap the page directory,
as opposed to a GEM object.

In this patch, we leave the page directories pinned upon creation.
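
In this revision, gen6_write_pde is removed outright and the PDE write is
open-coded at each site, e.g. in pd_vma_bind (abridged from the diff
below; comment added here for orientation):

	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde) {
		u32 val = GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID;

		/* Write the PDE directly through the GGTT mapping. */
		iowrite32(val, ppgtt->pd_addr + pde);
	}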

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 276 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_vma.h     |   7 +
 3 files changed, 164 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index ea27f799101f..1487d99ce406 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1640,79 +1640,58 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_address_space *vm = &base->vm;
-	struct i915_page_table *unused;
-	gen6_pte_t scratch_pte;
-	u32 pd_entry, pte, pde;
-
-	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
-
-	gen6_for_all_pdes(unused, &base->pd, pde) {
-		u32 expected;
-		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
-		pd_entry = readl(ppgtt->pd_addr + pde);
-		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
-
-		if (pd_entry != expected)
-			seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
-				   pde,
-				   pd_entry,
-				   expected);
-		seq_printf(m, "\tPDE: %x\n", pd_entry);
-
-		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
-
-		for (pte = 0; pte < GEN6_PTES; pte+=4) {
-			unsigned long va =
-				(pde * PAGE_SIZE * GEN6_PTES) +
-				(pte * PAGE_SIZE);
+	const gen6_pte_t scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	struct i915_page_table *pt;
+	u32 pte, pde;
+
+	gen6_for_all_pdes(pt, &base->pd, pde) {
+		gen6_pte_t *vaddr;
+
+		if (pt == base->vm.scratch_pt)
+			continue;
+
+		if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
+			u32 expected =
+				GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
+				GEN6_PDE_VALID;
+			u32 pd_entry = readl(ppgtt->pd_addr + pde);
+
+			if (pd_entry != expected)
+				seq_printf(m,
+					   "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
+					   pde,
+					   pd_entry,
+					   expected);
+
+			seq_printf(m, "\tPDE: %x\n", pd_entry);
+		}
+
+		vaddr = kmap_atomic_px(base->pd.page_table[pde]);
+		for (pte = 0; pte < GEN6_PTES; pte += 4) {
 			int i;
-			bool found = false;
+
 			for (i = 0; i < 4; i++)
-				if (pt_vaddr[pte + i] != scratch_pte)
-					found = true;
-			if (!found)
+				if (vaddr[pte + i] != scratch_pte)
+					break;
+			if (i == 4)
 				continue;
 
-			seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
+			seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
+				   pde, pte,
+				   (pde * GEN6_PTES + pte) * PAGE_SIZE);
 			for (i = 0; i < 4; i++) {
-				if (pt_vaddr[pte + i] != scratch_pte)
-					seq_printf(m, " %08x", pt_vaddr[pte + i]);
+				if (vaddr[pte + i] != scratch_pte)
+					seq_printf(m, " %08x", vaddr[pte + i]);
 				else
-					seq_puts(m, "  SCRATCH ");
+					seq_puts(m, "  SCRATCH");
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_atomic(vaddr);
 	}
 }
 
-/* Write pde (index) from the page directory @pd to the page table @pt */
-static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
-				  const unsigned int pde,
-				  const struct i915_page_table *pt)
-{
-	/* Caller needs to make sure the write completes if necessary */
-	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
-		  ppgtt->pd_addr + pde);
-}
-
-/* Write all the page tables found in the ppgtt structure to incrementing page
- * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *base,
-				  u32 start, u32 length)
-{
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_page_table *pt;
-	unsigned int pde;
-
-	gen6_for_each_pde(pt, &base->pd, start, length, pde)
-		gen6_write_pde(ppgtt, pde, pt);
-
-	mark_tlbs_dirty(base);
-	gen6_ggtt_invalidate(base->vm.i915);
-}
-
 static inline u32 get_pd_offset(struct gen6_hw_ppgtt *ppgtt)
 {
 	GEM_BUG_ON(ppgtt->base.pd.base.ggtt_offset & 0x3f);
@@ -1919,8 +1898,15 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 			gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
-			gen6_write_pde(ppgtt, pde, pt);
-			flush = true;
+
+			if (i915_vma_is_bound(ppgtt->vma,
+					      I915_VMA_GLOBAL_BIND)) {
+				u32 addr = GEN6_PDE_ADDR_ENCODE(px_dma(pt));
+
+				iowrite32(addr | GEN6_PDE_VALID,
+					  ppgtt->pd_addr + pde);
+				flush = true;
+			}
 		}
 	}
 
@@ -1936,8 +1922,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 {
+	struct i915_address_space * const vm = &ppgtt->base.vm;
+	struct i915_page_table *unused;
+	u32 pde;
 	int ret;
 
 	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
@@ -1951,6 +1940,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 	}
 
 	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
+		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
 	return 0;
 }
@@ -1975,52 +1966,112 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	drm_mm_remove_node(&ppgtt->node);
+	i915_vma_unpin(ppgtt->vma);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
+static int pd_vma_set_pages(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int err;
+	vma->pages = ERR_PTR(-ENODEV);
+	return 0;
+}
 
-	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
-	 * allocator works in address space sizes, so it's multiplied by page
-	 * size. We allocate at the top of the GTT to avoid fragmentation.
-	 */
-	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
+static void pd_vma_clear_pages(struct i915_vma *vma)
+{
+	GEM_BUG_ON(!vma->pages);
 
-	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
-				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
-				  I915_COLOR_UNEVICTABLE,
-				  0, ggtt->vm.total,
-				  PIN_HIGH);
-	if (err)
-		return err;
+	vma->pages = NULL;
+}
 
-	if (ppgtt->node.start < ggtt->mappable_end)
-		DRM_DEBUG("Forced to use aperture for PDEs\n");
+static int pd_vma_bind(struct i915_vma *vma,
+		       enum i915_cache_level cache_level,
+		       u32 unused)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
+	struct i915_page_table *pt;
+	unsigned int pde;
 
-	ppgtt->base.pd.base.ggtt_offset =
-		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
+	ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
-	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde) {
+		u32 val = GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID;
+
+		iowrite32(val, ppgtt->pd_addr + pde);
+	}
+	mark_tlbs_dirty(&ppgtt->base);
+	gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 
 	return 0;
 }
 
-static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
-				  u64 start, u64 length)
+static void pd_vma_unbind(struct i915_vma *vma)
 {
-	struct i915_page_table *unused;
-	u32 pde;
+}
+
+static const struct i915_vma_ops pd_vma_ops = {
+	.set_pages = pd_vma_set_pages,
+	.clear_pages = pd_vma_clear_pages,
+	.bind_vma = pd_vma_bind,
+	.unbind_vma = pd_vma_unbind,
+};
+
+static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
+{
+	struct drm_i915_private *i915 = ppgtt->base.vm.i915;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	struct i915_vma *vma;
+	int i;
+
+	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(size > ggtt->vm.total);
+
+	vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], NULL);
+	init_request_active(&vma->last_fence, NULL);
+
+	vma->vm = &ggtt->vm;
+	vma->ops = &pd_vma_ops;
+	vma->private = ppgtt;
+
+	vma->obj = NULL;
+	vma->resv = NULL;
+	vma->size = size;
+	vma->display_alignment = I915_GTT_MIN_ALIGNMENT;
+
+	vma->fence_size = size;
+	vma->fence_alignment = I915_GTT_MIN_ALIGNMENT;
+
+	vma->flags |= I915_VMA_GGTT;
+	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	list_add(&vma->vm_link, &vma->vm->unbound_list);
 
-	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
-		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
+	return vma;
+}
+
+static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	/*
+	 * PPGTT PDEs reside in the GGTT and consists of 512 entries. The
+	 * allocator works in address space sizes, so it's multiplied by page
+	 * size. We allocate at the top of the GTT to avoid fragmentation.
+	 */
+	return i915_vma_pin(ppgtt->vma,
+			    0, GEN6_PD_ALIGN,
+			    PIN_GLOBAL | PIN_HIGH);
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
@@ -2058,24 +2109,25 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	else
 		BUG();
 
-	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
 		goto err_free;
 
-	err = gen6_ppgtt_allocate_page_directories(ppgtt);
-	if (err)
+	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
+	if (IS_ERR(ppgtt->vma))
 		goto err_scratch;
 
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
-
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_vma;
+
+	err = gen6_ppgtt_pin(&ppgtt->base);
 	if (err)
 		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
+			 ppgtt->vma->node.size >> 20,
+			 ppgtt->vma->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
 			 ppgtt->base.pd.base.ggtt_offset << 10);
@@ -2084,6 +2136,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_pd:
 	gen6_ppgtt_free_pd(ppgtt);
+err_vma:
+	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -3602,6 +3656,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	}
 
 	ggtt->vm.closed = false;
+	i915_ggtt_invalidate(dev_priv);
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
@@ -3610,25 +3665,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
-
-	if (USES_PPGTT(dev_priv)) {
-		struct i915_address_space *vm;
-
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-			struct i915_hw_ppgtt *ppgtt;
-
-			if (i915_is_ggtt(vm))
-				ppgtt = dev_priv->mm.aliasing_ppgtt;
-			else
-				ppgtt = i915_vm_to_ppgtt(vm);
-			if (!ppgtt)
-				continue;
-
-			gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
-		}
-	}
-
-	i915_ggtt_invalidate(dev_priv);
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 199d6f47a557..c2f270c90bea 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -410,7 +410,7 @@ struct i915_hw_ppgtt {
 struct gen6_hw_ppgtt {
 	struct i915_hw_ppgtt base;
 
-	struct drm_mm_node node;
+	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 
 	int (*switch_mm)(struct gen6_hw_ppgtt *ppgtt, struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4321476a6a32..66a228931517 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -54,6 +54,7 @@ struct i915_vma {
 	struct reservation_object *resv; /** Alias of obj->resv */
 	struct sg_table *pages;
 	void __iomem *iomap;
+	void *private; /* owned by creator */
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -340,6 +341,12 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
+static inline bool i915_vma_is_bound(const struct i915_vma *vma,
+				     unsigned int where)
+{
+	return vma->flags & where;
+}
+
 /**
  * i915_vma_pin_iomap - calls ioremap_wc to map the GGTT VMA via the aperture
  * @vma: VMA to iomap
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-06-08 18:03 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-08 12:55 Haswell full-ppgtt, no really Chris Wilson
2018-06-08 12:55 ` [PATCH 01/18] drm/i915: Apply batch location restrictions before pinning Chris Wilson
2018-06-08 12:55 ` [PATCH 02/18] drm/i915/ringbuffer: Brute force context restore Chris Wilson
2018-06-08 13:52   ` Mika Kuoppala
2018-06-08 14:00     ` Chris Wilson
2018-06-08 12:55 ` [PATCH 03/18] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
2018-06-08 14:26   ` Chris Wilson
2018-06-08 17:26   ` [PATCH v2] " Chris Wilson
2018-06-08 12:55 ` [PATCH 04/18] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
2018-06-08 12:55 ` [PATCH 05/18] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
2018-06-08 12:55 ` [PATCH 06/18] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
2018-06-08 12:55 ` [PATCH 07/18] drm/i915/gtt: Reorder aliasing_ppgtt fini Chris Wilson
2018-06-08 12:55 ` [PATCH 08/18] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
2018-06-08 13:10   ` [PATCH] " Chris Wilson
2018-06-08 12:55 ` [PATCH 09/18] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
2018-06-08 12:55 ` [PATCH 10/18] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
2018-06-08 14:37   ` Matthew Auld
2018-06-08 14:43     ` Chris Wilson
2018-06-08 12:55 ` [PATCH 11/18] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
2018-06-08 12:55 ` [PATCH 12/18] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
2018-06-08 12:55 ` [PATCH 13/18] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
2018-06-08 12:55 ` [PATCH 14/18] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
2018-06-08 12:55 ` [PATCH 15/18] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
2018-06-08 12:56 ` [PATCH 16/18] drm/i915/gtt: Remove redundant hsw_mm_switch() Chris Wilson
2018-06-08 14:03   ` Mika Kuoppala
2018-06-08 12:56 ` [PATCH 17/18] drm/i915/gtt: Remove vgpu check for gen6 Chris Wilson
2018-06-08 14:06   ` Mika Kuoppala
2018-06-08 12:56 ` [PATCH 18/18] RFT drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
2018-06-08 13:25 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev2) Patchwork
2018-06-08 13:30 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-06-08 13:41 ` ✓ Fi.CI.BAT: success " Patchwork
2018-06-08 16:23 ` ✗ Fi.CI.IGT: failure " Patchwork
2018-06-08 16:36   ` Chris Wilson
2018-06-08 17:09     ` Chris Wilson
2018-06-08 18:03 ` ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915: Apply batch location restrictions before pinning (rev3) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2018-06-06  6:27 [PATCH 08/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
2018-06-06  6:45 ` [PATCH] " Chris Wilson
2018-06-06  6:51 ` Chris Wilson
2018-06-06 23:07   ` Matthew Auld
