All of lore.kernel.org
 help / color / mirror / Atom feed
* Haswell full-ppgtt
@ 2018-06-10 19:43 Chris Wilson
  2018-06-10 19:43 ` [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning Chris Wilson
                   ` (20 more replies)
  0 siblings, 21 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

Found the magic barrier required to fix the last remaining stress test,
so as far as I can tell, this is ready to go: full-ppgtt for gen7,
ivb/hsw/vlv.
-Chris


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11  9:57   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

We special case the position of the batch within the GTT to prevent
negative self-relocation deltas from underflowing. However, that
restriction is being applied after a trial pin of the batch in its
current position. Thus we are not rejecting an invalid location if the
batch has been before, leading to an assertion if we happen to need to
rearrange the entire payload. In the worst case, this may cause a GPU
hang on gen7 or perhaps missing state.

References: https://bugs.freedesktop.org/show_bug.cgi?id=105720
Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Martin Peres <martin.peres@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 49 ++++++++++++----------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index eefd449502e2..2d2eb3075960 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -489,7 +489,9 @@ eb_validate_vma(struct i915_execbuffer *eb,
 }
 
 static int
-eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
+eb_add_vma(struct i915_execbuffer *eb,
+	   unsigned int i, unsigned batch_idx,
+	   struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry = &eb->exec[i];
 	int err;
@@ -522,6 +524,24 @@ eb_add_vma(struct i915_execbuffer *eb, unsigned int i, struct i915_vma *vma)
 	eb->flags[i] = entry->flags;
 	vma->exec_flags = &eb->flags[i];
 
+	/*
+	 * SNA is doing fancy tricks with compressing batch buffers, which leads
+	 * to negative relocation deltas. Usually that works out ok since the
+	 * relocate address is still positive, except when the batch is placed
+	 * very low in the GTT. Ensure this doesn't happen.
+	 *
+	 * Note that actual hangs have only been observed on gen7, but for
+	 * paranoia do it everywhere.
+	 */
+	if (i == batch_idx) {
+		if (!(eb->flags[i] & EXEC_OBJECT_PINNED))
+			eb->flags[i] |= __EXEC_OBJECT_NEEDS_BIAS;
+		if (eb->reloc_cache.has_fence)
+			eb->flags[i] |= EXEC_OBJECT_NEEDS_FENCE;
+
+		eb->batch = vma;
+	}
+
 	err = 0;
 	if (eb_pin_vma(eb, entry, vma)) {
 		if (entry->offset != vma->node.start) {
@@ -716,7 +736,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->ctx->handles_vma;
 	struct drm_i915_gem_object *obj;
-	unsigned int i;
+	unsigned int i, batch;
 	int err;
 
 	if (unlikely(i915_gem_context_is_closed(eb->ctx)))
@@ -728,6 +748,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 	INIT_LIST_HEAD(&eb->relocs);
 	INIT_LIST_HEAD(&eb->unbound);
 
+	batch = eb_batch_index(eb);
+
 	for (i = 0; i < eb->buffer_count; i++) {
 		u32 handle = eb->exec[i].handle;
 		struct i915_lut_handle *lut;
@@ -770,33 +792,16 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
 		lut->handle = handle;
 
 add_vma:
-		err = eb_add_vma(eb, i, vma);
+		err = eb_add_vma(eb, i, batch, vma);
 		if (unlikely(err))
 			goto err_vma;
 
 		GEM_BUG_ON(vma != eb->vma[i]);
 		GEM_BUG_ON(vma->exec_flags != &eb->flags[i]);
+		GEM_BUG_ON(drm_mm_node_allocated(&vma->node) &&
+			   eb_vma_misplaced(&eb->exec[i], vma, eb->flags[i]));
 	}
 
-	/* take note of the batch buffer before we might reorder the lists */
-	i = eb_batch_index(eb);
-	eb->batch = eb->vma[i];
-	GEM_BUG_ON(eb->batch->exec_flags != &eb->flags[i]);
-
-	/*
-	 * SNA is doing fancy tricks with compressing batch buffers, which leads
-	 * to negative relocation deltas. Usually that works out ok since the
-	 * relocate address is still positive, except when the batch is placed
-	 * very low in the GTT. Ensure this doesn't happen.
-	 *
-	 * Note that actual hangs have only been observed on gen7, but for
-	 * paranoia do it everywhere.
-	 */
-	if (!(eb->flags[i] & EXEC_OBJECT_PINNED))
-		eb->flags[i] |= __EXEC_OBJECT_NEEDS_BIAS;
-	if (eb->reloc_cache.has_fence)
-		eb->flags[i] |= EXEC_OBJECT_NEEDS_FENCE;
-
 	eb->args->flags |= __EXEC_VALIDATED;
 	return eb_reserve(eb);
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
  2018-06-10 19:43 ` [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:00   ` Tvrtko Ursulin
                     ` (2 more replies)
  2018-06-10 19:43 ` [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
                   ` (18 subsequent siblings)
  20 siblings, 3 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

An issue encountered with switching mm on gen7 is that the GPU likes to
hang (with the VS unit busy) when told to force restore the current
context. We can simply workaround this by substituting the
MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
forcing the context to be saved and restored; thereby reloading the
PP_DIR registers and updating the modified page directory!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 65811e2fa7da..6bfa6030198d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		(HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
 		INTEL_INFO(i915)->num_rings - 1 :
 		0;
+	bool force_restore = false;
 	int len;
 	u32 *cs;
 
@@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	len = 4;
 	if (IS_GEN7(i915))
 		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
+	if (flags & MI_FORCE_RESTORE) {
+		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
+		flags &= ~MI_FORCE_RESTORE;
+		force_restore = true;
+		len += 2;
+	}
 
 	cs = intel_ring_begin(rq, len);
 	if (IS_ERR(cs))
@@ -1495,6 +1502,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		}
 	}
 
+	if (force_restore) {
+		/*
+		 * The HW doesn't handle being told to restore the current
+		 * context very well. Quite often it likes goes to go off and
+		 * sulk, especially when it is meant to be reloading PP_DIR.
+		 * A very simple fix to force the reload is to simply switch
+		 * away from the current context and back again.
+		 */
+		*cs++ = MI_SET_CONTEXT;
+		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
+							  engine)->state) |
+			MI_MM_SPACE_GTT |
+			MI_RESTORE_INHIBIT;
+	}
+
 	*cs++ = MI_NOOP;
 	*cs++ = MI_SET_CONTEXT;
 	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
@@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
 
 		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
 		engine->legacy_active_ppgtt = to_mm;
-		hw_flags = MI_FORCE_RESTORE;
+
+		if (to_ctx == from_ctx) {
+			hw_flags = MI_FORCE_RESTORE;
+			from_ctx = NULL;
+		}
 	}
 
-	if (rq->hw_context->state &&
-	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
+	if (rq->hw_context->state && to_ctx != from_ctx) {
 		GEM_BUG_ON(engine->id != RCS);
 
 		/*
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
  2018-06-10 19:43 ` [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning Chris Wilson
  2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:28   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail Chris Wilson
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

The discovery with trying to enable full-ppgtt was that we were
completely failing to the load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were taking no effect (the assumption was
that this would trigger reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch would anything occur.

Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.

Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.

v2: Sandybridge has to agree to use LRI as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  45 ---------
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   2 -
 drivers/gpu/drm/i915/i915_request.c     |   2 +
 drivers/gpu/drm/i915/i915_request.h     |   3 +
 drivers/gpu/drm/i915/i915_trace.h       |  33 -------
 drivers/gpu/drm/i915/intel_engine_cs.c  |   3 -
 drivers/gpu/drm/i915/intel_ringbuffer.c | 124 +++++++++++-------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 --
 8 files changed, 64 insertions(+), 157 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 63bd40688fbb..9966845a7f22 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1712,45 +1712,6 @@ static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
 	wmb();
 }
 
-static inline u32 get_pd_offset(struct i915_hw_ppgtt *ppgtt)
-{
-	GEM_BUG_ON(ppgtt->pd.base.ggtt_offset & 0x3f);
-	return ppgtt->pd.base.ggtt_offset << 10;
-}
-
-static int gen7_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = rq->engine;
-	u32 *cs;
-
-	/* NB: TLBs must be flushed and invalidated before a switch */
-	cs = intel_ring_begin(rq, 6);
-	if (IS_ERR(cs))
-		return PTR_ERR(cs);
-
-	*cs++ = MI_LOAD_REGISTER_IMM(2);
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
-	*cs++ = PP_DIR_DCLV_2G;
-	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
-	*cs++ = get_pd_offset(ppgtt);
-	*cs++ = MI_NOOP;
-	intel_ring_advance(rq, cs);
-
-	return 0;
-}
-
-static int gen6_mm_switch(struct i915_hw_ppgtt *ppgtt,
-			  struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = rq->engine;
-	struct drm_i915_private *dev_priv = rq->i915;
-
-	I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
-	I915_WRITE(RING_PP_DIR_BASE(engine), get_pd_offset(ppgtt));
-	return 0;
-}
-
 static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
@@ -2024,12 +1985,6 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->vm.dma = &i915->drm.pdev->dev;
 
 	ppgtt->vm.pte_encode = ggtt->vm.pte_encode;
-	if (IS_GEN6(i915))
-		ppgtt->switch_mm = gen6_mm_switch;
-	else if (IS_GEN7(i915))
-		ppgtt->switch_mm = gen7_mm_switch;
-	else
-		BUG();
 
 	err = gen6_ppgtt_alloc(ppgtt);
 	if (err)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 16307ba7e303..e70f6abcd0f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -406,8 +406,6 @@ struct i915_hw_ppgtt {
 
 	gen6_pte_t __iomem *pd_addr;
 
-	int (*switch_mm)(struct i915_hw_ppgtt *ppgtt,
-			 struct i915_request *rq);
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f187250e60c6..9092f5464c24 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -817,6 +817,8 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	/* Keep a second pin for the dual retirement along engine and ring */
 	__intel_context_pin(ce);
 
+	rq->infix = rq->ring->emit; /* end of header; start of user payload */
+
 	/* Check that we didn't interrupt ourselves with a new request */
 	GEM_BUG_ON(rq->timeline->seqno != rq->fence.seqno);
 	return rq;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 491ff81d0fea..0e9aba53d0e4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -134,6 +134,9 @@ struct i915_request {
 	/** Position in the ring of the start of the request */
 	u32 head;
 
+	/** Position in the ring of the start of the user packets */
+	u32 infix;
+
 	/**
 	 * Position in the ring of the start of the postfix.
 	 * This is required to calculate the maximum available ring space
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 1472f48ab2e8..b50c6b829715 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -973,39 +973,6 @@ DEFINE_EVENT(i915_context, i915_context_free,
 	TP_ARGS(ctx)
 );
 
-/**
- * DOC: switch_mm tracepoint
- *
- * This tracepoint allows tracking of the mm switch, which is an important point
- * in the lifetime of the vm in the legacy submission path. This tracepoint is
- * called only if full ppgtt is enabled.
- */
-TRACE_EVENT(switch_mm,
-	TP_PROTO(struct intel_engine_cs *engine, struct i915_gem_context *to),
-
-	TP_ARGS(engine, to),
-
-	TP_STRUCT__entry(
-			__field(u16, class)
-			__field(u16, instance)
-			__field(struct i915_gem_context *, to)
-			__field(struct i915_address_space *, vm)
-			__field(u32, dev)
-	),
-
-	TP_fast_assign(
-			__entry->class = engine->uabi_class;
-			__entry->instance = engine->instance;
-			__entry->to = to;
-			__entry->vm = to->ppgtt ? &to->ppgtt->vm : NULL;
-			__entry->dev = engine->i915->drm.primary->index;
-	),
-
-	TP_printk("dev=%u, engine=%u:%u, ctx=%p, ctx_vm=%p",
-		  __entry->dev, __entry->class, __entry->instance, __entry->to,
-		  __entry->vm)
-);
-
 #endif /* _I915_TRACE_H_ */
 
 /* This part must be outside protection */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2ec2e60dc670..d1cf8b4926ab 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1168,9 +1168,6 @@ void intel_engine_lost_context(struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
-	engine->legacy_active_context = NULL;
-	engine->legacy_active_ppgtt = NULL;
-
 	ce = fetch_and_zero(&engine->last_retired_context);
 	if (ce)
 		intel_context_unpin(ce);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6bfa6030198d..409f499c0a45 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -541,11 +541,23 @@ static struct i915_request *reset_prepare(struct intel_engine_cs *engine)
 	return i915_gem_find_active_request(engine);
 }
 
-static void reset_ring(struct intel_engine_cs *engine,
-		       struct i915_request *request)
+static void skip_request(struct i915_request *rq)
 {
-	GEM_TRACE("%s seqno=%x\n",
-		  engine->name, request ? request->global_seqno : 0);
+	void *vaddr = rq->ring->vaddr;
+	u32 head;
+
+	head = rq->infix;
+	if (rq->postfix < head) {
+		memset32(vaddr + head, MI_NOOP,
+			 (rq->ring->size - head) / sizeof(u32));
+		head = 0;
+	}
+	memset32(vaddr + head, MI_NOOP, (rq->postfix - head) / sizeof(u32));
+}
+
+static void reset_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	GEM_TRACE("%s seqno=%x\n", engine->name, rq ? rq->global_seqno : 0);
 
 	/*
 	 * RC6 must be prevented until the reset is complete and the engine
@@ -569,43 +581,11 @@ static void reset_ring(struct intel_engine_cs *engine,
 	 * If the request was innocent, we try to replay the request with
 	 * the restored context.
 	 */
-	if (request) {
-		struct drm_i915_private *dev_priv = request->i915;
-		struct intel_context *ce = request->hw_context;
-		struct i915_hw_ppgtt *ppgtt;
-
-		if (ce->state) {
-			I915_WRITE(CCID,
-				   i915_ggtt_offset(ce->state) |
-				   BIT(8) /* must be set! */ |
-				   CCID_EXTENDED_STATE_SAVE |
-				   CCID_EXTENDED_STATE_RESTORE |
-				   CCID_EN);
-		}
-
-		ppgtt = request->gem_context->ppgtt ?: engine->i915->mm.aliasing_ppgtt;
-		if (ppgtt) {
-			u32 pd_offset = ppgtt->pd.base.ggtt_offset << 10;
-
-			I915_WRITE(RING_PP_DIR_DCLV(engine), PP_DIR_DCLV_2G);
-			I915_WRITE(RING_PP_DIR_BASE(engine), pd_offset);
-
-			/* Wait for the PD reload to complete */
-			if (intel_wait_for_register(dev_priv,
-						    RING_PP_DIR_BASE(engine),
-						    BIT(0), 0,
-						    10))
-				DRM_ERROR("Wait for reload of ppgtt page-directory timed out\n");
-
-			ppgtt->pd_dirty_rings &= ~intel_engine_flag(engine);
-		}
-
+	if (rq) {
 		/* If the rq hung, jump to its breadcrumb and skip the batch */
-		if (request->fence.error == -EIO)
-			request->ring->head = request->postfix;
-	} else {
-		engine->legacy_active_context = NULL;
-		engine->legacy_active_ppgtt = NULL;
+		rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
+		if (rq->fence.error == -EIO)
+			skip_request(rq);
 	}
 }
 
@@ -1448,6 +1428,29 @@ void intel_legacy_submission_resume(struct drm_i915_private *dev_priv)
 		intel_ring_reset(engine->buffer, 0);
 }
 
+static int load_pd_dir(struct i915_request *rq,
+		       const struct i915_hw_ppgtt *ppgtt)
+{
+	const struct intel_engine_cs * const engine = rq->engine;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 6);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_DCLV(engine));
+	*cs++ = PP_DIR_DCLV_2G;
+
+	*cs++ = MI_LOAD_REGISTER_IMM(1);
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
+	*cs++ = ppgtt->pd.base.ggtt_offset << 10;
+
+	intel_ring_advance(rq, cs);
+
+	return 0;
+}
+
 static inline int mi_set_context(struct i915_request *rq, u32 flags)
 {
 	struct drm_i915_private *i915 = rq->i915;
@@ -1587,34 +1590,28 @@ static int remap_l3(struct i915_request *rq, int slice)
 static int switch_context(struct i915_request *rq)
 {
 	struct intel_engine_cs *engine = rq->engine;
-	struct i915_gem_context *to_ctx = rq->gem_context;
-	struct i915_hw_ppgtt *to_mm =
-		to_ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
-	struct i915_gem_context *from_ctx = engine->legacy_active_context;
-	struct i915_hw_ppgtt *from_mm = engine->legacy_active_ppgtt;
+	struct i915_gem_context *ctx = rq->gem_context;
+	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt;
+	unsigned int unwind_mm = 0;
 	u32 hw_flags = 0;
 	int ret, i;
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(HAS_EXECLISTS(rq->i915));
 
-	if (to_mm != from_mm ||
-	    (to_mm && intel_engine_flag(engine) & to_mm->pd_dirty_rings)) {
-		trace_switch_mm(engine, to_ctx);
-		ret = to_mm->switch_mm(to_mm, rq);
+	if (ppgtt) {
+		ret = load_pd_dir(rq, ppgtt);
 		if (ret)
 			goto err;
 
-		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
-		engine->legacy_active_ppgtt = to_mm;
-
-		if (to_ctx == from_ctx) {
+		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
+			unwind_mm = intel_engine_flag(engine);
+			ppgtt->pd_dirty_rings &= ~unwind_mm;
 			hw_flags = MI_FORCE_RESTORE;
-			from_ctx = NULL;
 		}
 	}
 
-	if (rq->hw_context->state && to_ctx != from_ctx) {
+	if (rq->hw_context->state) {
 		GEM_BUG_ON(engine->id != RCS);
 
 		/*
@@ -1624,35 +1621,32 @@ static int switch_context(struct i915_request *rq)
 		 * as nothing actually executes using the kernel context; it
 		 * is purely used for flushing user contexts.
 		 */
-		if (i915_gem_context_is_kernel(to_ctx))
+		if (i915_gem_context_is_kernel(ctx))
 			hw_flags = MI_RESTORE_INHIBIT;
 
 		ret = mi_set_context(rq, hw_flags);
 		if (ret)
 			goto err_mm;
-
-		engine->legacy_active_context = to_ctx;
 	}
 
-	if (to_ctx->remap_slice) {
+	if (ctx->remap_slice) {
 		for (i = 0; i < MAX_L3_SLICES; i++) {
-			if (!(to_ctx->remap_slice & BIT(i)))
+			if (!(ctx->remap_slice & BIT(i)))
 				continue;
 
 			ret = remap_l3(rq, i);
 			if (ret)
-				goto err_ctx;
+				goto err_mm;
 		}
 
-		to_ctx->remap_slice = 0;
+		ctx->remap_slice = 0;
 	}
 
 	return 0;
 
-err_ctx:
-	engine->legacy_active_context = from_ctx;
 err_mm:
-	engine->legacy_active_ppgtt = from_mm;
+	if (unwind_mm)
+		ppgtt->pd_dirty_rings |= unwind_mm;
 err:
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index acef385c4c80..b44c67849749 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -557,15 +557,6 @@ struct intel_engine_cs {
 	 */
 	struct intel_context *last_retired_context;
 
-	/* We track the current MI_SET_CONTEXT in order to eliminate
-	 * redudant context switches. This presumes that requests are not
-	 * reordered! Or when they are the tracking is updated along with
-	 * the emission of individual requests into the legacy command
-	 * stream (ring).
-	 */
-	struct i915_gem_context *legacy_active_context;
-	struct i915_hw_ppgtt *legacy_active_ppgtt;
-
 	/* status_notifier: list of callbacks for context-switch changes */
 	struct atomic_notifier_head context_status_notifier;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (2 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:16   ` Mika Kuoppala
  2018-06-11 10:30   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

The HW only accepts offsets within ring->size, and fails peculiarly if
the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
set ring->head/ring->tail we want to make sure it is within value (using
intel_ring_wrap()).

v2: Double check execlists as well

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |  5 +++++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++++++++++
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 091e28f0e024..3e008adf5a01 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1413,6 +1413,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
 		i915_ggtt_offset(ce->ring->vma);
+	GEM_BUG_ON(!intel_ring_offset_valid(ce->ring, ce->ring->head));
 	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
 
 	ce->state->obj->pin_global++;
@@ -2001,9 +2002,10 @@ static void execlists_reset(struct intel_engine_cs *engine,
 
 	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
 	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
-	regs[CTX_RING_HEAD + 1] = request->postfix;
 
-	request->ring->head = request->postfix;
+	request->ring->head = intel_ring_wrap(request->ring, request->postfix);
+	regs[CTX_RING_HEAD + 1] = request->ring->head;
+
 	intel_ring_update_space(request->ring);
 
 	/* Reset WaIdleLiteRestore:bdw,skl as well */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 409f499c0a45..7970ecb199e2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -496,6 +496,10 @@ static int init_ring_common(struct intel_engine_cs *engine)
 		DRM_DEBUG_DRIVER("%s initialization failed [head=%08x], fudging\n",
 				 engine->name, I915_READ_HEAD(engine));
 
+	/* Check that the ring offsets point within the ring! */
+	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
+	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
+
 	intel_ring_update_space(ring);
 	I915_WRITE_HEAD(engine, ring->head);
 	I915_WRITE_TAIL(engine, ring->tail);
@@ -1064,6 +1068,7 @@ int intel_ring_pin(struct intel_ring *ring,
 
 void intel_ring_reset(struct intel_ring *ring, u32 tail)
 {
+	tail = intel_ring_wrap(ring, tail);
 	ring->tail = tail;
 	ring->head = tail;
 	ring->emit = tail;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b44c67849749..1d8140ac2016 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -805,6 +805,18 @@ static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
 	return pos & (ring->size - 1);
 }
 
+static inline bool
+intel_ring_offset_valid(const struct intel_ring *ring, u32 pos)
+{
+	if (pos & -ring->size) /* must be strictly within the ring */
+		return false;
+
+	if (!IS_ALIGNED(pos, 8)) /* must be qword aligned */
+		return false;
+
+	return true;
+}
+
 static inline u32 intel_ring_offset(const struct i915_request *rq, void *addr)
 {
 	/* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (3 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:37   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR Chris Wilson
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

When we update the gen6 ppgtt page directories, we do so by writing the
new address into a reserved slot in the GGTT. It appears that when the
GPU reads that entry from the gsm, it uses its small cache and that we
need to invalidate that cache after writing. We don't see an issue
currently as we prefill the ppgtt page directories on creation; and only
create the single aliasing_ppgtt long before we start using the GGTT
(and so before the cache mayhave a conflicting entry).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9966845a7f22..63569bd71fef 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1693,8 +1693,8 @@ static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
 				  const struct i915_page_table *pt)
 {
 	/* Caller needs to make sure the write completes if necessary */
-	writel_relaxed(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
-		       ppgtt->pd_addr + pde);
+	iowrite32(GEN6_PDE_ADDR_ENCODE(px_dma(pt)) | GEN6_PDE_VALID,
+		  ppgtt->pd_addr + pde);
 }
 
 /* Write all the page tables found in the ppgtt structure to incrementing page
@@ -1709,7 +1709,7 @@ static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
 		gen6_write_pde(ppgtt, pde, pt);
 
 	mark_tlbs_dirty(ppgtt);
-	wmb();
+	gen6_ggtt_invalidate(ppgtt->vm.i915);
 }
 
 static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
@@ -1864,7 +1864,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 	if (flush) {
 		mark_tlbs_dirty(ppgtt);
-		wmb();
+		gen6_ggtt_invalidate(ppgtt->vm.i915);
 	}
 
 	return 0;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (4 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:12   ` Chris Wilson
  2018-06-11 10:43   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 07/17] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

After triggering the mm switch with a load of PD_DIR, which may be
deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
with that load by posting a read of PD_DIR (or else those subsequent
commands may access the stale page tables).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  5 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 48 ++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  5 ++-
 3 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index d1cf8b4926ab..d278fed8cb31 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -499,7 +499,8 @@ void intel_engine_setup_common(struct intel_engine_cs *engine)
 	intel_engine_init_cmd_parser(engine);
 }
 
-int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
+int intel_engine_create_scratch(struct intel_engine_cs *engine,
+				unsigned int size)
 {
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
@@ -533,7 +534,7 @@ int intel_engine_create_scratch(struct intel_engine_cs *engine, int size)
 	return ret;
 }
 
-static void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
+void intel_engine_cleanup_scratch(struct intel_engine_cs *engine)
 {
 	i915_vma_unpin_and_release(&engine->scratch);
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7970ecb199e2..66183eb3c102 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1362,8 +1362,9 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
 
 static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 {
-	struct intel_ring *ring;
 	struct i915_timeline *timeline;
+	struct intel_ring *ring;
+	unsigned int size;
 	int err;
 
 	intel_engine_setup_common(engine);
@@ -1389,12 +1390,21 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->buffer);
 	engine->buffer = ring;
 
-	err = intel_engine_init_common(engine);
+	size = PAGE_SIZE;
+	if (HAS_BROKEN_CS_TLB(engine->i915))
+		size = I830_WA_SIZE;
+	err = intel_engine_create_scratch(engine, size);
 	if (err)
 		goto err_unpin;
 
+	err = intel_engine_init_common(engine);
+	if (err)
+		goto err_scratch;
+
 	return 0;
 
+err_scratch:
+	intel_engine_cleanup_scratch(engine);
 err_unpin:
 	intel_ring_unpin(ring);
 err_ring:
@@ -1456,6 +1466,24 @@ static int load_pd_dir(struct i915_request *rq,
 	return 0;
 }
 
+static int flush_pd_dir(struct i915_request *rq)
+{
+	const struct intel_engine_cs * const engine = rq->engine;
+	u32 *cs;
+
+	cs = intel_ring_begin(rq, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	*cs++ = MI_STORE_REGISTER_MEM | MI_SRM_LRM_GLOBAL_GTT;
+	*cs++ = i915_mmio_reg_offset(RING_PP_DIR_BASE(engine));
+	*cs++ = i915_ggtt_offset(engine->scratch);
+	*cs++ = MI_NOOP;
+
+	intel_ring_advance(rq, cs);
+	return 0;
+}
+
 static inline int mi_set_context(struct i915_request *rq, u32 flags)
 {
 	struct drm_i915_private *i915 = rq->i915;
@@ -1634,6 +1662,12 @@ static int switch_context(struct i915_request *rq)
 			goto err_mm;
 	}
 
+	if (ppgtt) {
+		ret = flush_pd_dir(rq);
+		if (ret)
+			goto err_mm;
+	}
+
 	if (ctx->remap_slice) {
 		for (i = 0; i < MAX_L3_SLICES; i++) {
 			if (!(ctx->remap_slice & BIT(i)))
@@ -2154,16 +2188,6 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	if (INTEL_GEN(dev_priv) >= 6) {
-		ret = intel_engine_create_scratch(engine, PAGE_SIZE);
-		if (ret)
-			return ret;
-	} else if (HAS_BROKEN_CS_TLB(dev_priv)) {
-		ret = intel_engine_create_scratch(engine, I830_WA_SIZE);
-		if (ret)
-			return ret;
-	}
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 1d8140ac2016..eba271e74c25 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -873,9 +873,12 @@ void intel_engine_init_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 
 void intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
-int intel_engine_create_scratch(struct intel_engine_cs *engine, int size);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
 
+int intel_engine_create_scratch(struct intel_engine_cs *engine,
+				unsigned int size);
+void intel_engine_cleanup_scratch(struct intel_engine_cs *engine);
+
 int intel_init_render_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine);
 int intel_init_blt_ring_buffer(struct intel_engine_cs *engine);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 07/17] drm/i915/gtt: Subclass gen6_hw_ppgtt
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (5 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 19:43 ` [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

The legacy gen6 ppgtt needs a little more hand holding than gen8+, and
so requires a larger structure. As I intend to make this slightly more
complicated in the future, separate the gen6 from the core gen8 hw
struct by subclassing. This patch moves the gen6 only features out to
gen6_hw_ppgtt and pipes the new type everywhere that needs it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c           | 94 +++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h           | 19 +++-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 +-
 3 files changed, 65 insertions(+), 58 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 63569bd71fef..cc0e747b9286 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1636,20 +1636,20 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 	return ERR_PTR(err);
 }
 
-static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
+static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
-	struct i915_address_space *vm = &ppgtt->vm;
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+	struct i915_address_space *vm = &base->vm;
 	struct i915_page_table *unused;
 	gen6_pte_t scratch_pte;
 	u32 pd_entry, pte, pde;
-	u32 start = 0, length = ppgtt->vm.total;
 
 	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
 
-	gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde) {
+	gen6_for_all_pdes(unused, &base->pd, pde) {
 		u32 expected;
 		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(ppgtt->pd.page_table[pde]);
+		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
 		pd_entry = readl(ppgtt->pd_addr + pde);
 		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
 
@@ -1660,7 +1660,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 				   expected);
 		seq_printf(m, "\tPDE: %x\n", pd_entry);
 
-		pt_vaddr = kmap_atomic_px(ppgtt->pd.page_table[pde]);
+		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
 
 		for (pte = 0; pte < GEN6_PTES; pte+=4) {
 			unsigned long va =
@@ -1688,7 +1688,7 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *ppgtt, struct seq_file *m)
 }
 
 /* Write pde (index) from the page directory @pd to the page table @pt */
-static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
+static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 				  const unsigned int pde,
 				  const struct i915_page_table *pt)
 {
@@ -1699,17 +1699,18 @@ static inline void gen6_write_pde(const struct i915_hw_ppgtt *ppgtt,
 
 /* Write all the page tables found in the ppgtt structure to incrementing page
  * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *ppgtt,
+static void gen6_write_page_range(struct i915_hw_ppgtt *base,
 				  u32 start, u32 length)
 {
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_page_table *pt;
 	unsigned int pde;
 
-	gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde)
+	gen6_for_each_pde(pt, &base->pd, start, length, pde)
 		gen6_write_pde(ppgtt, pde, pt);
 
-	mark_tlbs_dirty(ppgtt);
-	gen6_ggtt_invalidate(ppgtt->vm.i915);
+	mark_tlbs_dirty(base);
+	gen6_ggtt_invalidate(base->vm.i915);
 }
 
 static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
@@ -1843,28 +1844,28 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 static int gen6_alloc_va_range(struct i915_address_space *vm,
 			       u64 start, u64 length)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u64 from = start;
 	unsigned int pde;
 	bool flush = false;
 
-	gen6_for_each_pde(pt, &ppgtt->pd, start, length, pde) {
+	gen6_for_each_pde(pt, &ppgtt->base.pd, start, length, pde) {
 		if (pt == vm->scratch_pt) {
 			pt = alloc_pt(vm);
 			if (IS_ERR(pt))
 				goto unwind_out;
 
 			gen6_initialize_pt(vm, pt);
-			ppgtt->pd.page_table[pde] = pt;
+			ppgtt->base.pd.page_table[pde] = pt;
 			gen6_write_pde(ppgtt, pde, pt);
 			flush = true;
 		}
 	}
 
 	if (flush) {
-		mark_tlbs_dirty(ppgtt);
-		gen6_ggtt_invalidate(ppgtt->vm.i915);
+		mark_tlbs_dirty(&ppgtt->base);
+		gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 	}
 
 	return 0;
@@ -1901,24 +1902,23 @@ static void gen6_free_scratch(struct i915_address_space *vm)
 
 static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
-	struct i915_page_directory *pd = &ppgtt->pd;
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u32 pde;
 
 	drm_mm_remove_node(&ppgtt->node);
 
-	gen6_for_all_pdes(pt, pd, pde)
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
 		if (pt != vm->scratch_pt)
 			free_pt(vm, pt);
 
 	gen6_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct i915_address_space *vm = &ppgtt->vm;
-	struct drm_i915_private *dev_priv = ppgtt->vm.i915;
+	struct i915_address_space *vm = &ppgtt->base.vm;
+	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
 	int ret;
 
@@ -1943,11 +1943,11 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	if (ppgtt->node.start < ggtt->mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
 
-	ppgtt->pd.base.ggtt_offset =
+	ppgtt->base.pd.base.ggtt_offset =
 		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
 
 	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->pd.base.ggtt_offset / sizeof(gen6_pte_t);
+		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	return 0;
 
@@ -1956,70 +1956,70 @@ static int gen6_ppgtt_allocate_page_directories(struct i915_hw_ppgtt *ppgtt)
 	return ret;
 }
 
-static int gen6_ppgtt_alloc(struct i915_hw_ppgtt *ppgtt)
+static int gen6_ppgtt_alloc(struct gen6_hw_ppgtt *ppgtt)
 {
 	return gen6_ppgtt_allocate_page_directories(ppgtt);
 }
 
-static void gen6_scratch_va_range(struct i915_hw_ppgtt *ppgtt,
+static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
 				  u64 start, u64 length)
 {
 	struct i915_page_table *unused;
 	u32 pde;
 
-	gen6_for_each_pde(unused, &ppgtt->pd, start, length, pde)
-		ppgtt->pd.page_table[pde] = ppgtt->vm.scratch_pt;
+	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
+		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_ggtt * const ggtt = &i915->ggtt;
-	struct i915_hw_ppgtt *ppgtt;
+	struct gen6_hw_ppgtt *ppgtt;
 	int err;
 
 	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
 	if (!ppgtt)
 		return ERR_PTR(-ENOMEM);
 
-	ppgtt->vm.i915 = i915;
-	ppgtt->vm.dma = &i915->drm.pdev->dev;
+	ppgtt->base.vm.i915 = i915;
+	ppgtt->base.vm.dma = &i915->drm.pdev->dev;
 
-	ppgtt->vm.pte_encode = ggtt->vm.pte_encode;
+	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
 	err = gen6_ppgtt_alloc(ppgtt);
 	if (err)
 		goto err_free;
 
-	ppgtt->vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
+	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->vm.total);
-	gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
+	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
+	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
 
-	err = gen6_alloc_va_range(&ppgtt->vm, 0, ppgtt->vm.total);
+	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
 	if (err)
 		goto err_cleanup;
 
-	ppgtt->vm.clear_range = gen6_ppgtt_clear_range;
-	ppgtt->vm.insert_entries = gen6_ppgtt_insert_entries;
-	ppgtt->vm.cleanup = gen6_ppgtt_cleanup;
-	ppgtt->debug_dump = gen6_dump_ppgtt;
+	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
+	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
+	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
+	ppgtt->base.debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
-	ppgtt->vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
-	ppgtt->vm.vma_ops.set_pages   = ppgtt_set_pages;
-	ppgtt->vm.vma_ops.clear_pages = clear_pages;
+	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
+	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
+	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
+	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
-			 ppgtt->pd.base.ggtt_offset << 10);
+			 ppgtt->base.pd.base.ggtt_offset << 10);
 
-	return ppgtt;
+	return &ppgtt->base;
 
 err_cleanup:
-	gen6_ppgtt_cleanup(&ppgtt->vm);
+	gen6_ppgtt_cleanup(&ppgtt->base.vm);
 err_free:
 	kfree(ppgtt);
 	return ERR_PTR(err);
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index e70f6abcd0f2..7d65d813eff0 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -396,7 +396,7 @@ struct i915_ggtt {
 struct i915_hw_ppgtt {
 	struct i915_address_space vm;
 	struct kref ref;
-	struct drm_mm_node node;
+
 	unsigned long pd_dirty_rings;
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
@@ -404,11 +404,24 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
 
-	gen6_pte_t __iomem *pd_addr;
-
 	void (*debug_dump)(struct i915_hw_ppgtt *ppgtt, struct seq_file *m);
 };
 
+struct gen6_hw_ppgtt {
+	struct i915_hw_ppgtt base;
+
+	struct drm_mm_node node;
+	gen6_pte_t __iomem *pd_addr;
+};
+
+#define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)
+
+static inline struct gen6_hw_ppgtt *to_gen6_ppgtt(struct i915_hw_ppgtt *base)
+{
+	BUILD_BUG_ON(offsetof(struct gen6_hw_ppgtt, base));
+	return __to_gen6_ppgtt(base);
+}
+
 /*
  * gen6_for_each_pde() iterates over every pde from start until start+length.
  * If start and start+length are not perfectly divisible, the macro will round
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index f80cf7ce3fa9..a4060238bef0 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -135,17 +135,13 @@ static int igt_ppgtt_alloc(void *arg)
 	struct drm_i915_private *dev_priv = arg;
 	struct i915_hw_ppgtt *ppgtt;
 	u64 size, last;
-	int err;
+	int err = 0;
 
 	/* Allocate a ppggt and try to fill the entire range */
 
 	if (!USES_PPGTT(dev_priv))
 		return 0;
 
-	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
-	if (!ppgtt)
-		return -ENOMEM;
-
 	mutex_lock(&dev_priv->drm.struct_mutex);
 	ppgtt = __hw_ppgtt_create(dev_priv);
 	if (IS_ERR(ppgtt)) {
@@ -153,10 +149,8 @@ static int igt_ppgtt_alloc(void *arg)
 		goto err_unlock;
 	}
 
-	if (!ppgtt->vm.allocate_va_range) {
-		err = 0;
+	if (!ppgtt->vm.allocate_va_range)
 		goto err_ppgtt_cleanup;
-	}
 
 	/* Check we can allocate the entire range */
 	for (size = 4096;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (6 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 07/17] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:48   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

Pull the empty stubs together into the top level gen6_ppgtt_create, and
tear each one down on error in proper onion order (rather than use
Joonas' pet hate of calling the cleanup function in indeterminable
state).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 81 ++++++++++++++---------------
 1 file changed, 39 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index cc0e747b9286..e9fcc4370b1a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1875,7 +1875,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 {
 	int ret;
 
@@ -1894,33 +1894,37 @@ static int gen6_init_scratch(struct i915_address_space *vm)
 	return 0;
 }
 
-static void gen6_free_scratch(struct i915_address_space *vm)
+static void gen6_ppgtt_free_scratch(struct i915_address_space *vm)
 {
 	free_pt(vm, vm->scratch_pt);
 	cleanup_scratch_page(vm);
 }
 
-static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+static void gen6_ppgtt_free_pd(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	struct i915_page_table *pt;
 	u32 pde;
 
-	drm_mm_remove_node(&ppgtt->node);
-
 	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
-		if (pt != vm->scratch_pt)
-			free_pt(vm, pt);
+		if (pt != ppgtt->base.vm.scratch_pt)
+			free_pt(&ppgtt->base.vm, pt);
+}
 
-	gen6_free_scratch(vm);
+static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
+
+	drm_mm_remove_node(&ppgtt->node);
+
+	gen6_ppgtt_free_pd(ppgtt);
+	gen6_ppgtt_free_scratch(vm);
 }
 
 static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 {
-	struct i915_address_space *vm = &ppgtt->base.vm;
 	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
 	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int ret;
+	int err;
 
 	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -1928,17 +1932,13 @@ static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 	 */
 	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
 
-	ret = gen6_init_scratch(vm);
-	if (ret)
-		return ret;
-
-	ret = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
+	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
 				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
 				  I915_COLOR_UNEVICTABLE,
 				  0, ggtt->vm.total,
 				  PIN_HIGH);
-	if (ret)
-		goto err_out;
+	if (err)
+		return err;
 
 	if (ppgtt->node.start < ggtt->mappable_end)
 		DRM_DEBUG("Forced to use aperture for PDEs\n");
@@ -1950,15 +1950,6 @@ static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
 		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
 
 	return 0;
-
-err_out:
-	gen6_free_scratch(vm);
-	return ret;
-}
-
-static int gen6_ppgtt_alloc(struct gen6_hw_ppgtt *ppgtt)
-{
-	return gen6_ppgtt_allocate_page_directories(ppgtt);
 }
 
 static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
@@ -1984,21 +1975,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->base.vm.i915 = i915;
 	ppgtt->base.vm.dma = &i915->drm.pdev->dev;
 
-	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
-
-	err = gen6_ppgtt_alloc(ppgtt);
-	if (err)
-		goto err_free;
-
 	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
-
-	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
-	if (err)
-		goto err_cleanup;
-
 	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
@@ -2009,6 +1987,23 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
 	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
 
+	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
+
+	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	if (err)
+		goto err_free;
+
+	err = gen6_ppgtt_allocate_page_directories(ppgtt);
+	if (err)
+		goto err_scratch;
+
+	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
+	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
+
+	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_pd;
+
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
 			 ppgtt->node.size >> 20,
 			 ppgtt->node.start / PAGE_SIZE);
@@ -2018,8 +2013,10 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	return &ppgtt->base;
 
-err_cleanup:
-	gen6_ppgtt_cleanup(&ppgtt->base.vm);
+err_pd:
+	gen6_ppgtt_free_pd(ppgtt);
+err_scratch:
+	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
 	kfree(ppgtt);
 	return ERR_PTR(err);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (7 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 10:56   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 10/17] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

Currently all page directories are bound at creation using an
unevictable node in the GGTT. This severely limits us as we cannot
remove any inactive ppgtt for new contexts, or under aperture pressure.
To fix this we need to make the page directory into a first class and
unbindable vma. Hence, the creation of a custom vma to wrap the page
directory as opposed to a GEM object.

In this patch, we leave the page directories pinned upon creation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 257 ++++++++++++++++------------
 drivers/gpu/drm/i915/i915_gem_gtt.h |   2 +-
 drivers/gpu/drm/i915/i915_vma.h     |   7 +
 3 files changed, 155 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e9fcc4370b1a..c6949da3423f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1640,50 +1640,55 @@ static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	struct i915_address_space *vm = &base->vm;
-	struct i915_page_table *unused;
-	gen6_pte_t scratch_pte;
-	u32 pd_entry, pte, pde;
-
-	scratch_pte = vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
-
-	gen6_for_all_pdes(unused, &base->pd, pde) {
-		u32 expected;
-		gen6_pte_t *pt_vaddr;
-		const dma_addr_t pt_addr = px_dma(base->pd.page_table[pde]);
-		pd_entry = readl(ppgtt->pd_addr + pde);
-		expected = (GEN6_PDE_ADDR_ENCODE(pt_addr) | GEN6_PDE_VALID);
-
-		if (pd_entry != expected)
-			seq_printf(m, "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
-				   pde,
-				   pd_entry,
-				   expected);
-		seq_printf(m, "\tPDE: %x\n", pd_entry);
-
-		pt_vaddr = kmap_atomic_px(base->pd.page_table[pde]);
-
-		for (pte = 0; pte < GEN6_PTES; pte+=4) {
-			unsigned long va =
-				(pde * PAGE_SIZE * GEN6_PTES) +
-				(pte * PAGE_SIZE);
+	const gen6_pte_t scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	struct i915_page_table *pt;
+	u32 pte, pde;
+
+	gen6_for_all_pdes(pt, &base->pd, pde) {
+		gen6_pte_t *vaddr;
+
+		if (pt == base->vm.scratch_pt)
+			continue;
+
+		if (i915_vma_is_bound(ppgtt->vma, I915_VMA_GLOBAL_BIND)) {
+			u32 expected =
+				GEN6_PDE_ADDR_ENCODE(px_dma(pt)) |
+				GEN6_PDE_VALID;
+			u32 pd_entry = readl(ppgtt->pd_addr + pde);
+
+			if (pd_entry != expected)
+				seq_printf(m,
+					   "\tPDE #%d mismatch: Actual PDE: %x Expected PDE: %x\n",
+					   pde,
+					   pd_entry,
+					   expected);
+
+			seq_printf(m, "\tPDE: %x\n", pd_entry);
+		}
+
+		vaddr = kmap_atomic_px(base->pd.page_table[pde]);
+		for (pte = 0; pte < GEN6_PTES; pte += 4) {
 			int i;
-			bool found = false;
+
 			for (i = 0; i < 4; i++)
-				if (pt_vaddr[pte + i] != scratch_pte)
-					found = true;
-			if (!found)
+				if (vaddr[pte + i] != scratch_pte)
+					break;
+			if (i == 4)
 				continue;
 
-			seq_printf(m, "\t\t0x%lx [%03d,%04d]: =", va, pde, pte);
+			seq_printf(m, "\t\t(%03d, %04d) %08lx: ",
+				   pde, pte,
+				   (pde * GEN6_PTES + pte) * PAGE_SIZE);
 			for (i = 0; i < 4; i++) {
-				if (pt_vaddr[pte + i] != scratch_pte)
-					seq_printf(m, " %08x", pt_vaddr[pte + i]);
+				if (vaddr[pte + i] != scratch_pte)
+					seq_printf(m, " %08x", vaddr[pte + i]);
 				else
-					seq_puts(m, "  SCRATCH ");
+					seq_puts(m, "  SCRATCH");
 			}
 			seq_puts(m, "\n");
 		}
-		kunmap_atomic(pt_vaddr);
+		kunmap_atomic(vaddr);
 	}
 }
 
@@ -1697,22 +1702,6 @@ static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt,
 		  ppgtt->pd_addr + pde);
 }
 
-/* Write all the page tables found in the ppgtt structure to incrementing page
- * directories. */
-static void gen6_write_page_range(struct i915_hw_ppgtt *base,
-				  u32 start, u32 length)
-{
-	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_page_table *pt;
-	unsigned int pde;
-
-	gen6_for_each_pde(pt, &base->pd, start, length, pde)
-		gen6_write_pde(ppgtt, pde, pt);
-
-	mark_tlbs_dirty(base);
-	gen6_ggtt_invalidate(base->vm.i915);
-}
-
 static void gen8_ppgtt_enable(struct drm_i915_private *dev_priv)
 {
 	struct intel_engine_cs *engine;
@@ -1858,8 +1847,12 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 
 			gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
-			gen6_write_pde(ppgtt, pde, pt);
-			flush = true;
+
+			if (i915_vma_is_bound(ppgtt->vma,
+					      I915_VMA_GLOBAL_BIND)) {
+				gen6_write_pde(ppgtt, pde, pt);
+				flush = true;
+			}
 		}
 	}
 
@@ -1875,8 +1868,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	return -ENOMEM;
 }
 
-static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
+static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 {
+	struct i915_address_space * const vm = &ppgtt->base.vm;
+	struct i915_page_table *unused;
+	u32 pde;
 	int ret;
 
 	ret = setup_scratch_page(vm, __GFP_HIGHMEM);
@@ -1890,6 +1886,8 @@ static int gen6_ppgtt_init_scratch(struct i915_address_space *vm)
 	}
 
 	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
+		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
 	return 0;
 }
@@ -1914,52 +1912,104 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	drm_mm_remove_node(&ppgtt->node);
+	i915_vma_unpin(ppgtt->vma);
+	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
 	gen6_ppgtt_free_scratch(vm);
 }
 
-static int gen6_ppgtt_allocate_page_directories(struct gen6_hw_ppgtt *ppgtt)
+static int pd_vma_set_pages(struct i915_vma *vma)
 {
-	struct drm_i915_private *dev_priv = ppgtt->base.vm.i915;
-	struct i915_ggtt *ggtt = &dev_priv->ggtt;
-	int err;
+	vma->pages = ERR_PTR(-ENODEV);
+	return 0;
+}
 
-	/* PPGTT PDEs reside in the GGTT and consists of 512 entries. The
-	 * allocator works in address space sizes, so it's multiplied by page
-	 * size. We allocate at the top of the GTT to avoid fragmentation.
-	 */
-	BUG_ON(!drm_mm_initialized(&ggtt->vm.mm));
+static void pd_vma_clear_pages(struct i915_vma *vma)
+{
+	GEM_BUG_ON(!vma->pages);
 
-	err = i915_gem_gtt_insert(&ggtt->vm, &ppgtt->node,
-				  GEN6_PD_SIZE, GEN6_PD_ALIGN,
-				  I915_COLOR_UNEVICTABLE,
-				  0, ggtt->vm.total,
-				  PIN_HIGH);
-	if (err)
-		return err;
+	vma->pages = NULL;
+}
+
+static int pd_vma_bind(struct i915_vma *vma,
+		       enum i915_cache_level cache_level,
+		       u32 unused)
+{
+	struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm);
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	u32 ggtt_offset = i915_ggtt_offset(vma) / PAGE_SIZE;
+	struct i915_page_table *pt;
+	unsigned int pde;
 
-	if (ppgtt->node.start < ggtt->mappable_end)
-		DRM_DEBUG("Forced to use aperture for PDEs\n");
+	ppgtt->base.pd.base.ggtt_offset = ggtt_offset * sizeof(gen6_pte_t);
+	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm + ggtt_offset;
 
-	ppgtt->base.pd.base.ggtt_offset =
-		ppgtt->node.start / PAGE_SIZE * sizeof(gen6_pte_t);
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde)
+		gen6_write_pde(ppgtt, pde, pt);
 
-	ppgtt->pd_addr = (gen6_pte_t __iomem *)ggtt->gsm +
-		ppgtt->base.pd.base.ggtt_offset / sizeof(gen6_pte_t);
+	mark_tlbs_dirty(&ppgtt->base);
+	gen6_ggtt_invalidate(ppgtt->base.vm.i915);
 
 	return 0;
 }
 
-static void gen6_scratch_va_range(struct gen6_hw_ppgtt *ppgtt,
-				  u64 start, u64 length)
+static void pd_vma_unbind(struct i915_vma *vma)
 {
-	struct i915_page_table *unused;
-	u32 pde;
+}
 
-	gen6_for_each_pde(unused, &ppgtt->base.pd, start, length, pde)
-		ppgtt->base.pd.page_table[pde] = ppgtt->base.vm.scratch_pt;
+static const struct i915_vma_ops pd_vma_ops = {
+	.set_pages = pd_vma_set_pages,
+	.clear_pages = pd_vma_clear_pages,
+	.bind_vma = pd_vma_bind,
+	.unbind_vma = pd_vma_unbind,
+};
+
+static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
+{
+	struct drm_i915_private *i915 = ppgtt->base.vm.i915;
+	struct i915_ggtt *ggtt = &i915->ggtt;
+	struct i915_vma *vma;
+	int i;
+
+	GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE));
+	GEM_BUG_ON(size > ggtt->vm.total);
+
+	vma = kmem_cache_zalloc(i915->vmas, GFP_KERNEL);
+	if (!vma)
+		return ERR_PTR(-ENOMEM);
+
+	for (i = 0; i < ARRAY_SIZE(vma->last_read); i++)
+		init_request_active(&vma->last_read[i], NULL);
+	init_request_active(&vma->last_fence, NULL);
+
+	vma->vm = &ggtt->vm;
+	vma->ops = &pd_vma_ops;
+	vma->private = ppgtt;
+
+	vma->size = size;
+	vma->fence_size = size;
+	vma->flags = I915_VMA_GGTT;
+	vma->ggtt_view.type = I915_GGTT_VIEW_ROTATED; /* prevent fencing */
+
+	INIT_LIST_HEAD(&vma->obj_link);
+	list_add(&vma->vm_link, &vma->vm->unbound_list);
+
+	return vma;
+}
+
+static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	/*
+	 * PPGTT PDEs reside in the GGTT and consists of 512 entries. The
+	 * allocator works in address space sizes, so it's multiplied by page
+	 * size. We allocate at the top of the GTT to avoid fragmentation.
+	 */
+	return i915_vma_pin(ppgtt->vma,
+			    0, GEN6_PD_ALIGN,
+			    PIN_GLOBAL | PIN_HIGH);
 }
 
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
@@ -1989,24 +2039,27 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode;
 
-	err = gen6_ppgtt_init_scratch(&ppgtt->base.vm);
+	err = gen6_ppgtt_init_scratch(ppgtt);
 	if (err)
 		goto err_free;
 
-	err = gen6_ppgtt_allocate_page_directories(ppgtt);
-	if (err)
+	ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE);
+	if (IS_ERR(ppgtt->vma)) {
+		err = PTR_ERR(ppgtt->vma);
 		goto err_scratch;
-
-	gen6_scratch_va_range(ppgtt, 0, ppgtt->base.vm.total);
-	gen6_write_page_range(&ppgtt->base, 0, ppgtt->base.vm.total);
+	}
 
 	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
+	if (err)
+		goto err_vma;
+
+	err = gen6_ppgtt_pin(&ppgtt->base);
 	if (err)
 		goto err_pd;
 
 	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->node.size >> 20,
-			 ppgtt->node.start / PAGE_SIZE);
+			 ppgtt->vma->node.size >> 20,
+			 ppgtt->vma->node.start / PAGE_SIZE);
 
 	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
 			 ppgtt->base.pd.base.ggtt_offset << 10);
@@ -2015,6 +2068,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 err_pd:
 	gen6_ppgtt_free_pd(ppgtt);
+err_vma:
+	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -3533,6 +3588,7 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 	}
 
 	ggtt->vm.closed = false;
+	i915_ggtt_invalidate(dev_priv);
 
 	if (INTEL_GEN(dev_priv) >= 8) {
 		struct intel_ppat *ppat = &dev_priv->ppat;
@@ -3541,25 +3597,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv)
 		dev_priv->ppat.update_hw(dev_priv);
 		return;
 	}
-
-	if (USES_PPGTT(dev_priv)) {
-		struct i915_address_space *vm;
-
-		list_for_each_entry(vm, &dev_priv->vm_list, global_link) {
-			struct i915_hw_ppgtt *ppgtt;
-
-			if (i915_is_ggtt(vm))
-				ppgtt = dev_priv->mm.aliasing_ppgtt;
-			else
-				ppgtt = i915_vm_to_ppgtt(vm);
-			if (!ppgtt)
-				continue;
-
-			gen6_write_page_range(ppgtt, 0, ppgtt->vm.total);
-		}
-	}
-
-	i915_ggtt_invalidate(dev_priv);
 }
 
 static struct scatterlist *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7d65d813eff0..6e9acd99ecc6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -410,7 +410,7 @@ struct i915_hw_ppgtt {
 struct gen6_hw_ppgtt {
 	struct i915_hw_ppgtt base;
 
-	struct drm_mm_node node;
+	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
 };
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 4321476a6a32..66a228931517 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -54,6 +54,7 @@ struct i915_vma {
 	struct reservation_object *resv; /** Alias of obj->resv */
 	struct sg_table *pages;
 	void __iomem *iomap;
+	void *private; /* owned by creator */
 	u64 size;
 	u64 display_alignment;
 	struct i915_page_sizes page_sizes;
@@ -340,6 +341,12 @@ static inline void i915_vma_unpin(struct i915_vma *vma)
 	__i915_vma_unpin(vma);
 }
 
+static inline bool i915_vma_is_bound(const struct i915_vma *vma,
+				     unsigned int where)
+{
+	return vma->flags & where;
+}
+
 /**
  * i915_vma_pin_iomap - calls ioremap_wc to map the GGTT VMA via the aperture
  * @vma: VMA to iomap
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 10/17] drm/i915/gtt: Only keep gen6 page directories pinned while active
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (8 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 19:43 ` [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

In order to be able to evict the gen6 ppgtt, we have to unpin it at some
point. We can simply use our context activity tracking to know when the
ppgtt is no longer in use by hardware, and so only keep it pinned while
being used a request.

For the kernel_context (and thus aliasing_ppgtt), it remains pinned at
all times, as the kernel_context itself is pinned at all times.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c     | 36 ++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem_gtt.h     |  5 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 28 +++++++++++++++++++
 3 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index c6949da3423f..317f27a9d78e 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1912,7 +1912,6 @@ static void gen6_ppgtt_cleanup(struct i915_address_space *vm)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 
-	i915_vma_unpin(ppgtt->vma);
 	i915_vma_destroy(ppgtt->vma);
 
 	gen6_ppgtt_free_pd(ppgtt);
@@ -1998,10 +1997,19 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size)
 	return vma;
 }
 
-static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
+int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 
+	/*
+	 * Workaround the limited maximum vma->pin_count and the aliasing_ppgtt
+	 * which will be pinned into every active context.
+	 * (When vma->pin_count becomes atomic, I expect we will naturally
+	 * need a larger, unpacked, type and kill this redundancy.)
+	 */
+	if (ppgtt->pin_count++)
+		return 0;
+
 	/*
 	 * PPGTT PDEs reside in the GGTT and consists of 512 entries. The
 	 * allocator works in address space sizes, so it's multiplied by page
@@ -2012,6 +2020,17 @@ static int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
 			    PIN_GLOBAL | PIN_HIGH);
 }
 
+void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	GEM_BUG_ON(!ppgtt->pin_count);
+	if (--ppgtt->pin_count)
+		return;
+
+	i915_vma_unpin(ppgtt->vma);
+}
+
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_ggtt * const ggtt = &i915->ggtt;
@@ -2053,21 +2072,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 	if (err)
 		goto err_vma;
 
-	err = gen6_ppgtt_pin(&ppgtt->base);
-	if (err)
-		goto err_pd;
-
-	DRM_DEBUG_DRIVER("Allocated pde space (%lldM) at GTT entry: %llx\n",
-			 ppgtt->vma->node.size >> 20,
-			 ppgtt->vma->node.start / PAGE_SIZE);
-
-	DRM_DEBUG_DRIVER("Adding PPGTT at offset %x\n",
-			 ppgtt->base.pd.base.ggtt_offset << 10);
-
 	return &ppgtt->base;
 
-err_pd:
-	gen6_ppgtt_free_pd(ppgtt);
 err_vma:
 	i915_vma_destroy(ppgtt->vma);
 err_scratch:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 6e9acd99ecc6..d7b7b4afe060 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -412,6 +412,8 @@ struct gen6_hw_ppgtt {
 
 	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
+
+	unsigned int pin_count;
 };
 
 #define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)
@@ -625,6 +627,9 @@ static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 		kref_put(&ppgtt->ref, i915_ppgtt_release);
 }
 
+int gen6_ppgtt_pin(struct i915_hw_ppgtt *base);
+void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base);
+
 void i915_check_and_clear_faults(struct drm_i915_private *dev_priv);
 void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv);
 void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 66183eb3c102..edb4682dee2f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1180,6 +1180,27 @@ static void intel_ring_context_destroy(struct intel_context *ce)
 		__i915_gem_object_release_unless_active(ce->state->obj);
 }
 
+static int __context_pin_ppgtt(struct i915_gem_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt;
+	int err = 0;
+
+	ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt;
+	if (ppgtt)
+		err = gen6_ppgtt_pin(ppgtt);
+
+	return err;
+}
+
+static void __context_unpin_ppgtt(struct i915_gem_context *ctx)
+{
+	struct i915_hw_ppgtt *ppgtt;
+
+	ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt;
+	if (ppgtt)
+		gen6_ppgtt_unpin(ppgtt);
+}
+
 static int __context_pin(struct intel_context *ce)
 {
 	struct i915_vma *vma;
@@ -1228,6 +1249,7 @@ static void __context_unpin(struct intel_context *ce)
 
 static void intel_ring_context_unpin(struct intel_context *ce)
 {
+	__context_unpin_ppgtt(ce->gem_context);
 	__context_unpin(ce);
 
 	i915_gem_context_put(ce->gem_context);
@@ -1325,6 +1347,10 @@ __ring_context_pin(struct intel_engine_cs *engine,
 	if (err)
 		goto err;
 
+	err = __context_pin_ppgtt(ce->gem_context);
+	if (err)
+		goto err_unpin;
+
 	i915_gem_context_get(ctx);
 
 	/* One ringbuffer to rule them all */
@@ -1333,6 +1359,8 @@ __ring_context_pin(struct intel_engine_cs *engine,
 
 	return ce;
 
+err_unpin:
+	__context_unpin(ce);
 err:
 	ce->pin_count = 0;
 	return ERR_PTR(err);
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (9 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 10/17] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 11:03   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

As we were only supporting aliasing_ppgtt on gen7 for some time, we
saved a few checks by preallocating the page directories on creation.
However, since we need 2MiB of page directories for each ppgtt, to
support arbitrary numbers of user contexts, we need to be more prudent
in our allocations, and defer the page allocation until it is used. We
don't recover unused pages yet as we found that doing so on the fly
(i.e. altering TLB entries) would confuse the GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 67 +++++++++++------------------
 1 file changed, 26 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 317f27a9d78e..3dc17793afac 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -190,11 +190,19 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
 	return 1;
 }
 
-static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
+static int ppgtt_bind_vma(struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 unused)
 {
 	u32 pte_flags;
+	int err;
+
+	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
+		err = vma->vm->allocate_va_range(vma->vm,
+						 vma->node.start, vma->size);
+		if (err)
+			return err;
+	}
 
 	/* Currently applicable only to VLV */
 	pte_flags = 0;
@@ -206,22 +214,6 @@ static int gen6_ppgtt_bind_vma(struct i915_vma *vma,
 	return 0;
 }
 
-static int gen8_ppgtt_bind_vma(struct i915_vma *vma,
-			       enum i915_cache_level cache_level,
-			       u32 unused)
-{
-	int ret;
-
-	if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
-		ret = vma->vm->allocate_va_range(vma->vm,
-						 vma->node.start, vma->size);
-		if (ret)
-			return ret;
-	}
-
-	return gen6_ppgtt_bind_vma(vma, cache_level, unused);
-}
-
 static void ppgtt_unbind_vma(struct i915_vma *vma)
 {
 	vma->vm->clear_range(vma->vm, vma->node.start, vma->size);
@@ -1622,7 +1614,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 	ppgtt->vm.cleanup = gen8_ppgtt_cleanup;
 	ppgtt->debug_dump = gen8_dump_ppgtt;
 
-	ppgtt->vm.vma_ops.bind_vma    = gen8_ppgtt_bind_vma;
+	ppgtt->vm.vma_ops.bind_vma    = ppgtt_bind_vma;
 	ppgtt->vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
 	ppgtt->vm.vma_ops.set_pages   = ppgtt_set_pages;
 	ppgtt->vm.vma_ops.clear_pages = clear_pages;
@@ -1776,7 +1768,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 
 		num_entries -= end - pte;
 
-		/* Note that the hw doesn't support removing PDE on the fly
+		/*
+		 * Note that the hw doesn't support removing PDE on the fly
 		 * (they are cached inside the context with no means to
 		 * invalidate the cache), so we can only reset the PTE
 		 * entries back to scratch.
@@ -2046,12 +2039,13 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 
 	ppgtt->base.vm.total = I915_PDES * GEN6_PTES * PAGE_SIZE;
 
+	ppgtt->base.vm.allocate_va_range = gen6_alloc_va_range;
 	ppgtt->base.vm.clear_range = gen6_ppgtt_clear_range;
 	ppgtt->base.vm.insert_entries = gen6_ppgtt_insert_entries;
 	ppgtt->base.vm.cleanup = gen6_ppgtt_cleanup;
 	ppgtt->base.debug_dump = gen6_dump_ppgtt;
 
-	ppgtt->base.vm.vma_ops.bind_vma    = gen6_ppgtt_bind_vma;
+	ppgtt->base.vm.vma_ops.bind_vma    = ppgtt_bind_vma;
 	ppgtt->base.vm.vma_ops.unbind_vma  = ppgtt_unbind_vma;
 	ppgtt->base.vm.vma_ops.set_pages   = ppgtt_set_pages;
 	ppgtt->base.vm.vma_ops.clear_pages = clear_pages;
@@ -2068,14 +2062,8 @@ static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 		goto err_scratch;
 	}
 
-	err = gen6_alloc_va_range(&ppgtt->base.vm, 0, ppgtt->base.vm.total);
-	if (err)
-		goto err_vma;
-
 	return &ppgtt->base;
 
-err_vma:
-	i915_vma_destroy(ppgtt->vma);
 err_scratch:
 	gen6_ppgtt_free_scratch(&ppgtt->base.vm);
 err_free:
@@ -2671,8 +2659,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 	if (flags & I915_VMA_LOCAL_BIND) {
 		struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt;
 
-		if (!(vma->flags & I915_VMA_LOCAL_BIND) &&
-		    appgtt->vm.allocate_va_range) {
+		if (!(vma->flags & I915_VMA_LOCAL_BIND)) {
 			ret = appgtt->vm.allocate_va_range(&appgtt->vm,
 							   vma->node.start,
 							   vma->size);
@@ -2776,17 +2763,15 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
 		goto err_ppgtt;
 	}
 
-	if (ppgtt->vm.allocate_va_range) {
-		/* Note we only pre-allocate as far as the end of the global
-		 * GTT. On 48b / 4-level page-tables, the difference is very,
-		 * very significant! We have to preallocate as GVT/vgpu does
-		 * not like the page directory disappearing.
-		 */
-		err = ppgtt->vm.allocate_va_range(&ppgtt->vm,
-						  0, ggtt->vm.total);
-		if (err)
-			goto err_ppgtt;
-	}
+	/*
+	 * Note we only pre-allocate as far as the end of the global
+	 * GTT. On 48b / 4-level page-tables, the difference is very,
+	 * very significant! We have to preallocate as GVT/vgpu does
+	 * not like the page directory disappearing.
+	 */
+	err = ppgtt->vm.allocate_va_range(&ppgtt->vm, 0, ggtt->vm.total);
+	if (err)
+		goto err_ppgtt;
 
 	i915->mm.aliasing_ppgtt = ppgtt;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (10 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11 11:09   ` Joonas Lahtinen
  2018-06-10 19:43 ` [PATCH 13/17] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
                   ` (8 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

As we cannot reliably change used page tables while the context is
active, the earliest opportunity we have to recover excess pages is when
the context becomes idle. So whenever we unbind the context (it must be
idle, and indeed being evicted) free the unused ptes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 44 +++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 3dc17793afac..8883e75cb594 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1753,20 +1753,28 @@ static void gen6_ppgtt_enable(struct drm_i915_private *dev_priv)
 static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 				   u64 start, u64 length)
 {
-	struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm);
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm));
 	unsigned int first_entry = start >> PAGE_SHIFT;
 	unsigned int pde = first_entry / GEN6_PTES;
 	unsigned int pte = first_entry % GEN6_PTES;
 	unsigned int num_entries = length >> PAGE_SHIFT;
-	gen6_pte_t scratch_pte =
+	const gen6_pte_t scratch_pte =
 		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
 
 	while (num_entries) {
-		struct i915_page_table *pt = ppgtt->pd.page_table[pde++];
-		unsigned int end = min(pte + num_entries, GEN6_PTES);
+		struct i915_page_table *pt = ppgtt->base.pd.page_table[pde++];
+		const unsigned int end = min(pte + num_entries, GEN6_PTES);
+		const unsigned int count = end - pte;
 		gen6_pte_t *vaddr;
 
-		num_entries -= end - pte;
+		GEM_BUG_ON(pt == vm->scratch_pt);
+
+		num_entries -= count;
+
+		GEM_BUG_ON(count > pt->used_ptes);
+		pt->used_ptes -= count;
+		if (!pt->used_ptes)
+			ppgtt->scan_for_unused_pt = true;
 
 		/*
 		 * Note that the hw doesn't support removing PDE on the fly
@@ -1798,6 +1806,8 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm,
 	struct sgt_dma iter = sgt_dma(vma);
 	gen6_pte_t *vaddr;
 
+	GEM_BUG_ON(ppgtt->pd.page_table[act_pt] == vm->scratch_pt);
+
 	vaddr = kmap_atomic_px(ppgtt->pd.page_table[act_pt]);
 	do {
 		vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma);
@@ -1833,6 +1843,8 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 	bool flush = false;
 
 	gen6_for_each_pde(pt, &ppgtt->base.pd, start, length, pde) {
+		const unsigned int count = gen6_pte_count(start, length);
+
 		if (pt == vm->scratch_pt) {
 			pt = alloc_pt(vm);
 			if (IS_ERR(pt))
@@ -1846,7 +1858,11 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				gen6_write_pde(ppgtt, pde, pt);
 				flush = true;
 			}
+
+			GEM_BUG_ON(pt->used_ptes);
 		}
+
+		pt->used_ptes += count;
 	}
 
 	if (flush) {
@@ -1948,6 +1964,24 @@ static int pd_vma_bind(struct i915_vma *vma,
 
 static void pd_vma_unbind(struct i915_vma *vma)
 {
+	struct gen6_hw_ppgtt *ppgtt = vma->private;
+	struct i915_page_table * const scratch_pt = ppgtt->base.vm.scratch_pt;
+	struct i915_page_table *pt;
+	unsigned int pde;
+
+	if (!ppgtt->scan_for_unused_pt)
+		return;
+
+	/* Free all no longer used page tables */
+	gen6_for_all_pdes(pt, &ppgtt->base.pd, pde) {
+		if (pt->used_ptes || pt == scratch_pt)
+			continue;
+
+		free_pt(&ppgtt->base.vm, pt);
+		ppgtt->base.pd.page_table[pde] = scratch_pt;
+	}
+
+	ppgtt->scan_for_unused_pt = false;
 }
 
 static const struct i915_vma_ops pd_vma_ops = {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index d7b7b4afe060..46c95d188580 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -414,6 +414,7 @@ struct gen6_hw_ppgtt {
 	gen6_pte_t __iomem *pd_addr;
 
 	unsigned int pin_count;
+	bool scan_for_unused_pt;
 };
 
 #define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/17] drm/i915/gtt: Skip initializing PT with scratch if full
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (11 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 19:43 ` [PATCH 14/17] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

If we will completely overwrite the PT with PTEs for the object, we can
forgo filling it with scratch entries.

References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 8883e75cb594..fdd34b4af7e1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1850,7 +1850,8 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 			if (IS_ERR(pt))
 				goto unwind_out;
 
-			gen6_initialize_pt(vm, pt);
+			if (count < GEN6_PTES)
+				gen6_initialize_pt(vm, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
 
 			if (i915_vma_is_bound(ppgtt->vma,
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/17] drm/i915/gtt: Cache the PTE encoding of the scratch page
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (12 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 13/17] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 19:43 ` [PATCH 15/17] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

As the most frequent PTE encoding is for the scratch page, cache it upon
creation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 20 ++++++++++----------
 drivers/gpu/drm/i915/i915_gem_gtt.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index fdd34b4af7e1..358c3ee4b302 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -640,11 +640,10 @@ static void gen8_initialize_pt(struct i915_address_space *vm,
 		gen8_pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC));
 }
 
-static void gen6_initialize_pt(struct i915_address_space *vm,
+static void gen6_initialize_pt(struct gen6_hw_ppgtt *ppgtt,
 			       struct i915_page_table *pt)
 {
-	fill32_px(vm, pt,
-		  vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0));
+	fill32_px(&ppgtt->base.vm, pt, ppgtt->scratch_pte);
 }
 
 static struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
@@ -1631,9 +1630,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915)
 static void gen6_dump_ppgtt(struct i915_hw_ppgtt *base, struct seq_file *m)
 {
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
-	struct i915_address_space *vm = &base->vm;
-	const gen6_pte_t scratch_pte =
-		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	const gen6_pte_t scratch_pte = ppgtt->scratch_pte;
 	struct i915_page_table *pt;
 	u32 pte, pde;
 
@@ -1758,8 +1755,7 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm,
 	unsigned int pde = first_entry / GEN6_PTES;
 	unsigned int pte = first_entry % GEN6_PTES;
 	unsigned int num_entries = length >> PAGE_SHIFT;
-	const gen6_pte_t scratch_pte =
-		vm->pte_encode(vm->scratch_page.daddr, I915_CACHE_LLC, 0);
+	const gen6_pte_t scratch_pte = ppgtt->scratch_pte;
 
 	while (num_entries) {
 		struct i915_page_table *pt = ppgtt->base.pd.page_table[pde++];
@@ -1851,7 +1847,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm,
 				goto unwind_out;
 
 			if (count < GEN6_PTES)
-				gen6_initialize_pt(vm, pt);
+				gen6_initialize_pt(ppgtt, pt);
 			ppgtt->base.pd.page_table[pde] = pt;
 
 			if (i915_vma_is_bound(ppgtt->vma,
@@ -1889,13 +1885,17 @@ static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt)
 	if (ret)
 		return ret;
 
+	ppgtt->scratch_pte =
+		vm->pte_encode(vm->scratch_page.daddr,
+			       I915_CACHE_NONE, PTE_READ_ONLY);
+
 	vm->scratch_pt = alloc_pt(vm);
 	if (IS_ERR(vm->scratch_pt)) {
 		cleanup_scratch_page(vm);
 		return PTR_ERR(vm->scratch_pt);
 	}
 
-	gen6_initialize_pt(vm, vm->scratch_pt);
+	gen6_initialize_pt(ppgtt, vm->scratch_pt);
 	gen6_for_all_pdes(unused, &ppgtt->base.pd, pde)
 		ppgtt->base.pd.page_table[pde] = vm->scratch_pt;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 46c95d188580..7ff5cc612771 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -412,6 +412,7 @@ struct gen6_hw_ppgtt {
 
 	struct i915_vma *vma;
 	gen6_pte_t __iomem *pd_addr;
+	gen6_pte_t scratch_pte;
 
 	unsigned int pin_count;
 	bool scan_for_unused_pt;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/17] drm/i915/gtt: Reduce a pair of runtime asserts
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (13 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 14/17] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 19:43 ` [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

We can stop asserting using WARN_ON as given sufficient CI coverage, we
can rely on using GEM_BUG_ON() to catch problems before merging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 358c3ee4b302..e5381d6dfbc8 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2793,7 +2793,7 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
-	if (WARN_ON(ppgtt->vm.total < ggtt->vm.total)) {
+	if (GEM_WARN_ON(ppgtt->vm.total < ggtt->vm.total)) {
 		err = -ENODEV;
 		goto err_ppgtt;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 7ff5cc612771..4f6f4595c9e9 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -464,8 +464,8 @@ static inline u32 i915_pte_count(u64 addr, u64 length, unsigned int pde_shift)
 	const u64 mask = ~((1ULL << pde_shift) - 1);
 	u64 end;
 
-	WARN_ON(length == 0);
-	WARN_ON(offset_in_page(addr|length));
+	GEM_BUG_ON(length == 0);
+	GEM_BUG_ON(offset_in_page(addr|length));
 
 	end = addr + length;
 
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (14 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 15/17] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-11  9:28   ` Matthew Auld
  2018-06-10 19:43 ` [PATCH 17/17] drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

If we know that the user cannot access the GGTT, by virtue of having a
segregated memory area, we can skip clearing the unused entries as they
cannot be accessed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e5381d6dfbc8..5795934918ab 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3383,7 +3383,9 @@ static int gen6_gmch_probe(struct i915_ggtt *ggtt)
 	size = gen6_get_total_gtt_size(snb_gmch_ctl);
 	ggtt->vm.total = (size / sizeof(gen6_pte_t)) << PAGE_SHIFT;
 
-	ggtt->vm.clear_range = gen6_ggtt_clear_range;
+	ggtt->vm.clear_range = nop_clear_range;
+	if (!USES_FULL_PPGTT(dev_priv) || intel_scanout_needs_vtd_wa(dev_priv))
+		ggtt->vm.clear_range = gen6_ggtt_clear_range;
 	ggtt->vm.insert_page = gen6_ggtt_insert_page;
 	ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
 	ggtt->vm.cleanup = gen6_gmch_remove;
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/17] drm/i915/gtt: Enable full-ppgtt by default everywhere
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (15 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
@ 2018-06-10 19:43 ` Chris Wilson
  2018-06-10 20:00 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning Patchwork
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-10 19:43 UTC (permalink / raw)
  To: intel-gfx

We should we have all the kinks worked out and full-ppgtt now works
reliably on gen7 (Ivybridge, Valleyview/Baytrail and Haswell). If we can
let userspace have full control over their own ppgtt, it makes softpinning
far more effective, in turn making GPU dispatch far more efficient and
more secure (due to better mm segregation).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 5795934918ab..4f363bb140dc 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -179,13 +179,11 @@ int intel_sanitize_enable_ppgtt(struct drm_i915_private *dev_priv,
 		return 0;
 	}
 
-	if (HAS_LOGICAL_RING_CONTEXTS(dev_priv)) {
-		if (has_full_48bit_ppgtt)
-			return 3;
+	if (has_full_48bit_ppgtt)
+		return 3;
 
-		if (has_full_ppgtt)
-			return 2;
-	}
+	if (has_full_ppgtt)
+		return 2;
 
 	return 1;
 }
-- 
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (16 preceding siblings ...)
  2018-06-10 19:43 ` [PATCH 17/17] drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
@ 2018-06-10 20:00 ` Patchwork
  2018-06-10 20:05 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-06-10 20:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
URL   : https://patchwork.freedesktop.org/series/44539/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
17cecb85f303 drm/i915: Apply batch location restrictions before pinning
-:30: WARNING:UNSPECIFIED_INT: Prefer 'unsigned int' to bare use of 'unsigned'
#30: FILE: drivers/gpu/drm/i915/i915_gem_execbuffer.c:493:
+	   unsigned int i, unsigned batch_idx,

total: 0 errors, 1 warnings, 0 checks, 86 lines checked
a01d03491a31 drm/i915/ringbuffer: Brute force context restore
47f79a76e877 drm/i915/ringbuffer: Fix context restore upon reset
82fd75bde6dd drm/i915: Wrap around the tail offset before setting ring->tail
eb6f6414086e drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
99a2bbbf20d1 drm/i915/ringbuffer: Serialize load of PD_DIR
483cfe1c9f3a drm/i915/gtt: Subclass gen6_hw_ppgtt
-:290: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'base' - possible side-effects?
#290: FILE: drivers/gpu/drm/i915/i915_gem_gtt.h:417:
+#define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base)

total: 0 errors, 0 warnings, 1 checks, 294 lines checked
2e7a6a454a92 drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
314ece1f1246 drm/i915/gtt: Make gen6 page directories evictable
ac8215c1f544 drm/i915/gtt: Only keep gen6 page directories pinned while active
651e6ba71483 drm/i915/gtt: Lazily allocate page directories for gen7
5e6d68c4ceb4 drm/i915/gtt: Free unused page tables on unbind the context
e699cee5e36d drm/i915/gtt: Skip initializing PT with scratch if full
-:9: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#9: 
References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")

-:9: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 14826673247e ("drm/i915: Only initialize partially filled pagetables")'
#9: 
References: 14826673247e ("drm/i915: Only initialize partially filled pagetables")

total: 1 errors, 1 warnings, 0 checks, 9 lines checked
13b6bde56af2 drm/i915/gtt: Cache the PTE encoding of the scratch page
1648f5cc38eb drm/i915/gtt: Reduce a pair of runtime asserts
-:39: CHECK:SPACING: spaces preferred around that '|' (ctx:VxV)
#39: FILE: drivers/gpu/drm/i915/i915_gem_gtt.h:468:
+	GEM_BUG_ON(offset_in_page(addr|length));
 	                              ^

total: 0 errors, 0 warnings, 1 checks, 18 lines checked
3f18536c8f2a drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
77568d77af36 drm/i915/gtt: Enable full-ppgtt by default everywhere

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (17 preceding siblings ...)
  2018-06-10 20:00 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning Patchwork
@ 2018-06-10 20:05 ` Patchwork
  2018-06-10 20:18 ` ✓ Fi.CI.BAT: success " Patchwork
  2018-06-10 21:10 ` ✓ Fi.CI.IGT: " Patchwork
  20 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-06-10 20:05 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
URL   : https://patchwork.freedesktop.org/series/44539/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Commit: drm/i915: Apply batch location restrictions before pinning
Okay!

Commit: drm/i915/ringbuffer: Brute force context restore
Okay!

Commit: drm/i915/ringbuffer: Fix context restore upon reset
Okay!

Commit: drm/i915: Wrap around the tail offset before setting ring->tail
Okay!

Commit: drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
Okay!

Commit: drm/i915/ringbuffer: Serialize load of PD_DIR
Okay!

Commit: drm/i915/gtt: Subclass gen6_hw_ppgtt
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1649:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1649:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1708:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1708:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1852:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1852:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1970:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1970:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1853:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1853:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1970:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1970:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
Okay!

Commit: drm/i915/gtt: Make gen6 page directories evictable
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1709:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1961:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1961:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Only keep gen6 page directories pinned while active
Okay!

Commit: drm/i915/gtt: Lazily allocate page directories for gen7
Okay!

Commit: drm/i915/gtt: Free unused page tables on unbind the context
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1766:36: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1835:9: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/i915_gem_gtt.c:1835:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1766:42: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1845:9: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem_gtt.c:1845:9: warning: expression using sizeof(void)

Commit: drm/i915/gtt: Skip initializing PT with scratch if full
Okay!

Commit: drm/i915/gtt: Cache the PTE encoding of the scratch page
Okay!

Commit: drm/i915/gtt: Reduce a pair of runtime asserts
Okay!

Commit: drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
Okay!

Commit: drm/i915/gtt: Enable full-ppgtt by default everywhere
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (18 preceding siblings ...)
  2018-06-10 20:05 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2018-06-10 20:18 ` Patchwork
  2018-06-10 21:10 ` ✓ Fi.CI.IGT: " Patchwork
  20 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-06-10 20:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
URL   : https://patchwork.freedesktop.org/series/44539/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4298 -> Patchwork_9252 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/44539/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_9252 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_suspend@basic-s4-devices:
      fi-kbl-7500u:       PASS -> DMESG-WARN (fdo#105128)

    igt@kms_flip@basic-flip-vs-modeset:
      fi-glk-j4005:       PASS -> DMESG-WARN (fdo#106097, fdo#106000)

    igt@kms_flip@basic-flip-vs-wf_vblank:
      fi-glk-j4005:       PASS -> FAIL (fdo#100368)

    
    ==== Possible fixes ====

    igt@gem_mmap_gtt@basic-small-bo-tiledx:
      fi-gdg-551:         FAIL (fdo#102575) -> PASS

    igt@kms_pipe_crc_basic@read-crc-pipe-c-frame-sequence:
      fi-glk-j4005:       FAIL (fdo#103481) -> PASS

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      fi-kbl-7567u:       FAIL (fdo#104724, fdo#103191) -> PASS

    
  fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
  fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103481 https://bugs.freedesktop.org/show_bug.cgi?id=103481
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105128 https://bugs.freedesktop.org/show_bug.cgi?id=105128
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106097 https://bugs.freedesktop.org/show_bug.cgi?id=106097


== Participating hosts (39 -> 34) ==

  Missing    (5): fi-hsw-peppy fi-ilk-m540 fi-byt-squawks fi-skl-6700hq fi-hsw-4200u 


== Build changes ==

    * Linux: CI_DRM_4298 -> Patchwork_9252

  CI_DRM_4298: 104ae0f882a6a384617f5d06e317fbe0aa50d7df @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4513: 7b6838781441cfbc7f6c18f421f127dfb02b44cf @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9252: 77568d77af362c84b147f0a1734d41a4cd35d9a4 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

77568d77af36 drm/i915/gtt: Enable full-ppgtt by default everywhere
3f18536c8f2a drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
1648f5cc38eb drm/i915/gtt: Reduce a pair of runtime asserts
13b6bde56af2 drm/i915/gtt: Cache the PTE encoding of the scratch page
e699cee5e36d drm/i915/gtt: Skip initializing PT with scratch if full
5e6d68c4ceb4 drm/i915/gtt: Free unused page tables on unbind the context
651e6ba71483 drm/i915/gtt: Lazily allocate page directories for gen7
ac8215c1f544 drm/i915/gtt: Only keep gen6 page directories pinned while active
314ece1f1246 drm/i915/gtt: Make gen6 page directories evictable
2e7a6a454a92 drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
483cfe1c9f3a drm/i915/gtt: Subclass gen6_hw_ppgtt
99a2bbbf20d1 drm/i915/ringbuffer: Serialize load of PD_DIR
eb6f6414086e drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
82fd75bde6dd drm/i915: Wrap around the tail offset before setting ring->tail
47f79a76e877 drm/i915/ringbuffer: Fix context restore upon reset
a01d03491a31 drm/i915/ringbuffer: Brute force context restore
17cecb85f303 drm/i915: Apply batch location restrictions before pinning

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9252/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* ✓ Fi.CI.IGT: success for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
                   ` (19 preceding siblings ...)
  2018-06-10 20:18 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2018-06-10 21:10 ` Patchwork
  20 siblings, 0 replies; 44+ messages in thread
From: Patchwork @ 2018-06-10 21:10 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/17] drm/i915: Apply batch location restrictions before pinning
URL   : https://patchwork.freedesktop.org/series/44539/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_4298_full -> Patchwork_9252_full =

== Summary - WARNING ==

  Minor unknown changes coming with Patchwork_9252_full need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9252_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9252_full:

  === IGT changes ===

    ==== Warnings ====

    igt@gem_exec_schedule@deep-bsd2:
      shard-kbl:          SKIP -> PASS +2

    igt@gem_mocs_settings@mocs-rc6-blt:
      shard-kbl:          PASS -> SKIP

    igt@gem_ppgtt@flink-and-close-vma-leak:
      shard-hsw:          SKIP -> PASS +2

    igt@pm_rc6_residency@rc6-accuracy:
      shard-snb:          SKIP -> PASS

    
== Known issues ==

  Here are the changes found in Patchwork_9252_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_hangcheck:
      shard-apl:          PASS -> DMESG-FAIL (fdo#106560)

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-glk:          PASS -> FAIL (fdo#105703)

    igt@kms_cursor_crc@cursor-64x64-suspend:
      shard-glk:          PASS -> FAIL (fdo#103375)

    igt@kms_flip@flip-vs-expired-vblank:
      shard-glk:          PASS -> FAIL (fdo#102887)

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-glk:          PASS -> FAIL (fdo#105363)

    igt@kms_frontbuffer_tracking@fbc-2p-primscrn-pri-shrfb-draw-mmap-wc:
      shard-glk:          PASS -> FAIL (fdo#103167, fdo#104724)

    igt@kms_setmode@basic:
      shard-apl:          PASS -> FAIL (fdo#99912)

    
    ==== Possible fixes ====

    igt@gem_eio@hibernate:
      shard-snb:          INCOMPLETE (fdo#105411) -> PASS

    igt@gem_eio@suspend:
      shard-snb:          FAIL (fdo#105957) -> PASS

    igt@kms_flip@2x-flip-vs-expired-vblank:
      shard-hsw:          FAIL (fdo#105363) -> PASS

    igt@kms_flip_tiling@flip-to-x-tiled:
      shard-glk:          FAIL (fdo#104724) -> PASS +1

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
      shard-kbl:          INCOMPLETE (fdo#103665) -> PASS

    igt@kms_vblank@pipe-b-accuracy-idle:
      shard-hsw:          FAIL (fdo#102583) -> PASS

    
  fdo#102583 https://bugs.freedesktop.org/show_bug.cgi?id=102583
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#104724 https://bugs.freedesktop.org/show_bug.cgi?id=104724
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105703 https://bugs.freedesktop.org/show_bug.cgi?id=105703
  fdo#105957 https://bugs.freedesktop.org/show_bug.cgi?id=105957
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (5 -> 5) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_4298 -> Patchwork_9252

  CI_DRM_4298: 104ae0f882a6a384617f5d06e317fbe0aa50d7df @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4513: 7b6838781441cfbc7f6c18f421f127dfb02b44cf @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_9252: 77568d77af362c84b147f0a1734d41a4cd35d9a4 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9252/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt
  2018-06-10 19:43 ` [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
@ 2018-06-11  9:28   ` Matthew Auld
  0 siblings, 0 replies; 44+ messages in thread
From: Matthew Auld @ 2018-06-11  9:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On 10 June 2018 at 20:43, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> If we know that the user cannot access the GGTT, by virtue of having a
> segregated memory area, we can skip clearing the unused entries as they
> cannot be accessed.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-10 19:43 ` [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning Chris Wilson
@ 2018-06-11  9:57   ` Joonas Lahtinen
  2018-06-11 10:08     ` Chris Wilson
  0 siblings, 1 reply; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11  9:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:09)
> We special case the position of the batch within the GTT to prevent
> negative self-relocation deltas from underflowing. However, that
> restriction is being applied after a trial pin of the batch in its
> current position. Thus we are not rejecting an invalid location if the
> batch has been before, leading to an assertion if we happen to need to

"has been used/bound/pinned/whatever"?

> rearrange the entire payload. In the worst case, this may cause a GPU
> hang on gen7 or perhaps missing state.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=105720
> Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Martin Peres <martin.peres@linux.intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
@ 2018-06-11 10:00   ` Tvrtko Ursulin
  2018-06-11 10:04     ` Chris Wilson
  2018-06-11 10:07   ` Mika Kuoppala
  2018-06-11 10:11   ` Joonas Lahtinen
  2 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-06-11 10:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 10/06/2018 20:43, Chris Wilson wrote:
> An issue encountered with switching mm on gen7 is that the GPU likes to
> hang (with the VS unit busy) when told to force restore the current
> context. We can simply workaround this by substituting the
> MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> forcing the context to be saved and restored; thereby reloading the
> PP_DIR registers and updating the modified page directory!
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> ---
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++++++---
>   1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 65811e2fa7da..6bfa6030198d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   		(HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
>   		INTEL_INFO(i915)->num_rings - 1 :
>   		0;
> +	bool force_restore = false;
>   	int len;
>   	u32 *cs;
>   
> @@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   	len = 4;
>   	if (IS_GEN7(i915))
>   		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> +	if (flags & MI_FORCE_RESTORE) {
> +		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> +		flags &= ~MI_FORCE_RESTORE;
> +		force_restore = true;
> +		len += 2;
> +	}
>   
>   	cs = intel_ring_begin(rq, len);
>   	if (IS_ERR(cs))
> @@ -1495,6 +1502,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   		}
>   	}
>   
> +	if (force_restore) {
> +		/*
> +		 * The HW doesn't handle being told to restore the current
> +		 * context very well. Quite often it likes goes to go off and
> +		 * sulk, especially when it is meant to be reloading PP_DIR.
> +		 * A very simple fix to force the reload is to simply switch
> +		 * away from the current context and back again.
> +		 */
> +		*cs++ = MI_SET_CONTEXT;
> +		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> +							  engine)->state) |
> +			MI_MM_SPACE_GTT |
> +			MI_RESTORE_INHIBIT;
> +	}
> +
>   	*cs++ = MI_NOOP;
>   	*cs++ = MI_SET_CONTEXT;
>   	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
> @@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
>   
>   		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
>   		engine->legacy_active_ppgtt = to_mm;
> -		hw_flags = MI_FORCE_RESTORE;
> +
> +		if (to_ctx == from_ctx) {

Contexts can be the same here , when the parent condition is "if (to_mm 
!= from_mm || to_mm ...)" ?

> +			hw_flags = MI_FORCE_RESTORE;
> +			from_ctx = NULL;

Now on the error path we can end up with engine->legacy_active_context 
== NULL, but commands to switch have been put to the ring. Is that OK?

> +		}
>   	}
>   
> -	if (rq->hw_context->state &&
> -	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
> +	if (rq->hw_context->state && to_ctx != from_ctx) {
>   		GEM_BUG_ON(engine->id != RCS);
>   
>   		/*
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-11 10:00   ` Tvrtko Ursulin
@ 2018-06-11 10:04     ` Chris Wilson
  2018-06-11 10:23       ` Tvrtko Ursulin
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:04 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-11 11:00:31)
> 
> On 10/06/2018 20:43, Chris Wilson wrote:
> > @@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
> >   
> >               to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
> >               engine->legacy_active_ppgtt = to_mm;
> > -             hw_flags = MI_FORCE_RESTORE;
> > +
> > +             if (to_ctx == from_ctx) {
> 
> Contexts can be the same here , when the parent condition is "if (to_mm 
> != from_mm || to_mm ...)" ?
> 
> > +                     hw_flags = MI_FORCE_RESTORE;
> > +                     from_ctx = NULL;
> 
> Now on the error path we can end up with engine->legacy_active_context 
> == NULL, but commands to switch have been put to the ring. Is that OK?

On the error path, we unwind the commands in the ring and will overwrite
them with the next request (see i915_request_alloc() err_unwind:)

So leaving legacy_active_context = NULL here just ensures we emit a
context switch next time, no matter what. That is no bad thing as the HW
ignores redundant MI_SET_CONTEXT (and requires the FORCE_RESTORE flag to
force a redundant reload).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
  2018-06-11 10:00   ` Tvrtko Ursulin
@ 2018-06-11 10:07   ` Mika Kuoppala
  2018-06-11 10:11   ` Joonas Lahtinen
  2 siblings, 0 replies; 44+ messages in thread
From: Mika Kuoppala @ 2018-06-11 10:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> An issue encountered with switching mm on gen7 is that the GPU likes to
> hang (with the VS unit busy) when told to force restore the current
> context. We can simply workaround this by substituting the
> MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> forcing the context to be saved and restored; thereby reloading the
> PP_DIR registers and updating the modified page directory!
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 31 ++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 65811e2fa7da..6bfa6030198d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1458,6 +1458,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  		(HAS_LEGACY_SEMAPHORES(i915) && IS_GEN7(i915)) ?
>  		INTEL_INFO(i915)->num_rings - 1 :
>  		0;
> +	bool force_restore = false;
>  	int len;
>  	u32 *cs;
>  
> @@ -1471,6 +1472,12 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  	len = 4;
>  	if (IS_GEN7(i915))
>  		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> +	if (flags & MI_FORCE_RESTORE) {
> +		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
> +		flags &= ~MI_FORCE_RESTORE;
> +		force_restore = true;
> +		len += 2;
> +	}
>  
>  	cs = intel_ring_begin(rq, len);
>  	if (IS_ERR(cs))
> @@ -1495,6 +1502,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  		}
>  	}
>  
> +	if (force_restore) {
> +		/*
> +		 * The HW doesn't handle being told to restore the current
> +		 * context very well. Quite often it likes goes to go off and
> +		 * sulk, especially when it is meant to be reloading PP_DIR.
> +		 * A very simple fix to force the reload is to simply switch
> +		 * away from the current context and back again.
> +		 */
> +		*cs++ = MI_SET_CONTEXT;
> +		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> +							  engine)->state) |
> +			MI_MM_SPACE_GTT |
> +			MI_RESTORE_INHIBIT;

The above comment could be more verbose about the INHIBIT flag,
like we discussed in irc. We dont actually restore the kernel
context image state and we will trample it with current ctx.

But as we don't ever run anything through kernel_context,
this should be fine and creating an another context just
for switching through seems overkill.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


> +	}
> +
>  	*cs++ = MI_NOOP;
>  	*cs++ = MI_SET_CONTEXT;
>  	*cs++ = i915_ggtt_offset(rq->hw_context->state) | flags;
> @@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
>  
>  		to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
>  		engine->legacy_active_ppgtt = to_mm;
> -		hw_flags = MI_FORCE_RESTORE;
> +
> +		if (to_ctx == from_ctx) {
> +			hw_flags = MI_FORCE_RESTORE;
> +			from_ctx = NULL;
> +		}
>  	}
>  
> -	if (rq->hw_context->state &&
> -	    (to_ctx != from_ctx || hw_flags & MI_FORCE_RESTORE)) {
> +	if (rq->hw_context->state && to_ctx != from_ctx) {
>  		GEM_BUG_ON(engine->id != RCS);
>  
>  		/*
> -- 
> 2.17.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning
  2018-06-11  9:57   ` Joonas Lahtinen
@ 2018-06-11 10:08     ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:08 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

Quoting Joonas Lahtinen (2018-06-11 10:57:26)
> Quoting Chris Wilson (2018-06-10 22:43:09)
> > We special case the position of the batch within the GTT to prevent
> > negative self-relocation deltas from underflowing. However, that
> > restriction is being applied after a trial pin of the batch in its
> > current position. Thus we are not rejecting an invalid location if the
> > batch has been before, leading to an assertion if we happen to need to
> 
> "has been used/bound/pinned/whatever"?
> 
> > rearrange the entire payload. In the worst case, this may cause a GPU
> > hang on gen7 or perhaps missing state.
> > 
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=105720
> > Fixes: 2889caa92321 ("drm/i915: Eliminate lots of iterations over the execobjects array")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Martin Peres <martin.peres@linux.intel.com>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Fixed up and pushed, to see if it does indeed allow us to make some
progress on the bug. Thanks for the review,
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
  2018-06-11 10:00   ` Tvrtko Ursulin
  2018-06-11 10:07   ` Mika Kuoppala
@ 2018-06-11 10:11   ` Joonas Lahtinen
  2 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:10)
> An issue encountered with switching mm on gen7 is that the GPU likes to
> hang (with the VS unit busy) when told to force restore the current
> context. We can simply workaround this by substituting the
> MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
> forcing the context to be saved and restored; thereby reloading the
> PP_DIR registers and updating the modified page directory!
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>

Commit message checks out with code, and as you obviously have it
crunching through the testsuites, must be the right thing to do.

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR
  2018-06-10 19:43 ` [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR Chris Wilson
@ 2018-06-11 10:12   ` Chris Wilson
  2018-06-11 10:43   ` Joonas Lahtinen
  1 sibling, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:12 UTC (permalink / raw)
  To: intel-gfx

Quoting Chris Wilson (2018-06-10 20:43:14)
> After triggering the mm switch with a load of PD_DIR, which may be
> deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
> with that load by posting a read of PD_DIR (or else those subsequent
> commands may access the stale page tables).

My stress test failed over night. Much improved MTBF, but not eliminated
yet. Next step is to follow the STORE_REG_MEM with an
engine->emit_flush(EMIT_INVALIDATE).

I think we are indeed on the right track, and that these are much more
than random delays.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail
  2018-06-10 19:43 ` [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail Chris Wilson
@ 2018-06-11 10:16   ` Mika Kuoppala
  2018-06-11 10:26     ` Chris Wilson
  2018-06-11 10:30   ` Joonas Lahtinen
  1 sibling, 1 reply; 44+ messages in thread
From: Mika Kuoppala @ 2018-06-11 10:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> The HW only accepts offsets within ring->size, and fails peculiarly if
> the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
> set ring->head/ring->tail we want to make sure it is within value (using
> intel_ring_wrap()).
>
> v2: Double check execlists as well
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c        |  6 ++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  5 +++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h | 12 ++++++++++++
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 091e28f0e024..3e008adf5a01 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1413,6 +1413,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>  	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
>  	ce->lrc_reg_state[CTX_RING_BUFFER_START+1] =
>  		i915_ggtt_offset(ce->ring->vma);
> +	GEM_BUG_ON(!intel_ring_offset_valid(ce->ring, ce->ring->head));
>  	ce->lrc_reg_state[CTX_RING_HEAD+1] = ce->ring->head;
>  
>  	ce->state->obj->pin_global++;
> @@ -2001,9 +2002,10 @@ static void execlists_reset(struct intel_engine_cs *engine,
>  
>  	/* Move the RING_HEAD onto the breadcrumb, past the hanging batch */
>  	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(request->ring->vma);
> -	regs[CTX_RING_HEAD + 1] = request->postfix;
>  
> -	request->ring->head = request->postfix;
> +	request->ring->head = intel_ring_wrap(request->ring, request->postfix);
> +	regs[CTX_RING_HEAD + 1] = request->ring->head;
> +
>  	intel_ring_update_space(request->ring);
>  
>  	/* Reset WaIdleLiteRestore:bdw,skl as well */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 409f499c0a45..7970ecb199e2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -496,6 +496,10 @@ static int init_ring_common(struct intel_engine_cs *engine)
>  		DRM_DEBUG_DRIVER("%s initialization failed [head=%08x], fudging\n",
>  				 engine->name, I915_READ_HEAD(engine));
>  
> +	/* Check that the ring offsets point within the ring! */
> +	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
> +	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
> +
>  	intel_ring_update_space(ring);
>  	I915_WRITE_HEAD(engine, ring->head);
>  	I915_WRITE_TAIL(engine, ring->tail);
> @@ -1064,6 +1068,7 @@ int intel_ring_pin(struct intel_ring *ring,
>  
>  void intel_ring_reset(struct intel_ring *ring, u32 tail)
>  {
> +	tail = intel_ring_wrap(ring, tail);

I am pondering this wrap here and it's usefulness. Where
could we ever get a tail which is not valid? From corrupted
context?

-Mika


>  	ring->tail = tail;
>  	ring->head = tail;
>  	ring->emit = tail;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b44c67849749..1d8140ac2016 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -805,6 +805,18 @@ static inline u32 intel_ring_wrap(const struct intel_ring *ring, u32 pos)
>  	return pos & (ring->size - 1);
>  }
>  
> +static inline bool
> +intel_ring_offset_valid(const struct intel_ring *ring, u32 pos)
> +{
> +	if (pos & -ring->size) /* must be strictly within the ring */
> +		return false;
> +
> +	if (!IS_ALIGNED(pos, 8)) /* must be qword aligned */
> +		return false;
> +
> +	return true;
> +}
> +
>  static inline u32 intel_ring_offset(const struct i915_request *rq, void *addr)
>  {
>  	/* Don't write ring->size (equivalent to 0) as that hangs some GPUs. */
> -- 
> 2.17.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-11 10:04     ` Chris Wilson
@ 2018-06-11 10:23       ` Tvrtko Ursulin
  2018-06-11 10:33         ` Chris Wilson
  0 siblings, 1 reply; 44+ messages in thread
From: Tvrtko Ursulin @ 2018-06-11 10:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/06/2018 11:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-11 11:00:31)
>>
>> On 10/06/2018 20:43, Chris Wilson wrote:
>>> @@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
>>>    
>>>                to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
>>>                engine->legacy_active_ppgtt = to_mm;
>>> -             hw_flags = MI_FORCE_RESTORE;
>>> +
>>> +             if (to_ctx == from_ctx) {
>>
>> Contexts can be the same here , when the parent condition is "if (to_mm
>> != from_mm || to_mm ...)" ?
>>
>>> +                     hw_flags = MI_FORCE_RESTORE;
>>> +                     from_ctx = NULL;
>>
>> Now on the error path we can end up with engine->legacy_active_context
>> == NULL, but commands to switch have been put to the ring. Is that OK?
> 
> On the error path, we unwind the commands in the ring and will overwrite
> them with the next request (see i915_request_alloc() err_unwind:)
> 
> So leaving legacy_active_context = NULL here just ensures we emit a
> context switch next time, no matter what. That is no bad thing as the HW
> ignores redundant MI_SET_CONTEXT (and requires the FORCE_RESTORE flag to
> force a redundant reload).

But we don't set MI_FORCE_RESTORE if the old context is equal to new 
one, since we discarded information about the previous one?

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail
  2018-06-11 10:16   ` Mika Kuoppala
@ 2018-06-11 10:26     ` Chris Wilson
  2018-06-11 10:40       ` Mika Kuoppala
  0 siblings, 1 reply; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:26 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-06-11 11:16:14)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 409f499c0a45..7970ecb199e2 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -496,6 +496,10 @@ static int init_ring_common(struct intel_engine_cs *engine)
> >               DRM_DEBUG_DRIVER("%s initialization failed [head=%08x], fudging\n",
> >                                engine->name, I915_READ_HEAD(engine));
> >  
> > +     /* Check that the ring offsets point within the ring! */
> > +     GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
> > +     GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
> > +
> >       intel_ring_update_space(ring);
> >       I915_WRITE_HEAD(engine, ring->head);
> >       I915_WRITE_TAIL(engine, ring->tail);
> > @@ -1064,6 +1068,7 @@ int intel_ring_pin(struct intel_ring *ring,
> >  
> >  void intel_ring_reset(struct intel_ring *ring, u32 tail)
> >  {
> > +     tail = intel_ring_wrap(ring, tail);
> 
> I am pondering this wrap here and it's usefulness. Where
> could we ever get a tail which is not valid? From corrupted
> context?

It's just being defensive. We could make this
GEM_BUG_ON(!offset_valid()) instead?

Your choice?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset
  2018-06-10 19:43 ` [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
@ 2018-06-11 10:28   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:28 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:11)
> The discovery with trying to enable full-ppgtt was that we were
> completely failing to the load both the mm and context following the
> reset. Although we were performing mmio to set the PP_DIR (per-process
> GTT) and CCID (context), these were taking no effect (the assumption was
> that this would trigger reload of the context and restore the page
> tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
> following context switch would anything occur.
> 
> Since we are then required to reset the context image and PP_DIR using
> CS commands, we place those commands into every batch. The hardware
> should recognise the no-ops and eliminate the expensive context loads,
> but we still have to pay the cost of using cross-powerwell register
> writes. In practice, this has no effect on actual context switch times,
> and only adds a few hundred nanoseconds to no-op switches. We can improve
> the latter by eliminating the w/a around known no-op switches, but there
> is an ulterior motive to keeping them.
> 
> Always emitting the context switch at the beginning of the request (and
> relying on HW to skip unneeded switches) does have one key advantage.
> Should we implement request reordering on Haswell, we will not know in
> advance what the previous executing context was on the GPU and so we
> would not be able to elide the MI_SET_CONTEXT commands ourselves and
> always have to emit them. Having our hand forced now actually prepares
> us for later.

Please comment about the destiny of the tracepoint for the sake of the
bisecter that will end up here to look for a lost tracepoint :)

Also, you could throw a Testcase: this affects.

> 
> v2: Sandybridge has to agree to use LRI as well.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail
  2018-06-10 19:43 ` [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail Chris Wilson
  2018-06-11 10:16   ` Mika Kuoppala
@ 2018-06-11 10:30   ` Joonas Lahtinen
  1 sibling, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:30 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:12)
> The HW only accepts offsets within ring->size, and fails peculiarly if
> the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
> set ring->head/ring->tail we want to make sure it is within value (using
> intel_ring_wrap()).
> 
> v2: Double check execlists as well
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore
  2018-06-11 10:23       ` Tvrtko Ursulin
@ 2018-06-11 10:33         ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2018-06-11 11:23:53)
> 
> On 11/06/2018 11:04, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-11 11:00:31)
> >>
> >> On 10/06/2018 20:43, Chris Wilson wrote:
> >>> @@ -1585,11 +1607,14 @@ static int switch_context(struct i915_request *rq)
> >>>    
> >>>                to_mm->pd_dirty_rings &= ~intel_engine_flag(engine);
> >>>                engine->legacy_active_ppgtt = to_mm;
> >>> -             hw_flags = MI_FORCE_RESTORE;
> >>> +
> >>> +             if (to_ctx == from_ctx) {
> >>
> >> Contexts can be the same here , when the parent condition is "if (to_mm
> >> != from_mm || to_mm ...)" ?
> >>
> >>> +                     hw_flags = MI_FORCE_RESTORE;
> >>> +                     from_ctx = NULL;
> >>
> >> Now on the error path we can end up with engine->legacy_active_context
> >> == NULL, but commands to switch have been put to the ring. Is that OK?
> > 
> > On the error path, we unwind the commands in the ring and will overwrite
> > them with the next request (see i915_request_alloc() err_unwind:)
> > 
> > So leaving legacy_active_context = NULL here just ensures we emit a
> > context switch next time, no matter what. That is no bad thing as the HW
> > ignores redundant MI_SET_CONTEXT (and requires the FORCE_RESTORE flag to
> > force a redundant reload).
> 
> But we don't set MI_FORCE_RESTORE if the old context is equal to new 
> one, since we discarded information about the previous one?

Bah, no I was thinking the opposite that we would just end up with more
FORCE_RESTORE. In the next patch, we kill it. Let's just undo all the
unnecessary changes here (can you tell I was avoiding work?)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories
  2018-06-10 19:43 ` [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
@ 2018-06-11 10:37   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:13)
> When we update the gen6 ppgtt page directories, we do so by writing the
> new address into a reserved slot in the GGTT. It appears that when the
> GPU reads that entry from the gsm, it uses its small cache and that we
> need to invalidate that cache after writing. We don't see an issue
> currently as we prefill the ppgtt page directories on creation; and only
> create the single aliasing_ppgtt long before we start using the GGTT
> (and so before the cache mayhave a conflicting entry).

"may have"

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail
  2018-06-11 10:26     ` Chris Wilson
@ 2018-06-11 10:40       ` Mika Kuoppala
  0 siblings, 0 replies; 44+ messages in thread
From: Mika Kuoppala @ 2018-06-11 10:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-06-11 11:16:14)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > index 409f499c0a45..7970ecb199e2 100644
>> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> > @@ -496,6 +496,10 @@ static int init_ring_common(struct intel_engine_cs *engine)
>> >               DRM_DEBUG_DRIVER("%s initialization failed [head=%08x], fudging\n",
>> >                                engine->name, I915_READ_HEAD(engine));
>> >  
>> > +     /* Check that the ring offsets point within the ring! */
>> > +     GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
>> > +     GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
>> > +
>> >       intel_ring_update_space(ring);
>> >       I915_WRITE_HEAD(engine, ring->head);
>> >       I915_WRITE_TAIL(engine, ring->tail);
>> > @@ -1064,6 +1068,7 @@ int intel_ring_pin(struct intel_ring *ring,
>> >  
>> >  void intel_ring_reset(struct intel_ring *ring, u32 tail)
>> >  {
>> > +     tail = intel_ring_wrap(ring, tail);
>> 
>> I am pondering this wrap here and it's usefulness. Where
>> could we ever get a tail which is not valid? From corrupted
>> context?
>
> It's just being defensive. We could make this
> GEM_BUG_ON(!offset_valid()) instead?
>
> Your choice?

That would be best to assert it.

For fancyness, you could do a ring_offset_valid check from
assert_ring_tail_valid as it is a subset and share
some of the checks.

-Mika
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR
  2018-06-10 19:43 ` [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR Chris Wilson
  2018-06-11 10:12   ` Chris Wilson
@ 2018-06-11 10:43   ` Joonas Lahtinen
  2018-06-11 10:47     ` Chris Wilson
  1 sibling, 1 reply; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:14)
> After triggering the mm switch with a load of PD_DIR, which may be
> deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
> with that load by posting a read of PD_DIR (or else those subsequent
> commands may access the stale page tables).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>

Again, commit message checks out with code. Do you have your own
bug-triaging flowchart that you come up with these W/As? :)

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR
  2018-06-11 10:43   ` Joonas Lahtinen
@ 2018-06-11 10:47     ` Chris Wilson
  0 siblings, 0 replies; 44+ messages in thread
From: Chris Wilson @ 2018-06-11 10:47 UTC (permalink / raw)
  To: Joonas Lahtinen, intel-gfx

Quoting Joonas Lahtinen (2018-06-11 11:43:51)
> Quoting Chris Wilson (2018-06-10 22:43:14)
> > After triggering the mm switch with a load of PD_DIR, which may be
> > deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
> > with that load by posting a read of PD_DIR (or else those subsequent
> > commands may access the stale page tables).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Matthew Auld <matthew.william.auld@gmail.com>
> 
> Again, commit message checks out with code. Do you have your own
> bug-triaging flowchart that you come up with these W/As? :)

It's called running the tests that take too for CI :(

The lack of automation scares me, because I know how lazy and forgetful
I am.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create
  2018-06-10 19:43 ` [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
@ 2018-06-11 10:48   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:16)
> Pull the empty stubs together into the top level gen6_ppgtt_create, and
> tear each one down on error in proper onion order (rather than use
> Joonas' pet hate of calling the cleanup function in indeterminable
> state).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable
  2018-06-10 19:43 ` [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
@ 2018-06-11 10:56   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 10:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:17)
> Currently all page directories are bound at creation using an
> unevictable node in the GGTT. This severely limits us as we cannot
> remove any inactive ppgtt for new contexts, or under aperture pressure.
> To fix this we need to make the page directory into a first class and
> unbindable vma. Hence, the creation of a custom vma to wrap the page
> directory as opposed to a GEM object.
> 
> In this patch, we leave the page directories pinned upon creation.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>

The misuse of I915_GGTT_VIEW_ROTATED from previous review is a one that I would
like to see addressed.

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7
  2018-06-10 19:43 ` [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
@ 2018-06-11 11:03   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 11:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:19)
> As we were only supporting aliasing_ppgtt on gen7 for some time, we
> saved a few checks by preallocating the page directories on creation.
> However, since we need 2MiB of page directories for each ppgtt, to
> support arbitrary numbers of user contexts, we need to be more prudent
> in our allocations, and defer the page allocation until it is used. We
> don't recover unused pages yet as we found that doing so on the fly
> (i.e. altering TLB entries) would confuse the GPU.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context
  2018-06-10 19:43 ` [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
@ 2018-06-11 11:09   ` Joonas Lahtinen
  0 siblings, 0 replies; 44+ messages in thread
From: Joonas Lahtinen @ 2018-06-11 11:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Quoting Chris Wilson (2018-06-10 22:43:20)
> As we cannot reliably change used page tables while the context is
> active, the earliest opportunity we have to recover excess pages is when
> the context becomes idle. So whenever we unbind the context (it must be
> idle, and indeed being evicted) free the unused ptes.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2018-06-11 11:09 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-10 19:43 Haswell full-ppgtt Chris Wilson
2018-06-10 19:43 ` [PATCH 01/17] drm/i915: Apply batch location restrictions before pinning Chris Wilson
2018-06-11  9:57   ` Joonas Lahtinen
2018-06-11 10:08     ` Chris Wilson
2018-06-10 19:43 ` [PATCH 02/17] drm/i915/ringbuffer: Brute force context restore Chris Wilson
2018-06-11 10:00   ` Tvrtko Ursulin
2018-06-11 10:04     ` Chris Wilson
2018-06-11 10:23       ` Tvrtko Ursulin
2018-06-11 10:33         ` Chris Wilson
2018-06-11 10:07   ` Mika Kuoppala
2018-06-11 10:11   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 03/17] drm/i915/ringbuffer: Fix context restore upon reset Chris Wilson
2018-06-11 10:28   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 04/17] drm/i915: Wrap around the tail offset before setting ring->tail Chris Wilson
2018-06-11 10:16   ` Mika Kuoppala
2018-06-11 10:26     ` Chris Wilson
2018-06-11 10:40       ` Mika Kuoppala
2018-06-11 10:30   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 05/17] drm/i915/gtt: Invalidate GGTT caches after writing the gen6 page directories Chris Wilson
2018-06-11 10:37   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 06/17] drm/i915/ringbuffer: Serialize load of PD_DIR Chris Wilson
2018-06-11 10:12   ` Chris Wilson
2018-06-11 10:43   ` Joonas Lahtinen
2018-06-11 10:47     ` Chris Wilson
2018-06-10 19:43 ` [PATCH 07/17] drm/i915/gtt: Subclass gen6_hw_ppgtt Chris Wilson
2018-06-10 19:43 ` [PATCH 08/17] drm/i915/gtt: Onionify error handling for gen6_ppgtt_create Chris Wilson
2018-06-11 10:48   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 09/17] drm/i915/gtt: Make gen6 page directories evictable Chris Wilson
2018-06-11 10:56   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 10/17] drm/i915/gtt: Only keep gen6 page directories pinned while active Chris Wilson
2018-06-10 19:43 ` [PATCH 11/17] drm/i915/gtt: Lazily allocate page directories for gen7 Chris Wilson
2018-06-11 11:03   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 12/17] drm/i915/gtt: Free unused page tables on unbind the context Chris Wilson
2018-06-11 11:09   ` Joonas Lahtinen
2018-06-10 19:43 ` [PATCH 13/17] drm/i915/gtt: Skip initializing PT with scratch if full Chris Wilson
2018-06-10 19:43 ` [PATCH 14/17] drm/i915/gtt: Cache the PTE encoding of the scratch page Chris Wilson
2018-06-10 19:43 ` [PATCH 15/17] drm/i915/gtt: Reduce a pair of runtime asserts Chris Wilson
2018-06-10 19:43 ` [PATCH 16/17] drm/i915/gtt: Skip clearing the GGTT under gen6+ full-ppgtt Chris Wilson
2018-06-11  9:28   ` Matthew Auld
2018-06-10 19:43 ` [PATCH 17/17] drm/i915/gtt: Enable full-ppgtt by default everywhere Chris Wilson
2018-06-10 20:00 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/17] drm/i915: Apply batch location restrictions before pinning Patchwork
2018-06-10 20:05 ` ✗ Fi.CI.SPARSE: " Patchwork
2018-06-10 20:18 ` ✓ Fi.CI.BAT: success " Patchwork
2018-06-10 21:10 ` ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.