All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 0/6] Add Per-context WA using WA batch buffers
@ 2015-06-18 13:07 Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

From Gen8+ we have some workarounds that are applied Per context and
they are applied using special batch buffers called as WA batch buffers.
HW executes them at specific stages during context save/restore.
The patches in this series adds this framework to i915.

I did some basic testing on BDW by running glmark2 and didn't see any issues.
These WA are mainly required when preemption is enabled.

All of the previous comments are addressed in latest revision v5

[v1] http://lists.freedesktop.org/archives/intel-gfx/2015-February/060707.html
[v2] http://www.spinics.net/lists/intel-gfx/msg67804.html

[v3] In v2, two separate ring_buffer objects were used to load WA instructions
and they were part of every context which is not really required.
Chris suggested a better approach of adding a page to context itself and using
it for this purpose. Since GuC is also planning to do the same it can probably
be shared with GuC. But after discussions it is agreed to use an independent
page as GuC area might grow in future. Independent page also makes sense because
these WA are only initialized once and not changed afterwards so we can share
them across all contexts.

[v4] Changes in this revision,
In the previous version the size of batch buffers are fixed during
initialization which is not a good idea. This is corrected by updating the
functions that load WA to return the number of dwords written and caller
updates the size once all WA are initialized. The functions now also accept
offset field which allows us to have multiple batches so that required batch
can be selected based on a criteria. This is not a requirement at this point
but could be useful in future.

WaFlushCoherentL3CacheLinesAtContextSwitch implementation was incomplete which
is fixed and programming restrictions correctly applied.

http://www.spinics.net/lists/intel-gfx/msg68947.html

[v5] No major changes in this revision but switched to new revision as changes
affected all patches. Introduced macro to add commands which also checks for
page overflow. Moved code around to simplify, indentation fixes and other
improvements suggested by Chris.

Since we don't know the number of WA applied upfront, Chris suggested a two-pass
approach but that brings additional complexity which is not necessary.
Discussed with Chris and agreed upon on single page setup as simpler code wins
and also single page is sufficient for our requirement.

Please see the patches for more details.

Arun Siluvery (6):
  drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  drm/i915/gen8: Re-order init pipe_control in lrc mode
  drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch
    workaround
  drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

 drivers/gpu/drm/i915/i915_reg.h         |  32 +++-
 drivers/gpu/drm/i915/intel_lrc.c        | 298 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  18 ++
 3 files changed, 341 insertions(+), 7 deletions(-)

-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 16:06   ` Chris Wilson
  2015-06-18 13:07 ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins. Moved around functions to simplify it further, add comments.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

(Thanks to Chris, Dave and Thomas for their reviews and inputs)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 204 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  18 +++
 2 files changed, 218 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..ad0b189 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
@@ -1077,6 +1078,173 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+#define wa_ctx_emit(batch, cmd) {	\
+		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
+			return -ENOSPC;					\
+		}							\
+		batch[index++] = (cmd);					\
+	}
+
+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx_batch: page in which WA are loaded
+ * @offset: This is for future use in case if we would like to have multiple
+ *          batches at different offsets and select them based on a criteria.
+ * @num_dwords: The number of WA applied are known at the beginning, it returns
+ * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
+ * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
+ * added to perctx batch and both of them together makes a complete batch buffer.
+ *
+ * Return: non-zero if we exceed the PAGE_SIZE limit.
+ */
+
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+				    uint32_t **wa_ctx_batch,
+				    uint32_t offset,
+				    uint32_t *num_dwords)
+{
+	uint32_t index;
+	uint32_t *batch = *wa_ctx_batch;
+
+	index = offset;
+
+	/* FIXME: fill one cacheline with NOOPs.
+	 * Replace these instructions with WA
+	 */
+	while (index < (offset + 16))
+		wa_ctx_emit(batch, MI_NOOP);
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	*num_dwords = index - offset;
+
+	return 0;
+}
+
+/*
+ * gen8_init_perctx_bb() - initialize per ctx batch with WA
+ *
+ * This function doesn't add any padding at the end as it contains
+ * MI_BATCH_BUFFER_END and padding after it is redundant.
+ */
+static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
+			       uint32_t **wa_ctx_batch,
+			       uint32_t offset,
+			       uint32_t *num_dwords)
+{
+	uint32_t index;
+	uint32_t *batch = *wa_ctx_batch;
+
+	index = offset;
+
+	/* FIXME: fill one cacheline with NOOPs.
+	 * Replace these instructions with WA
+	 */
+	while (index < (offset + 16))
+		wa_ctx_emit(batch, MI_NOOP);
+
+	batch[index - 1] = MI_BATCH_BUFFER_END;
+
+	*num_dwords = index - offset;
+
+	return 0;
+}
+
+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+	int ret;
+
+	WARN_ON(ring->id != RCS);
+
+	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
+	if (!ring->wa_ctx.obj) {
+		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+		return -ENOMEM;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
+				 ret);
+		drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring)
+{
+	WARN_ON(ring->id != RCS);
+
+	i915_gem_object_ggtt_unpin(ring->wa_ctx.obj);
+	drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+	ring->wa_ctx.obj = NULL;
+}
+
+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+	int ret = 0;
+	uint32_t *batch;
+	uint32_t num_dwords;
+	struct page *page;
+	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
+
+	WARN_ON(ring->id != RCS);
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
+
+	ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret);
+		return ret;
+	}
+
+	page = i915_gem_object_get_page(wa_ctx->obj, 0);
+	batch = kmap_atomic(page);
+
+	if (INTEL_INFO(ring->dev)->gen == 8) {
+		wa_ctx->indctx_batch_offset = 0;
+
+		ret = gen8_init_indirectctx_bb(ring,
+					       &batch,
+					       wa_ctx->indctx_batch_offset,
+					       &num_dwords);
+		if (ret)
+			goto out;
+
+		wa_ctx->indctx_batch_size = round_up(num_dwords, CACHELINE_DWORDS);
+		wa_ctx->perctx_batch_offset = wa_ctx->indctx_batch_size;
+
+		ret = gen8_init_perctx_bb(ring,
+					  &batch,
+					  wa_ctx->perctx_batch_offset,
+					  &num_dwords);
+		if (ret)
+			goto out;
+	} else {
+		WARN(INTEL_INFO(ring->dev)->gen >= 8,
+		     "WA batch buffer is not initialized for Gen%d\n",
+		     INTEL_INFO(ring->dev)->gen);
+	}
+
+out:
+	kunmap_atomic(batch);
+	if (ret)
+		lrc_destroy_wa_ctx_obj(ring);
+
+	return ret;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1411,6 +1579,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
 	}
+
+	if (ring->wa_ctx.obj)
+		lrc_destroy_wa_ctx_obj(ring);
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
@@ -1474,7 +1645,21 @@ static int logical_render_ring_init(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	return intel_init_pipe_control(ring);
+	if (INTEL_INFO(ring->dev)->gen >= 8) {
+		ret = intel_init_workaround_bb(ring);
+		if (ret) {
+			DRM_ERROR("WA batch buffers are not initialized: %d\n",
+				  ret);
+		}
+	}
+
+	ret = intel_init_pipe_control(ring);
+	if (ret) {
+		if (ring->wa_ctx.obj)
+			lrc_destroy_wa_ctx_obj(ring);
+	}
+
+	return ret;
 }
 
 static int logical_bsd_ring_init(struct drm_device *dev)
@@ -1754,15 +1939,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
 	reg_state[CTX_SECOND_BB_STATE+1] = 0;
 	if (ring->id == RCS) {
-		/* TODO: according to BSpec, the register state context
-		 * for CHV does not have these. OTOH, these registers do
-		 * exist in CHV. I'm waiting for a clarification */
 		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
 		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
 		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+		if (ring->wa_ctx.obj) {
+			reg_state[CTX_RCS_INDIRECT_CTX+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
+				(ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);
+
+			reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] =
+				CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT << 6;
+
+			reg_state[CTX_BB_PER_CTX_PTR+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.perctx_batch_offset * sizeof(uint32_t)) |
+				0x01;
+		}
 	}
 	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..1f38af3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -12,6 +12,7 @@
  * workarounds!
  */
 #define CACHELINE_BYTES 64
+#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
 
 /*
  * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
@@ -119,6 +120,22 @@ struct intel_ringbuffer {
 
 struct	intel_context;
 
+/*
+ * we use a single page to load ctx workarounds so all of these
+ * values are referred in terms of dwords
+ *
+ * offset field - helpful in case if we want to have multiple batches
+ * at different offsets based on some conditions. It is not a requirement
+ * at the moment but provides an option for future use.
+ * indctx_batch_size - HW expects this value in terms of cachelines
+ */
+struct  i915_ctx_workarounds {
+	u32 indctx_batch_offset;
+	u32 indctx_batch_size;
+	u32 perctx_batch_offset;
+	struct drm_i915_gem_object *obj;
+};
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -142,6 +159,7 @@ struct  intel_engine_cs {
 	struct i915_gem_batch_pool batch_pool;
 
 	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ad0b189..1d31eb5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1641,7 +1641,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	ring->dev = dev;
-	ret = logical_ring_init(dev, ring);
+
+	ret = intel_init_pipe_control(ring);
 	if (ret)
 		return ret;
 
@@ -1653,7 +1654,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 		}
 	}
 
-	ret = intel_init_pipe_control(ring);
+	ret = logical_ring_init(dev, ring);
 	if (ret) {
 		if (ring->wa_ctx.obj)
 			lrc_destroy_wa_ctx_obj(ring);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 1d31eb5..8d5932a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1110,10 +1110,11 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
-	/* FIXME: fill one cacheline with NOOPs.
-	 * Replace these instructions with WA
-	 */
-	while (index < (offset + 16))
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
+
+	/* padding */
+        while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0)
 		wa_ctx_emit(batch, MI_NOOP);
 
 	/*
@@ -1143,13 +1144,10 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
-	/* FIXME: fill one cacheline with NOOPs.
-	 * Replace these instructions with WA
-	 */
-	while (index < (offset + 16))
-		wa_ctx_emit(batch, MI_NOOP);
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 
-	batch[index - 1] = MI_BATCH_BUFFER_END;
+	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
 	*num_dwords = index - offset;
 
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (2 preceding siblings ...)
  2015-06-18 13:07 ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset bit that invalidates coherent lines,
update WA to include programming restrictions and exclude CHV as
it is not required (Ville)

v3: Avoid unnecessary read when it can be done by reading register once (Chris).

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..d14ad20 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
 #define   PIPE_CONTROL_NOTIFY				(1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
@@ -5788,6 +5789,7 @@ enum skl_disp_power_wells {
 
 #define GEN8_L3SQCREG4				0xb118
 #define  GEN8_LQSC_RO_PERF_DIS			(1<<27)
+#define  GEN8_LQSC_FLUSH_COHERENT_LINES		(1<<21)
 
 /* GEN8 chicken */
 #define HDC_CHICKEN0				0x7300
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8d5932a..dff8303 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1113,6 +1113,29 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
+	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
+	if (IS_BROADWELL(ring->dev)) {
+		struct drm_i915_private *dev_priv = to_i915(ring->dev);
+		uint32_t l3sqc4_flush = (I915_READ(GEN8_L3SQCREG4) |
+					 GEN8_LQSC_FLUSH_COHERENT_LINES);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush);
+
+		wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+		wa_ctx_emit(batch, (PIPE_CONTROL_CS_STALL |
+				    PIPE_CONTROL_DC_FLUSH_ENABLE));
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
+	}
+
 	/* padding */
         while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0)
 		wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (3 preceding siblings ...)
  2015-06-18 13:07 ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 13:07 ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
  6 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index d14ad20..7637e64 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A           (0<<20)
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3				(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index dff8303..792d559 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1106,6 +1106,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 				    uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;
 
 	index = offset;
@@ -1136,6 +1137,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
 	}
 
+	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+	wa_ctx_emit(batch, (PIPE_CONTROL_FLUSH_L3 |
+			    PIPE_CONTROL_GLOBAL_GTT_IVB |
+			    PIPE_CONTROL_CS_STALL |
+			    PIPE_CONTROL_QW_WRITE));
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+
 	/* padding */
         while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0)
 		wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (4 preceding siblings ...)
  2015-06-18 13:07 ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
@ 2015-06-18 13:07 ` Arun Siluvery
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
  6 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 13:07 UTC (permalink / raw)
  To: intel-gfx

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

v3: Length defined in current definitions of LRM, LRR instructions was specified
as 0. It seems it is common convention for instructions whose length vary between
platforms. This is not an issue so far because they are not used anywhere except
command parser; now that we use in this patch update them with correct length
and also move them out of command parser placeholder to appropriate place.
remove unnecessary padding and follow the WA programming sequence exactly
as mentioned in spec which is essential for this WA (Dave).

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 29 +++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.c | 54 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND	MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR	MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR	MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE	MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC	MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC	MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD	MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB	MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB	MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX	MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN	MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX	MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN	MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -451,8 +476,6 @@
 #define MI_CLFLUSH              MI_INSTR(0x27, 0)
 #define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 792d559..19a3460 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1179,13 +1179,67 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 			       uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;
 
 	index = offset;
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+	wa_ctx_emit(batch, INSTPM);
+	wa_ctx_emit(batch, _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING));
+
+	wa_ctx_emit(batch, (MI_ATOMIC(5) |
+			    MI_ATOMIC_MEMORY_TYPE_GGTT |
+			    MI_ATOMIC_INLINE_DATA |
+			    MI_ATOMIC_CS_STALL |
+			    MI_ATOMIC_RETURN_DATA_CTL |
+			    MI_ATOMIC_MOVE));
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+	wa_ctx_emit(batch, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline. To satisfy this case even if more WA are
+	 * added in future, pad current cacheline and start remaining sequence
+	 * in new cacheline.
+	 */
+	while (((unsigned long) (batch + index) % CACHELINE_BYTES) != 0)
+		wa_ctx_emit(batch, MI_NOOP);
+
+	wa_ctx_emit(batch, (MI_LOAD_REGISTER_MEM_GEN8 |
+			    MI_LRM_USE_GLOBAL_GTT |
+			    MI_LRM_ASYNC_MODE_ENABLE));
+	wa_ctx_emit(batch, INSTPM);
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	wa_ctx_emit(batch, MI_LOAD_REGISTER_REG);
+	wa_ctx_emit(batch, GEN8_RS_PREEMPT_STATUS);
+	wa_ctx_emit(batch, GEN8_RS_PREEMPT_STATUS);
+
 	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
 	*num_dwords = index - offset;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-18 13:07 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
@ 2015-06-18 16:06   ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-06-18 16:06 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

I'm pretty happy with the code, I was just confused by the series
changing the setup halfway through

On Thu, Jun 18, 2015 at 02:07:30PM +0100, Arun Siluvery wrote:
> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
> +				    uint32_t **wa_ctx_batch,
> +				    uint32_t offset,
> +				    uint32_t *num_dwords)
> +{
> +	uint32_t index;
> +	uint32_t *batch = *wa_ctx_batch;
> +
> +	index = offset;
> +
> +	/* FIXME: fill one cacheline with NOOPs.
> +	 * Replace these instructions with WA
> +	 */
> +	while (index < (offset + 16))
> +		wa_ctx_emit(batch, MI_NOOP);

If this was

/* Replace me with WA */
wa_ctx_emit(batch, MI_NOOP)

/* Pad to end of cacheline */
while (index % 16)
	wa_ctx_emit(batch, MI_NOOP);

You then don't need to alter the code when yo add the real w/a. Note
that using (unsigned long)batch as you do later for cacheline
calculation is wrong, as that is a local physical CPU address (not the
virtual address used by the cache in the GPU) and was page aligned
anyway.

Similary,

> +static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
> +			       uint32_t **wa_ctx_batch,
> +			       uint32_t offset,
> +			       uint32_t *num_dwords)
> +{
> +	uint32_t index;
> +	uint32_t *batch = *wa_ctx_batch;
> +
> +	index = offset;
> +

If this just did
		wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
rather than insert a cacheline of noops, again you wouldn't need to
touch this infrastructure as you added the w/a.

As it stands, I was a little worried halfway through when the cache
alignment suddenly disappeared - but this patch implied to me that it
was necessary.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (5 preceding siblings ...)
  2015-06-18 13:07 ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
@ 2015-06-18 17:33 ` Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
                     ` (5 more replies)
  6 siblings, 6 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins. Moved around functions to simplify it further, add comments
and fix alignment check.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 199 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  18 +++
 2 files changed, 213 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..8cc851dd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
@@ -1077,6 +1078,168 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+#define wa_ctx_emit(batch, cmd) {	\
+		if (WARN_ON(index >= (PAGE_SIZE / sizeof(uint32_t)))) {	\
+			return -ENOSPC;					\
+		}							\
+		batch[index++] = (cmd);					\
+	}
+
+/**
+ * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
+ *
+ * @ring: only applicable for RCS
+ * @wa_ctx_batch: page in which WA are loaded
+ * @offset: This is for future use in case if we would like to have multiple
+ *          batches at different offsets and select them based on a criteria.
+ * @num_dwords: The number of WA applied are known at the beginning, it returns
+ * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
+ * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
+ * added to perctx batch and both of them together makes a complete batch buffer.
+ *
+ * Return: non-zero if we exceed the PAGE_SIZE limit.
+ */
+
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
+				    uint32_t **wa_ctx_batch,
+				    uint32_t offset,
+				    uint32_t *num_dwords)
+{
+	uint32_t index;
+	uint32_t *batch = *wa_ctx_batch;
+
+	index = offset;
+
+	/* FIXME: Replace me with WA */
+	wa_ctx_emit(batch, MI_NOOP);
+
+	/* Pad to end of cacheline */
+	while (index % CACHELINE_DWORDS)
+		wa_ctx_emit(batch, MI_NOOP);
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	*num_dwords = index - offset;
+
+	return 0;
+}
+
+/*
+ * gen8_init_perctx_bb() - initialize per ctx batch with WA
+ *
+ * This function doesn't add any padding to cacheline as it contains
+ * MI_BATCH_BUFFER_END and padding after it is redundant.
+ */
+static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
+			       uint32_t **wa_ctx_batch,
+			       uint32_t offset,
+			       uint32_t *num_dwords)
+{
+	uint32_t index;
+	uint32_t *batch = *wa_ctx_batch;
+
+	index = offset;
+
+	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
+
+	*num_dwords = index - offset;
+
+	return 0;
+}
+
+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+	int ret;
+
+	WARN_ON(ring->id != RCS);
+
+	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
+	if (!ring->wa_ctx.obj) {
+		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+		return -ENOMEM;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
+				 ret);
+		drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring)
+{
+	WARN_ON(ring->id != RCS);
+
+	i915_gem_object_ggtt_unpin(ring->wa_ctx.obj);
+	drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+	ring->wa_ctx.obj = NULL;
+}
+
+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+	int ret = 0;
+	uint32_t *batch;
+	uint32_t num_dwords;
+	struct page *page;
+	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
+
+	WARN_ON(ring->id != RCS);
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
+
+	ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n", ret);
+		return ret;
+	}
+
+	page = i915_gem_object_get_page(wa_ctx->obj, 0);
+	batch = kmap_atomic(page);
+
+	if (INTEL_INFO(ring->dev)->gen == 8) {
+		wa_ctx->indctx_batch_offset = 0;
+
+		ret = gen8_init_indirectctx_bb(ring,
+					       &batch,
+					       wa_ctx->indctx_batch_offset,
+					       &num_dwords);
+		if (ret)
+			goto out;
+
+		wa_ctx->indctx_batch_size = round_up(num_dwords, CACHELINE_DWORDS);
+		wa_ctx->perctx_batch_offset = wa_ctx->indctx_batch_size;
+
+		ret = gen8_init_perctx_bb(ring,
+					  &batch,
+					  wa_ctx->perctx_batch_offset,
+					  &num_dwords);
+		if (ret)
+			goto out;
+	} else {
+		WARN(INTEL_INFO(ring->dev)->gen >= 8,
+		     "WA batch buffer is not initialized for Gen%d\n",
+		     INTEL_INFO(ring->dev)->gen);
+	}
+
+out:
+	kunmap_atomic(batch);
+	if (ret)
+		lrc_destroy_wa_ctx_obj(ring);
+
+	return ret;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1411,6 +1574,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		kunmap(sg_page(ring->status_page.obj->pages->sgl));
 		ring->status_page.obj = NULL;
 	}
+
+	if (ring->wa_ctx.obj)
+		lrc_destroy_wa_ctx_obj(ring);
 }
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
@@ -1474,7 +1640,21 @@ static int logical_render_ring_init(struct drm_device *dev)
 	if (ret)
 		return ret;
 
-	return intel_init_pipe_control(ring);
+	if (INTEL_INFO(ring->dev)->gen >= 8) {
+		ret = intel_init_workaround_bb(ring);
+		if (ret) {
+			DRM_ERROR("WA batch buffers are not initialized: %d\n",
+				  ret);
+		}
+	}
+
+	ret = intel_init_pipe_control(ring);
+	if (ret) {
+		if (ring->wa_ctx.obj)
+			lrc_destroy_wa_ctx_obj(ring);
+	}
+
+	return ret;
 }
 
 static int logical_bsd_ring_init(struct drm_device *dev)
@@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
 	reg_state[CTX_SECOND_BB_STATE+1] = 0;
 	if (ring->id == RCS) {
-		/* TODO: according to BSpec, the register state context
-		 * for CHV does not have these. OTOH, these registers do
-		 * exist in CHV. I'm waiting for a clarification */
 		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
 		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
 		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+		if (ring->wa_ctx.obj) {
+			reg_state[CTX_RCS_INDIRECT_CTX+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
+				(ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);
+
+			reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] =
+				CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT << 6;
+
+			reg_state[CTX_BB_PER_CTX_PTR+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.perctx_batch_offset * sizeof(uint32_t)) |
+				0x01;
+		}
 	}
 	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..1f38af3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -12,6 +12,7 @@
  * workarounds!
  */
 #define CACHELINE_BYTES 64
+#define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(uint32_t))
 
 /*
  * Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
@@ -119,6 +120,22 @@ struct intel_ringbuffer {
 
 struct	intel_context;
 
+/*
+ * we use a single page to load ctx workarounds so all of these
+ * values are referred in terms of dwords
+ *
+ * offset field - helpful in case if we want to have multiple batches
+ * at different offsets based on some conditions. It is not a requirement
+ * at the moment but provides an option for future use.
+ * indctx_batch_size - HW expects this value in terms of cachelines
+ */
+struct  i915_ctx_workarounds {
+	u32 indctx_batch_offset;
+	u32 indctx_batch_size;
+	u32 perctx_batch_offset;
+	struct drm_i915_gem_object *obj;
+};
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -142,6 +159,7 @@ struct  intel_engine_cs {
 	struct i915_gem_batch_pool batch_pool;
 
 	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
@ 2015-06-18 17:33   ` Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8cc851dd..62486cd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1636,7 +1636,8 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	ring->dev = dev;
-	ret = logical_ring_init(dev, ring);
+
+	ret = intel_init_pipe_control(ring);
 	if (ret)
 		return ret;
 
@@ -1648,7 +1649,7 @@ static int logical_render_ring_init(struct drm_device *dev)
 		}
 	}
 
-	ret = intel_init_pipe_control(ring);
+	ret = logical_ring_init(dev, ring);
 	if (ret) {
 		if (ring->wa_ctx.obj)
 			lrc_destroy_wa_ctx_obj(ring);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
@ 2015-06-18 17:33   ` Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 62486cd..c4b3493 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1110,8 +1110,8 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
-	/* FIXME: Replace me with WA */
-	wa_ctx_emit(batch, MI_NOOP);
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
@@ -1144,6 +1144,9 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 
 	index = offset;
 
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
+
 	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
 	*num_dwords = index - offset;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
@ 2015-06-18 17:33   ` Arun Siluvery
  2015-06-18 17:33   ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch:bdw

v2: Add LRI commands to set/reset bit that invalidates coherent lines,
update WA to include programming restrictions and exclude CHV as
it is not required (Ville)

v3: Avoid unnecessary read when it can be done by reading register once (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..d14ad20 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
 #define   PIPE_CONTROL_NOTIFY				(1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
@@ -5788,6 +5789,7 @@ enum skl_disp_power_wells {
 
 #define GEN8_L3SQCREG4				0xb118
 #define  GEN8_LQSC_RO_PERF_DIS			(1<<27)
+#define  GEN8_LQSC_FLUSH_COHERENT_LINES		(1<<21)
 
 /* GEN8 chicken */
 #define HDC_CHICKEN0				0x7300
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c4b3493..3291ef4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1113,6 +1113,29 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_DISABLE);
 
+	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw */
+	if (IS_BROADWELL(ring->dev)) {
+		struct drm_i915_private *dev_priv = to_i915(ring->dev);
+		uint32_t l3sqc4_flush = (I915_READ(GEN8_L3SQCREG4) |
+					 GEN8_LQSC_FLUSH_COHERENT_LINES);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush);
+
+		wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+		wa_ctx_emit(batch, (PIPE_CONTROL_CS_STALL |
+				    PIPE_CONTROL_DC_FLUSH_ENABLE));
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+		wa_ctx_emit(batch, 0);
+
+		wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+		wa_ctx_emit(batch, GEN8_L3SQCREG4);
+		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
+	}
+
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
 		wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
                     ` (2 preceding siblings ...)
  2015-06-18 17:33   ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
@ 2015-06-18 17:33   ` Arun Siluvery
  2015-06-19  9:29     ` Chris Wilson
  2015-06-18 17:33   ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
  2015-06-19  9:27   ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Chris Wilson
  5 siblings, 1 reply; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 16 ++++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index d14ad20..7637e64 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A           (0<<20)
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3				(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3291ef4..b631390 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1106,6 +1106,7 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 				    uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;
 
 	index = offset;
@@ -1136,6 +1137,21 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
 		wa_ctx_emit(batch, l3sqc4_flush & ~GEN8_LQSC_FLUSH_COHERENT_LINES);
 	}
 
+	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	wa_ctx_emit(batch, GFX_OP_PIPE_CONTROL(6));
+	wa_ctx_emit(batch, (PIPE_CONTROL_FLUSH_L3 |
+			    PIPE_CONTROL_GLOBAL_GTT_IVB |
+			    PIPE_CONTROL_CS_STALL |
+			    PIPE_CONTROL_QW_WRITE));
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, 0);
+
 	/* Pad to end of cacheline */
 	while (index % CACHELINE_DWORDS)
 		wa_ctx_emit(batch, MI_NOOP);
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
                     ` (3 preceding siblings ...)
  2015-06-18 17:33   ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
@ 2015-06-18 17:33   ` Arun Siluvery
  2015-06-19  9:35     ` Chris Wilson
  2015-06-19  9:27   ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Chris Wilson
  5 siblings, 1 reply; 19+ messages in thread
From: Arun Siluvery @ 2015-06-18 17:33 UTC (permalink / raw)
  To: intel-gfx

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

v3: Length defined in current definitions of LRM, LRR instructions was specified
as 0. It seems it is common convention for instructions whose length vary between
platforms. This is not an issue so far because they are not used anywhere except
command parser; now that we use in this patch update them with correct length
and also move them out of command parser placeholder to appropriate place.
remove unnecessary padding and follow the WA programming sequence exactly
as mentioned in spec which is essential for this WA (Dave).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 29 +++++++++++++++++++--
 drivers/gpu/drm/i915/intel_lrc.c | 54 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7637e64..208620d 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,31 @@
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 1)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 1)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND	MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR	MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR	MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE	MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC	MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC	MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD	MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB	MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB	MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX	MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN	MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX	MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN	MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -451,8 +476,6 @@
 #define MI_CLFLUSH              MI_INSTR(0x27, 0)
 #define MI_REPORT_PERF_COUNT    MI_INSTR(0x28, 0)
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
-#define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
-#define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
@@ -1799,6 +1822,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b631390..281aec6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1179,13 +1179,67 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring,
 			       uint32_t *num_dwords)
 {
 	uint32_t index;
+	uint32_t scratch_addr;
 	uint32_t *batch = *wa_ctx_batch;
 
 	index = offset;
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	wa_ctx_emit(batch, MI_ARB_ON_OFF | MI_ARB_ENABLE);
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	wa_ctx_emit(batch, MI_LOAD_REGISTER_IMM(1));
+	wa_ctx_emit(batch, INSTPM);
+	wa_ctx_emit(batch, _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING));
+
+	wa_ctx_emit(batch, (MI_ATOMIC(5) |
+			    MI_ATOMIC_MEMORY_TYPE_GGTT |
+			    MI_ATOMIC_INLINE_DATA |
+			    MI_ATOMIC_CS_STALL |
+			    MI_ATOMIC_RETURN_DATA_CTL |
+			    MI_ATOMIC_MOVE));
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+	wa_ctx_emit(batch, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+	wa_ctx_emit(batch, _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING));
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline. To satisfy this case even if more WA are
+	 * added in future, pad current cacheline and start remaining sequence
+	 * in new cacheline.
+	 */
+	while (index % CACHELINE_DWORDS)
+		wa_ctx_emit(batch, MI_NOOP);
+
+	wa_ctx_emit(batch, (MI_LOAD_REGISTER_MEM_GEN8 |
+			    MI_LRM_USE_GLOBAL_GTT |
+			    MI_LRM_ASYNC_MODE_ENABLE));
+	wa_ctx_emit(batch, INSTPM);
+	wa_ctx_emit(batch, scratch_addr);
+	wa_ctx_emit(batch, 0);
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	wa_ctx_emit(batch, MI_LOAD_REGISTER_REG);
+	wa_ctx_emit(batch, GEN8_RS_PREEMPT_STATUS);
+	wa_ctx_emit(batch, GEN8_RS_PREEMPT_STATUS);
+
 	wa_ctx_emit(batch, MI_BATCH_BUFFER_END);
 
 	*num_dwords = index - offset;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
                     ` (4 preceding siblings ...)
  2015-06-18 17:33   ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
@ 2015-06-19  9:27   ` Chris Wilson
  2015-06-19  9:48     ` Siluvery, Arun
  5 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2015-06-19  9:27 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
Totally minor worries now.

> +/**
> + * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
> + *
> + * @ring: only applicable for RCS
> + * @wa_ctx_batch: page in which WA are loaded
> + * @offset: This is for future use in case if we would like to have multiple
> + *          batches at different offsets and select them based on a criteria.
> + * @num_dwords: The number of WA applied are known at the beginning, it returns
> + * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
> + * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
> + * added to perctx batch and both of them together makes a complete batch buffer.
> + *
> + * Return: non-zero if we exceed the PAGE_SIZE limit.
> + */
> +
> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
> +				    uint32_t **wa_ctx_batch,
> +				    uint32_t offset,
> +				    uint32_t *num_dwords)
> +{
> +	uint32_t index;
> +	uint32_t *batch = *wa_ctx_batch;
> +
> +	index = offset;

I worry that offset need not be cacheline aligned on entry (for example
if indirectctx and perctx were switched, or someone else stuffed more
controls into the per-ring object). Like perctx, there is no mention of
any alignment worries for the starting location, but here you tell use
that the INDIRECT_CTX length is specified in cacheline, so I also presume
the start needs to be aligned.

> +static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
> +{
> +	int ret;
> +
> +	WARN_ON(ring->id != RCS);
> +
> +	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
> +	if (!ring->wa_ctx.obj) {
> +		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
> +		return -ENOMEM;
> +	}
> +
> +	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);

One day _pin() will return the vma being pinned and I will rejoice as it
makes reviewing pinning much easier! Not a problem for you right now.

> +static int intel_init_workaround_bb(struct intel_engine_cs *ring)
> +{
> +	int ret = 0;
> +	uint32_t *batch;
> +	uint32_t num_dwords;
> +	struct page *page;
> +	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
> +
> +	WARN_ON(ring->id != RCS);
> +
> +	if (ring->scratch.obj == NULL) {
> +		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
> +		return -EINVAL;
> +	}

I haven't found the dependence upon scratch.obj, could you explain it?
Does it appear later?

> @@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
>  	reg_state[CTX_SECOND_BB_STATE+1] = 0;
>  	if (ring->id == RCS) {
> -		/* TODO: according to BSpec, the register state context
> -		 * for CHV does not have these. OTOH, these registers do
> -		 * exist in CHV. I'm waiting for a clarification */
>  		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
>  		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
>  		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
>  		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
>  		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
>  		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
> +		if (ring->wa_ctx.obj) {
> +			reg_state[CTX_RCS_INDIRECT_CTX+1] =
> +				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
> +				 ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
> +				(ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);

Ok, this really does impose alignment conditions on indctx_batch_offset

Oh, if I do get a chance to complain, spell out indirect_ctx, make it a
struct or even just precalculate the reg value, just indctx's only value
is that is the same length as perctx, but otherwise quite obtuse.

Other than that, I couldn't punch any holes in its robustness, and the
series is starting to look quite good and very neat.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  2015-06-18 17:33   ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
@ 2015-06-19  9:29     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-06-19  9:29 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 06:33:28PM +0100, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> WaClearSlmSpaceAtContextSwitch
> 
> v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Ok, so this is the first use of scratch object. I would like to move the
test for existence from patch 1 to here, and into indirectctx_bb so that
we know why we need the obj. It's just trying to keep the patches and
code as self-contained and as descriptive as possible (including their
requirements).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-18 17:33   ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
@ 2015-06-19  9:35     ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-06-19  9:35 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Thu, Jun 18, 2015 at 06:33:29PM +0100, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
> 
> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
> 
> v3: Length defined in current definitions of LRM, LRR instructions was specified
> as 0. It seems it is common convention for instructions whose length vary between
> platforms. This is not an issue so far because they are not used anywhere except
> command parser; now that we use in this patch update them with correct length
> and also move them out of command parser placeholder to appropriate place.
> remove unnecessary padding and follow the WA programming sequence exactly
> as mentioned in spec which is essential for this WA (Dave).

Similarly with the indirect ctx from patch 5, having the documentation
that we need scratch_obj in the preamble for these workarounds I think
is worth its weight in the extra typing.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-19  9:27   ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Chris Wilson
@ 2015-06-19  9:48     ` Siluvery, Arun
  2015-06-19  9:55       ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Siluvery, Arun @ 2015-06-19  9:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Dave Gordon, Rafael Barbalho

On 19/06/2015 10:27, Chris Wilson wrote:
> On Thu, Jun 18, 2015 at 06:33:24PM +0100, Arun Siluvery wrote:
> Totally minor worries now.
>
>> +/**
>> + * gen8_init_indirectctx_bb() - initialize indirect ctx batch with WA
>> + *
>> + * @ring: only applicable for RCS
>> + * @wa_ctx_batch: page in which WA are loaded
>> + * @offset: This is for future use in case if we would like to have multiple
>> + *          batches at different offsets and select them based on a criteria.
>> + * @num_dwords: The number of WA applied are known at the beginning, it returns
>> + * the no of DWORDS written. This batch does not contain MI_BATCH_BUFFER_END
>> + * so it adds padding to make it cacheline aligned. MI_BATCH_BUFFER_END will be
>> + * added to perctx batch and both of them together makes a complete batch buffer.
>> + *
>> + * Return: non-zero if we exceed the PAGE_SIZE limit.
>> + */
>> +
>> +static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring,
>> +				    uint32_t **wa_ctx_batch,
>> +				    uint32_t offset,
>> +				    uint32_t *num_dwords)
>> +{
>> +	uint32_t index;
>> +	uint32_t *batch = *wa_ctx_batch;
>> +
>> +	index = offset;
>
> I worry that offset need not be cacheline aligned on entry (for example
> if indirectctx and perctx were switched, or someone else stuffed more
> controls into the per-ring object). Like perctx, there is no mention of
> any alignment worries for the starting location, but here you tell use
> that the INDIRECT_CTX length is specified in cacheline, so I also presume
> the start needs to be aligned.
>
offset need to be cachealigned, I will update the comments.

>> +static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
>> +{
>> +	int ret;
>> +
>> +	WARN_ON(ring->id != RCS);
>> +
>> +	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
>> +	if (!ring->wa_ctx.obj) {
>> +		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
>> +		return -ENOMEM;
>> +	}
>> +
>> +	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
>
> One day _pin() will return the vma being pinned and I will rejoice as it
> makes reviewing pinning much easier! Not a problem for you right now.
>
>> +static int intel_init_workaround_bb(struct intel_engine_cs *ring)
>> +{
>> +	int ret = 0;
>> +	uint32_t *batch;
>> +	uint32_t num_dwords;
>> +	struct page *page;
>> +	struct i915_ctx_workarounds *wa_ctx = &ring->wa_ctx;
>> +
>> +	WARN_ON(ring->id != RCS);
>> +
>> +	if (ring->scratch.obj == NULL) {
>> +		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
>> +		return -EINVAL;
>> +	}
>
> I haven't found the dependence upon scratch.obj, could you explain it?
> Does it appear later?
>
yes it does in patch 2 which rearranges init_pipe_control().
I will move this check to that patch as per your comment.

>> @@ -1754,15 +1934,26 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>>   	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
>>   	reg_state[CTX_SECOND_BB_STATE+1] = 0;
>>   	if (ring->id == RCS) {
>> -		/* TODO: according to BSpec, the register state context
>> -		 * for CHV does not have these. OTOH, these registers do
>> -		 * exist in CHV. I'm waiting for a clarification */
>>   		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
>>   		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
>>   		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
>>   		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
>>   		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
>>   		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
>> +		if (ring->wa_ctx.obj) {
>> +			reg_state[CTX_RCS_INDIRECT_CTX+1] =
>> +				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
>> +				 ring->wa_ctx.indctx_batch_offset * sizeof(uint32_t)) |
>> +				(ring->wa_ctx.indctx_batch_size / CACHELINE_DWORDS);
>
> Ok, this really does impose alignment conditions on indctx_batch_offset
>
> Oh, if I do get a chance to complain, spell out indirect_ctx, make it a
> struct or even just precalculate the reg value, just indctx's only value
> is that is the same length as perctx, but otherwise quite obtuse.
>
variable names were getting too long and caused difficulties in 
indentation so tried to shorten them, will change this part.

regards
Arun

> Other than that, I couldn't punch any holes in its robustness, and the
> series is starting to look quite good and very neat.
> -Chris
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-19  9:48     ` Siluvery, Arun
@ 2015-06-19  9:55       ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2015-06-19  9:55 UTC (permalink / raw)
  To: Siluvery, Arun; +Cc: intel-gfx

On Fri, Jun 19, 2015 at 10:48:24AM +0100, Siluvery, Arun wrote:
> variable names were getting too long and caused difficulties in
> indentation so tried to shorten them, will change this part.

It's a trade off. I was thinking we wouldn't need to use the full form
that often so the extra characters wouldn't be too much of an issue, but
having the names be consistent is most valuable. You can always play
around with temporary variables, using short structs, to make it
readable in places if it gets too ugly.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-06-19  9:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-18 13:07 [PATCH v5 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
2015-06-18 13:07 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
2015-06-18 16:06   ` Chris Wilson
2015-06-18 13:07 ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
2015-06-18 13:07 ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
2015-06-18 13:07 ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
2015-06-18 13:07 ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
2015-06-18 13:07 ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
2015-06-18 17:33 ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Arun Siluvery
2015-06-18 17:33   ` [PATCH v5 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
2015-06-18 17:33   ` [PATCH v5 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
2015-06-18 17:33   ` [PATCH v5 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
2015-06-18 17:33   ` [PATCH v5 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
2015-06-19  9:29     ` Chris Wilson
2015-06-18 17:33   ` [PATCH v5 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
2015-06-19  9:35     ` Chris Wilson
2015-06-19  9:27   ` [PATCH v5 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers Chris Wilson
2015-06-19  9:48     ` Siluvery, Arun
2015-06-19  9:55       ` Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.