All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] Add Per-context WA using WA batch buffers
@ 2015-06-05 10:34 Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

From Gen8+ we have some workarounds that are applied Per context and
they are applied using special batch buffers called as WA batch buffers.
HW executes them at specific stages during context save/restore.
The patches in this series adds this framework to i915.

I did some basic testing on BDW by running glmark2 and didn't see any issues.
These WA are mainly required when preemption is enabled.

These patches were sent to mailing list before and this series incorporates
previous review feedback.

[v1] http://lists.freedesktop.org/archives/intel-gfx/2015-February/060707.html
[v2] http://www.spinics.net/lists/intel-gfx/msg67804.html
[v3] In v2, two separate ring_buffer objects were used to load WA instructions
and they were part of every context which is not really required.
Chris suggested a better approach of adding a page to context itself and using
it for this purpose. Since GuC is also planning to do the same it can probably
be shared with GuC. But after discussions it is agreed to use an independent
page as GuC area might grow in future. Independent page also makes sense because
these WA are only initialized once and not changed afterwards so we can share
them across all contexts.

Please see the patches for more details.
I would really appreciate any review comments on the approach, variable naming
or any other comments to upstream these changes.

Arun Siluvery (6):
  drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  drm/i915/gen8: Re-order init pipe_control in lrc mode
  drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch
    workaround
  drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround

 drivers/gpu/drm/i915/i915_reg.h         |  28 ++++
 drivers/gpu/drm/i915/intel_lrc.c        | 268 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 ++
 3 files changed, 301 insertions(+), 4 deletions(-)

-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 10:56   ` Chris Wilson
                     ` (2 more replies)
  2015-06-05 10:34 ` [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.
This version uses this approach.
(Thanks to Chris, Dave and Thomas for their inputs)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 168 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 ++
 2 files changed, 174 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..0b3422a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,9 +211,11 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
+static void lrc_destroy_ctx_wa_obj(struct intel_engine_cs *ring);
 
 /**
  * intel_sanitize_enable_execlists() - sanitize i915.enable_execlists
@@ -1077,6 +1079,96 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
+	reg_state = kmap_atomic(page);
+
+	index = ring->ctx_wa.indctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->ctx_wa.indctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding alloted size\n");
+		kunmap_atomic(reg_state);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		reg_state[index++] = MI_NOOP;
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
+static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *reg_state;
+
+	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
+	reg_state = kmap_atomic(page);
+
+	index = ring->ctx_wa.perctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->ctx_wa.perctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding alloted size\n");
+		kunmap_atomic(reg_state);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		reg_state[index++] = MI_NOOP;
+
+	reg_state[index - 1] = MI_BATCH_BUFFER_END;
+	kunmap_atomic(reg_state);
+
+	return 0;
+}
+
+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+	int ret;
+	struct drm_device *dev = ring->dev;
+
+	if (IS_GEN8(dev)) {
+		ret = gen8_init_indirectctx_bb(ring);
+		if (ret)
+			return ret;
+
+		ret = gen8_init_perctx_bb(ring);
+		if (ret)
+			return ret;
+	} else {
+		WARN_ONCE(INTEL_INFO(ring->dev)->gen >= 9,
+			"WA batch buffer is not initialized\n");
+	}
+
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1754,15 +1846,25 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
 	reg_state[CTX_SECOND_BB_STATE+1] = 0;
 	if (ring->id == RCS) {
-		/* TODO: according to BSpec, the register state context
-		 * for CHV does not have these. OTOH, these registers do
-		 * exist in CHV. I'm waiting for a clarification */
 		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
 		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
 		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+		if (ring->ctx_wa.obj) {
+			reg_state[CTX_RCS_INDIRECT_CTX+1] =
+				(i915_gem_obj_ggtt_offset(ring->ctx_wa.obj) +
+				 ring->ctx_wa.indctx_batch_offset) |
+				ring->ctx_wa.indctx_batch_size;
+
+			reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] =
+				CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT << 6;
+
+			reg_state[CTX_BB_PER_CTX_PTR+1] =
+				(i915_gem_obj_ggtt_offset(ring->ctx_wa.obj) +
+				 ring->ctx_wa.perctx_batch_offset) | 0x01;
+		}
 	}
 	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
@@ -1822,6 +1924,8 @@ void intel_lr_context_free(struct intel_context *ctx)
 			if (ctx == ring->default_context) {
 				intel_unpin_ringbuffer_obj(ringbuf);
 				i915_gem_object_ggtt_unpin(ctx_obj);
+				if (ring->id == RCS)
+					lrc_destroy_ctx_wa_obj(ring);
 			}
 			WARN_ON(ctx->engine[ring->id].pin_count);
 			intel_destroy_ringbuffer_obj(ringbuf);
@@ -1872,6 +1976,46 @@ static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 	POSTING_READ(RING_HWS_PGA(ring->mmio_base));
 }
 
+static int lrc_setup_ctx_wa_obj(struct intel_engine_cs *ring, u32 size)
+{
+	int ret;
+	struct drm_device *dev = ring->dev;
+
+	WARN_ON(ring->id != RCS);
+
+	size = roundup(size, PAGE_SIZE);
+	ring->ctx_wa.obj = i915_gem_alloc_object(dev, size);
+	if (!ring->ctx_wa.obj) {
+		DRM_DEBUG_DRIVER("Alloc LRC Ctx WA backing obj failed.\n");
+		return -ENOMEM;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ring->ctx_wa.obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("Pin LRC Ctx WA backing obj failed: %d\n",
+				 ret);
+		drm_gem_object_unreference(&ring->ctx_wa.obj->base);
+		return ret;
+	}
+
+	ring->ctx_wa.indctx_batch_offset = 0;
+	ring->ctx_wa.indctx_batch_size = 4; /* in cache lines */
+	ring->ctx_wa.perctx_batch_offset =
+		ring->ctx_wa.indctx_batch_size * CACHELINE_BYTES;
+	ring->ctx_wa.perctx_batch_size = 2;
+
+	return 0;
+}
+
+static void lrc_destroy_ctx_wa_obj(struct intel_engine_cs *ring)
+{
+	WARN_ON(ring->id != RCS);
+
+	i915_gem_object_ggtt_unpin(ring->ctx_wa.obj);
+	drm_gem_object_unreference(&ring->ctx_wa.obj->base);
+	ring->ctx_wa.obj = NULL;
+}
+
 /**
  * intel_lr_context_deferred_create() - create the LRC specific bits of a context
  * @ctx: LR context to create.
@@ -1954,6 +2098,22 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	}
 
+	if (ring->id == RCS && is_global_default_ctx) {
+		ret = lrc_setup_ctx_wa_obj(ring, PAGE_SIZE);
+		if (ret) {
+			DRM_DEBUG_DRIVER(
+				"Failed to setup context WA page: %d\n", ret);
+			goto error;
+		}
+
+		ret = intel_init_workaround_bb(ring);
+		if (ret) {
+			lrc_destroy_ctx_wa_obj(ring);
+			DRM_ERROR("WA batch buffers are not initialized: %d\n",
+				  ret);
+		}
+	}
+
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
@@ -1982,6 +2142,8 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 	return 0;
 
 error:
+	if (ring->id == RCS && is_global_default_ctx)
+		lrc_destroy_ctx_wa_obj(ring);
 	if (is_global_default_ctx)
 		intel_unpin_ringbuffer_obj(ringbuf);
 error_destroy_rbuf:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..61c1402 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -119,6 +119,14 @@ struct intel_ringbuffer {
 
 struct	intel_context;
 
+struct  i915_ctx_workarounds {
+	u32 indctx_batch_offset;
+	u32 indctx_batch_size;
+	u32 perctx_batch_offset;
+	u32 perctx_batch_size;
+	struct drm_i915_gem_object *obj;
+};
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -142,6 +150,7 @@ struct  intel_engine_cs {
 	struct i915_gem_batch_pool batch_pool;
 
 	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds ctx_wa;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 13:55   ` Arun Siluvery
  2015-06-09 15:27   ` Dave Gordon
  2015-06-05 10:34 ` [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0b3422a..20c56e4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1562,11 +1562,16 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	ring->dev = dev;
+
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		return ret;
+
 	ret = logical_ring_init(dev, ring);
 	if (ret)
 		return ret;
 
-	return intel_init_pipe_control(ring);
+	return 0;
 }
 
 static int logical_bsd_ring_init(struct drm_device *dev)
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 13:56   ` Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 20c56e4..2fdb3da 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1099,9 +1099,10 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
-	/* FIXME: fill unused locations with NOOPs.
-	 * Replace these instructions with WA
-	 */
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	reg_state[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
+	/* padding */
         while (index < end)
 		reg_state[index++] = MI_NOOP;
 
@@ -1136,9 +1137,10 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
-	/* FIXME: fill unused locations with NOOPs.
-	 * Replace these instructions with WA
-	 */
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	reg_state[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
+	/* padding */
         while (index < end)
 		reg_state[index++] = MI_NOOP;
 
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (2 preceding siblings ...)
  2015-06-05 10:34 ` [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 13:56   ` Arun Siluvery
  2015-06-09 17:06   ` Dave Gordon
  2015-06-05 10:34 ` [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
  5 siblings, 2 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 1 +
 drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..5203c79 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
 #define   PIPE_CONTROL_NOTIFY				(1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2fdb3da..5b6c9db 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1102,6 +1102,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	reg_state[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
 
+	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
+	reg_state[index++] = GFX_OP_PIPE_CONTROL(6);
+	reg_state[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
+	reg_state[index++] = 0;
+	reg_state[index++] = 0;
+	reg_state[index++] = 0;
+	reg_state[index++] = 0;
+
 	/* padding */
         while (index < end)
 		reg_state[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (3 preceding siblings ...)
  2015-06-05 10:34 ` [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 13:57   ` Arun Siluvery
  2015-06-05 10:34 ` [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
  5 siblings, 1 reply; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5203c79..33b0ff1 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A           (0<<20)
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3				(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5b6c9db..bca137e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1085,6 +1085,13 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *reg_state;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
 	reg_state = kmap_atomic(page);
@@ -1110,6 +1117,23 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	reg_state[index++] = 0;
 	reg_state[index++] = 0;
 
+	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+	flags = PIPE_CONTROL_FLUSH_L3 |
+		PIPE_CONTROL_GLOBAL_GTT_IVB |
+		PIPE_CONTROL_CS_STALL |
+		PIPE_CONTROL_QW_WRITE;
+
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	reg_state[index++] = GFX_OP_PIPE_CONTROL(6);
+	reg_state[index++] = flags;
+	reg_state[index++] = scratch_addr;
+	reg_state[index++] = 0;
+	reg_state[index++] = 0;
+	reg_state[index++] = 0;
+
 	/* padding */
         while (index < end)
 		reg_state[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
                   ` (4 preceding siblings ...)
  2015-06-05 10:34 ` [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
@ 2015-06-05 10:34 ` Arun Siluvery
  2015-06-05 13:57   ` Arun Siluvery
  2015-06-06  8:20   ` shuang.he
  5 siblings, 2 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 10:34 UTC (permalink / raw)
  To: intel-gfx

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND	MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR	MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR	MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE	MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC	MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC	MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD	MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB	MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB	MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX	MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN	MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX	MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN	MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -453,6 +473,10 @@
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
@@ -1799,6 +1823,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index bca137e..61b1e22 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1155,6 +1155,13 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *reg_state;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->ctx_wa.obj, 0);
 	reg_state = kmap_atomic(page);
@@ -1169,9 +1176,61 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	reg_state[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	reg_state[index++] = MI_LOAD_REGISTER_IMM(1);
+	reg_state[index++] = INSTPM;
+	reg_state[index++] = _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING);
+
+	flags = MI_ATOMIC_MEMORY_TYPE_GGTT |
+		MI_ATOMIC_INLINE_DATA |
+		MI_ATOMIC_CS_STALL |
+		MI_ATOMIC_RETURN_DATA_CTL |
+		MI_ATOMIC_MOVE;
+
+	reg_state[index++] = MI_ATOMIC(5) | flags;
+	reg_state[index++] = scratch_addr;
+	reg_state[index++] = 0;
+	reg_state[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+	reg_state[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline.
+	 */
+	while (((unsigned long) (reg_state + index) % CACHELINE_BYTES) != 0)
+		reg_state[index++] = MI_NOOP;
+
+	reg_state[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+		MI_LRM_USE_GLOBAL_GTT |
+		MI_LRM_ASYNC_MODE_ENABLE;
+	reg_state[index++] = INSTPM;
+	reg_state[index++] = scratch_addr;
+	reg_state[index++] = 0;
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	reg_state[index++] = MI_LOAD_REGISTER_REG_GEN8;
+	reg_state[index++] = GEN8_RS_PREEMPT_STATUS;
+	reg_state[index++] = GEN8_RS_PREEMPT_STATUS;
+
 	/* padding */
         while (index < end)
 		reg_state[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
@ 2015-06-05 10:56   ` Chris Wilson
  2015-06-05 11:24     ` Siluvery, Arun
  2015-06-05 11:00   ` Chris Wilson
  2015-06-05 13:54   ` Arun Siluvery
  2 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2015-06-05 10:56 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> Some of the WA are to be applied during context save but before restore and
> some at the end of context save/restore but before executing the instructions
> in the ring, WA batch buffers are created for this purpose and these WA cannot
> be applied using normal means. Each context has two registers to load the
> offsets of these batch buffers. If they are non-zero, HW understands that it
> need to execute these batches.
> 
> v1: In this version two separate ring_buffer objects were used to load WA
> instructions for indirect and per context batch buffers and they were part
> of every context.
> 
> v2: Chris suggested to include additional page in context and use it to load
> these WA instead of creating separate objects. This will simplify lot of things
> as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
> is planning to use a similar setup to share data between GuC and driver and
> WA batch buffers can probably share that page. However after discussions with
> Dave who is implementing GuC changes, he suggested to use an independent page
> for the reasons - GuC area might grow and these WA are initialized only once and
> are not changed afterwards so we can share them share across all contexts.
> This version uses this approach.
> (Thanks to Chris, Dave and Thomas for their inputs)

Having moved to a shared wa_ctx for all lrc, I think it makes sense to
then do the allocation during ring_init itself, next to the scratch/hws
status pages. The advantage is that we don't then need to add more
special cases to the default ctx on RCS, and its permanence is far more
prominent. It will also be more consistent with calling it ring->wa_ctx.

Since you have it already plumbed into ring init/fini, why is it partly
done during default ctx init? Maybe all that is required a little bit of
code and changelog explanation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
  2015-06-05 10:56   ` Chris Wilson
@ 2015-06-05 11:00   ` Chris Wilson
  2015-06-15 15:22     ` Daniel Vetter
  2015-06-05 13:54   ` Arun Siluvery
  2 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2015-06-05 11:00 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> +	/* FIXME: fill unused locations with NOOPs.
> +	 * Replace these instructions with WA
> +	 */
> +        while (index < end)
> +		reg_state[index++] = MI_NOOP;

I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 10:56   ` Chris Wilson
@ 2015-06-05 11:24     ` Siluvery, Arun
  2015-06-05 11:36       ` Chris Wilson
  0 siblings, 1 reply; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-05 11:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 05/06/2015 11:56, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
>> Some of the WA are to be applied during context save but before restore and
>> some at the end of context save/restore but before executing the instructions
>> in the ring, WA batch buffers are created for this purpose and these WA cannot
>> be applied using normal means. Each context has two registers to load the
>> offsets of these batch buffers. If they are non-zero, HW understands that it
>> need to execute these batches.
>>
>> v1: In this version two separate ring_buffer objects were used to load WA
>> instructions for indirect and per context batch buffers and they were part
>> of every context.
>>
>> v2: Chris suggested to include additional page in context and use it to load
>> these WA instead of creating separate objects. This will simplify lot of things
>> as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
>> is planning to use a similar setup to share data between GuC and driver and
>> WA batch buffers can probably share that page. However after discussions with
>> Dave who is implementing GuC changes, he suggested to use an independent page
>> for the reasons - GuC area might grow and these WA are initialized only once and
>> are not changed afterwards so we can share them share across all contexts.
>> This version uses this approach.
>> (Thanks to Chris, Dave and Thomas for their inputs)
>
> Having moved to a shared wa_ctx for all lrc, I think it makes sense to
> then do the allocation during ring_init itself, next to the scratch/hws
> status pages. The advantage is that we don't then need to add more
> special cases to the default ctx on RCS, and its permanence is far more
> prominent. It will also be more consistent with calling it ring->wa_ctx.
>
> Since you have it already plumbed into ring init/fini, why is it partly
> done during default ctx init? Maybe all that is required a little bit of
> code and changelog explanation.
> -Chris
>
ok, it is possible to do the allocation and setup in logical_ring_init() 
itself. I wanted to group it with other wa which are setup in 
init_context().

I will also change s/reg_state/cmd.

regards
Arun


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 11:24     ` Siluvery, Arun
@ 2015-06-05 11:36       ` Chris Wilson
  2015-06-05 11:56         ` Siluvery, Arun
  0 siblings, 1 reply; 33+ messages in thread
From: Chris Wilson @ 2015-06-05 11:36 UTC (permalink / raw)
  To: Siluvery, Arun; +Cc: intel-gfx

On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:
> ok, it is possible to do the allocation and setup in
> logical_ring_init() itself. I wanted to group it with other wa which
> are setup in init_context().

Phew, I had worried I had missed something. The issue the current split
between ring init and default context init raises in my mind is that the
content has context dependencies upon it - whereas the w/a so far can be
reused globally. So imo the current split is more confusing than just
creating the w/a buffer entirely during ring init.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 11:36       ` Chris Wilson
@ 2015-06-05 11:56         ` Siluvery, Arun
  0 siblings, 0 replies; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-05 11:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 05/06/2015 12:36, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 12:24:58PM +0100, Siluvery, Arun wrote:
>> ok, it is possible to do the allocation and setup in
>> logical_ring_init() itself. I wanted to group it with other wa which
>> are setup in init_context().
>
> Phew, I had worried I had missed something. The issue the current split
> between ring init and default context init raises in my mind is that the
> content has context dependencies upon it - whereas the w/a so far can be
> reused globally. So imo the current split is more confusing than just
> creating the w/a buffer entirely during ring init.
> -Chris
>
I am moving it to logical_render_ring_init() as these are only 
applicable for RCS.

regards
Arun

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
  2015-06-05 10:56   ` Chris Wilson
  2015-06-05 11:00   ` Chris Wilson
@ 2015-06-05 13:54   ` Arun Siluvery
  2 siblings, 0 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:54 UTC (permalink / raw)
  To: intel-gfx

Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_contenxt.

(Thanks to Chris, Dave and Thomas for their inputs)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 177 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   9 ++
 2 files changed, 181 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..4e68b54 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -211,6 +211,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 #define GEN8_CTX_ID_SHIFT 32
+#define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
 static int intel_lr_context_pin(struct intel_engine_cs *ring,
 		struct intel_context *ctx);
@@ -1077,6 +1078,96 @@ static int intel_logical_ring_workarounds_emit(struct intel_engine_cs *ring,
 	return 0;
 }
 
+static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *cmd;
+
+	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+	cmd = kmap_atomic(page);
+
+	index = ring->wa_ctx.indctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->wa_ctx.indctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding alloted size\n");
+		kunmap_atomic(cmd);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		cmd[index++] = MI_NOOP;
+
+	/*
+	 * MI_BATCH_BUFFER_END is not required in Indirect ctx BB because
+	 * execution depends on the length specified in terms of cache lines
+	 * in the register CTX_RCS_INDIRECT_CTX
+	 */
+
+	kunmap_atomic(cmd);
+
+	return 0;
+}
+
+static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
+{
+	int index;
+	int end;
+	struct page *page;
+	uint32_t *cmd;
+
+	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
+	cmd = kmap_atomic(page);
+
+	index = ring->wa_ctx.perctx_batch_offset / sizeof(uint32_t);
+	end = index + (ring->wa_ctx.perctx_batch_size *
+		       CACHELINE_BYTES) / sizeof(uint32_t);
+
+	if ((end * sizeof(uint32_t)) > PAGE_SIZE) {
+		DRM_ERROR("context WA instruction exceeding alloted size\n");
+		kunmap_atomic(cmd);
+		return -EINVAL;
+	}
+
+	/* FIXME: fill unused locations with NOOPs.
+	 * Replace these instructions with WA
+	 */
+        while (index < end)
+		cmd[index++] = MI_NOOP;
+
+	cmd[index - 1] = MI_BATCH_BUFFER_END;
+	kunmap_atomic(cmd);
+
+	return 0;
+}
+
+static int intel_init_workaround_bb(struct intel_engine_cs *ring)
+{
+	int ret;
+
+	if (INTEL_INFO(ring->dev)->gen == 8) {
+		ret = gen8_init_indirectctx_bb(ring);
+		if (ret)
+			return ret;
+
+		ret = gen8_init_perctx_bb(ring);
+		if (ret)
+			return ret;
+	} else {
+		WARN(INTEL_INFO(ring->dev)->gen >= 9,
+		     "WA batch buffer is not initialized for Gen%d\n",
+		     INTEL_INFO(ring->dev)->gen);
+	}
+
+	return 0;
+}
+
 static int gen8_init_common_ring(struct intel_engine_cs *ring)
 {
 	struct drm_device *dev = ring->dev;
@@ -1382,6 +1473,46 @@ static int gen8_init_rcs_context(struct intel_engine_cs *ring,
 	return intel_lr_context_render_state_init(ring, ctx);
 }
 
+static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
+{
+	int ret;
+	struct drm_device *dev = ring->dev;
+
+	WARN_ON(ring->id != RCS);
+
+	size = roundup(size, PAGE_SIZE);
+	ring->wa_ctx.obj = i915_gem_alloc_object(dev, size);
+	if (!ring->wa_ctx.obj) {
+		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
+		return -ENOMEM;
+	}
+
+	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, GEN8_LR_CONTEXT_ALIGN, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("pin LRC WA ctx backing obj failed: %d\n",
+				 ret);
+		drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+		return ret;
+	}
+
+	ring->wa_ctx.indctx_batch_offset = 0;
+	ring->wa_ctx.indctx_batch_size = 4; /* in cache lines */
+	ring->wa_ctx.perctx_batch_offset =
+		ring->wa_ctx.indctx_batch_size * CACHELINE_BYTES;
+	ring->wa_ctx.perctx_batch_size = 2;
+
+	return 0;
+}
+
+static void lrc_destroy_wa_ctx_obj(struct intel_engine_cs *ring)
+{
+	WARN_ON(ring->id != RCS);
+
+	i915_gem_object_ggtt_unpin(ring->wa_ctx.obj);
+	drm_gem_object_unreference(&ring->wa_ctx.obj->base);
+	ring->wa_ctx.obj = NULL;
+}
+
 /**
  * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer
  *
@@ -1470,11 +1601,37 @@ static int logical_render_ring_init(struct drm_device *dev)
 	ring->emit_bb_start = gen8_emit_bb_start;
 
 	ring->dev = dev;
+
+	if (INTEL_INFO(ring->dev)->gen >= 8) {
+		ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
+		if (ret) {
+			DRM_DEBUG_DRIVER("Failed to setup context WA page: %d\n",
+					 ret);
+			return ret;
+		}
+
+		ret = intel_init_workaround_bb(ring);
+		if (ret) {
+			lrc_destroy_wa_ctx_obj(ring);
+			DRM_ERROR("WA batch buffers are not initialized: %d\n",
+				  ret);
+		}
+	}
+
 	ret = logical_ring_init(dev, ring);
 	if (ret)
-		return ret;
+		goto clear_wa_ctx;
+
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		goto clear_wa_ctx;
+
+	return 0;
 
-	return intel_init_pipe_control(ring);
+clear_wa_ctx:
+	if (ring->wa_ctx.obj)
+		lrc_destroy_wa_ctx_obj(ring);
+	return ret;
 }
 
 static int logical_bsd_ring_init(struct drm_device *dev)
@@ -1754,15 +1911,25 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_SECOND_BB_STATE] = ring->mmio_base + 0x118;
 	reg_state[CTX_SECOND_BB_STATE+1] = 0;
 	if (ring->id == RCS) {
-		/* TODO: according to BSpec, the register state context
-		 * for CHV does not have these. OTOH, these registers do
-		 * exist in CHV. I'm waiting for a clarification */
 		reg_state[CTX_BB_PER_CTX_PTR] = ring->mmio_base + 0x1c0;
 		reg_state[CTX_BB_PER_CTX_PTR+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX] = ring->mmio_base + 0x1c4;
 		reg_state[CTX_RCS_INDIRECT_CTX+1] = 0;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET] = ring->mmio_base + 0x1c8;
 		reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] = 0;
+		if (ring->wa_ctx.obj) {
+			reg_state[CTX_RCS_INDIRECT_CTX+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.indctx_batch_offset) |
+				ring->wa_ctx.indctx_batch_size;
+
+			reg_state[CTX_RCS_INDIRECT_CTX_OFFSET+1] =
+				CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT << 6;
+
+			reg_state[CTX_BB_PER_CTX_PTR+1] =
+				(i915_gem_obj_ggtt_offset(ring->wa_ctx.obj) +
+				 ring->wa_ctx.perctx_batch_offset) | 0x01;
+		}
 	}
 	reg_state[CTX_LRI_HEADER_1] = MI_LOAD_REGISTER_IMM(9);
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39f6dfc..2a0f7ce 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -119,6 +119,14 @@ struct intel_ringbuffer {
 
 struct	intel_context;
 
+struct  i915_ctx_workarounds {
+	u32 indctx_batch_offset;
+	u32 indctx_batch_size;
+	u32 perctx_batch_offset;
+	u32 perctx_batch_size;
+	struct drm_i915_gem_object *obj;
+};
+
 struct  intel_engine_cs {
 	const char	*name;
 	enum intel_ring_id {
@@ -142,6 +150,7 @@ struct  intel_engine_cs {
 	struct i915_gem_batch_pool batch_pool;
 
 	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds wa_ctx;
 
 	unsigned irq_refcount; /* protected by dev_priv->irq_lock */
 	u32		irq_enable_mask;	/* bitmask to enable ring interrupt */
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-05 10:34 ` [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
@ 2015-06-05 13:55   ` Arun Siluvery
  2015-06-09 15:27   ` Dave Gordon
  1 sibling, 0 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:55 UTC (permalink / raw)
  To: intel-gfx

Some of the WA applied using WA batch buffers perform writes to scratch page.
In the current flow WA are initialized before scratch obj is allocated.
This patch reorders intel_init_pipe_control() to have a valid scratch obj
before we initialize WA.

Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4e68b54..68df878 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1602,6 +1602,10 @@ static int logical_render_ring_init(struct drm_device *dev)
 
 	ring->dev = dev;
 
+	ret = intel_init_pipe_control(ring);
+	if (ret)
+		return ret;
+
 	if (INTEL_INFO(ring->dev)->gen >= 8) {
 		ret = lrc_setup_wa_ctx_obj(ring, PAGE_SIZE);
 		if (ret) {
@@ -1622,10 +1626,6 @@ static int logical_render_ring_init(struct drm_device *dev)
 	if (ret)
 		goto clear_wa_ctx;
 
-	ret = intel_init_pipe_control(ring);
-	if (ret)
-		goto clear_wa_ctx;
-
 	return 0;
 
 clear_wa_ctx:
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround
  2015-06-05 10:34 ` [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
@ 2015-06-05 13:56   ` Arun Siluvery
  0 siblings, 0 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:56 UTC (permalink / raw)
  To: intel-gfx

In Indirect and Per context w/a batch buffer,
+WaDisableCtxRestoreArbitration

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 68df878..a71eb81 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1098,9 +1098,10 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
-	/* FIXME: fill unused locations with NOOPs.
-	 * Replace these instructions with WA
-	 */
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
+	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
 
@@ -1135,9 +1136,10 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
-	/* FIXME: fill unused locations with NOOPs.
-	 * Replace these instructions with WA
-	 */
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
+	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
 
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-05 10:34 ` [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
@ 2015-06-05 13:56   ` Arun Siluvery
  2015-06-05 14:48     ` Ville Syrjälä
  2015-06-09 17:06   ` Dave Gordon
  1 sibling, 1 reply; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:56 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
+WaFlushCoherentL3CacheLinesAtContextSwitch

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 1 +
 drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84af255..5203c79 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -426,6 +426,7 @@
 #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
 #define   PIPE_CONTROL_NOTIFY				(1<<8)
 #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
+#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
 #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
 #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
 #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a71eb81..9d8cf65c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1101,6 +1101,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
 
+	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
+	cmd[index++] = GFX_OP_PIPE_CONTROL(6);
+	cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
+	cmd[index++] = 0;
+	cmd[index++] = 0;
+	cmd[index++] = 0;
+	cmd[index++] = 0;
+
 	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround
  2015-06-05 10:34 ` [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
@ 2015-06-05 13:57   ` Arun Siluvery
  0 siblings, 0 replies; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:57 UTC (permalink / raw)
  To: intel-gfx

In Indirect context w/a batch buffer,
WaClearSlmSpaceAtContextSwitch

v2: s/PIPE_CONTROL_FLUSH_RO_CACHES/PIPE_CONTROL_FLUSH_L3 (Ville)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5203c79..33b0ff1 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -410,6 +410,7 @@
 #define   DISPLAY_PLANE_A           (0<<20)
 #define   DISPLAY_PLANE_B           (1<<20)
 #define GFX_OP_PIPE_CONTROL(len)	((0x3<<29)|(0x3<<27)|(0x2<<24)|(len-2))
+#define   PIPE_CONTROL_FLUSH_L3				(1<<27)
 #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24) /* gen7+ */
 #define   PIPE_CONTROL_MMIO_WRITE			(1<<23)
 #define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9d8cf65c..5f6279b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1084,6 +1084,13 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *cmd;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
 	cmd = kmap_atomic(page);
@@ -1109,6 +1116,23 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
 	cmd[index++] = 0;
 	cmd[index++] = 0;
 
+	/* WaClearSlmSpaceAtContextSwitch:bdw,chv */
+	flags = PIPE_CONTROL_FLUSH_L3 |
+		PIPE_CONTROL_GLOBAL_GTT_IVB |
+		PIPE_CONTROL_CS_STALL |
+		PIPE_CONTROL_QW_WRITE;
+
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
+	cmd[index++] = GFX_OP_PIPE_CONTROL(6);
+	cmd[index++] = flags;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+	cmd[index++] = 0;
+	cmd[index++] = 0;
+
 	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-05 10:34 ` [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
@ 2015-06-05 13:57   ` Arun Siluvery
  2015-06-09 18:43     ` Dave Gordon
  2015-06-06  8:20   ` shuang.he
  1 sibling, 1 reply; 33+ messages in thread
From: Arun Siluvery @ 2015-06-05 13:57 UTC (permalink / raw)
  To: intel-gfx

In Per context w/a batch buffer,
WaRsRestoreWithPerCtxtBb

v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
so as to not break any future users of existing definitions (Michel)

Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33b0ff1..6928162 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -347,6 +347,26 @@
 #define   MI_INVALIDATE_BSD		(1<<7)
 #define   MI_FLUSH_DW_USE_GTT		(1<<2)
 #define   MI_FLUSH_DW_USE_PPGTT		(0<<2)
+#define MI_ATOMIC(len)	MI_INSTR(0x2F, (len-2))
+#define   MI_ATOMIC_MEMORY_TYPE_GGTT	(1<<22)
+#define   MI_ATOMIC_INLINE_DATA		(1<<18)
+#define   MI_ATOMIC_CS_STALL		(1<<17)
+#define   MI_ATOMIC_RETURN_DATA_CTL	(1<<16)
+#define MI_ATOMIC_OP_MASK(op)  ((op) << 8)
+#define MI_ATOMIC_AND	MI_ATOMIC_OP_MASK(0x01)
+#define MI_ATOMIC_OR	MI_ATOMIC_OP_MASK(0x02)
+#define MI_ATOMIC_XOR	MI_ATOMIC_OP_MASK(0x03)
+#define MI_ATOMIC_MOVE	MI_ATOMIC_OP_MASK(0x04)
+#define MI_ATOMIC_INC	MI_ATOMIC_OP_MASK(0x05)
+#define MI_ATOMIC_DEC	MI_ATOMIC_OP_MASK(0x06)
+#define MI_ATOMIC_ADD	MI_ATOMIC_OP_MASK(0x07)
+#define MI_ATOMIC_SUB	MI_ATOMIC_OP_MASK(0x08)
+#define MI_ATOMIC_RSUB	MI_ATOMIC_OP_MASK(0x09)
+#define MI_ATOMIC_IMAX	MI_ATOMIC_OP_MASK(0x0A)
+#define MI_ATOMIC_IMIN	MI_ATOMIC_OP_MASK(0x0B)
+#define MI_ATOMIC_UMAX	MI_ATOMIC_OP_MASK(0x0C)
+#define MI_ATOMIC_UMIN	MI_ATOMIC_OP_MASK(0x0D)
+
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE		(1)
 /* for snb/ivb/vlv this also means "batch in ppgtt" when ppgtt is enabled. */
@@ -453,6 +473,10 @@
 #define   MI_REPORT_PERF_COUNT_GGTT (1<<0)
 #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
 #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
+#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
+#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
+#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
+#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
 #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
 #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
 #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
@@ -1799,6 +1823,8 @@ enum skl_disp_power_wells {
 #define   GEN8_RC_SEMA_IDLE_MSG_DISABLE	(1 << 12)
 #define   GEN8_FF_DOP_CLOCK_GATE_DISABLE	(1<<10)
 
+#define GEN8_RS_PREEMPT_STATUS		0x215C
+
 /* Fuse readout registers for GT */
 #define CHV_FUSE_GT			(VLV_DISPLAY_BASE + 0x2168)
 #define   CHV_FGT_DISABLE_SS0		(1 << 10)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5f6279b..98335c6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1154,6 +1154,13 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 	int end;
 	struct page *page;
 	uint32_t *cmd;
+	u32 scratch_addr;
+	unsigned long flags = 0;
+
+	if (ring->scratch.obj == NULL) {
+		DRM_ERROR("scratch page not allocated for %s\n", ring->name);
+		return -EINVAL;
+	}
 
 	page = i915_gem_object_get_page(ring->wa_ctx.obj, 0);
 	cmd = kmap_atomic(page);
@@ -1168,9 +1175,61 @@ static int gen8_init_perctx_bb(struct intel_engine_cs *ring)
 		return -EINVAL;
 	}
 
+	/* Actual scratch location is at 128 bytes offset */
+	scratch_addr = ring->scratch.gtt_offset + 2*CACHELINE_BYTES;
+	scratch_addr |= PIPE_CONTROL_GLOBAL_GTT;
+
 	/* WaDisableCtxRestoreArbitration:bdw,chv */
 	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
+	/*
+	 * As per Bspec, to workaround a known HW issue, SW must perform the
+	 * below programming sequence prior to programming MI_BATCH_BUFFER_END.
+	 *
+	 * This is only applicable for Gen8.
+	 */
+
+	/* WaRsRestoreWithPerCtxtBb:bdw,chv */
+	cmd[index++] = MI_LOAD_REGISTER_IMM(1);
+	cmd[index++] = INSTPM;
+	cmd[index++] = _MASKED_BIT_DISABLE(INSTPM_FORCE_ORDERING);
+
+	flags = MI_ATOMIC_MEMORY_TYPE_GGTT |
+		MI_ATOMIC_INLINE_DATA |
+		MI_ATOMIC_CS_STALL |
+		MI_ATOMIC_RETURN_DATA_CTL |
+		MI_ATOMIC_MOVE;
+
+	cmd[index++] = MI_ATOMIC(5) | flags;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+	cmd[index++] = _MASKED_BIT_ENABLE(INSTPM_FORCE_ORDERING);
+
+	/*
+	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
+	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
+	 * in the same cacheline.
+	 */
+	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
+		cmd[index++] = MI_NOOP;
+
+	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
+		MI_LRM_USE_GLOBAL_GTT |
+		MI_LRM_ASYNC_MODE_ENABLE;
+	cmd[index++] = INSTPM;
+	cmd[index++] = scratch_addr;
+	cmd[index++] = 0;
+
+	/*
+	 * BSpec says there should not be any commands programmed
+	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
+	 * do not add any new commands
+	 */
+	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
+
 	/* padding */
         while (index < end)
 		cmd[index++] = MI_NOOP;
-- 
2.3.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-05 13:56   ` Arun Siluvery
@ 2015-06-05 14:48     ` Ville Syrjälä
  2015-06-12 11:51       ` Siluvery, Arun
  0 siblings, 1 reply; 33+ messages in thread
From: Ville Syrjälä @ 2015-06-05 14:48 UTC (permalink / raw)
  To: Arun Siluvery; +Cc: intel-gfx

On Fri, Jun 05, 2015 at 02:56:48PM +0100, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> +WaFlushCoherentL3CacheLinesAtContextSwitch
> 
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 84af255..5203c79 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -426,6 +426,7 @@
>  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
>  #define   PIPE_CONTROL_NOTIFY				(1<<8)
>  #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> +#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
>  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
>  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
>  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a71eb81..9d8cf65c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1101,6 +1101,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
>  	/* WaDisableCtxRestoreArbitration:bdw,chv */
>  	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
>  
> +	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
> +	cmd[index++] = GFX_OP_PIPE_CONTROL(6);
> +	cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
> +	cmd[index++] = 0;
> +	cmd[index++] = 0;
> +	cmd[index++] = 0;
> +	cmd[index++] = 0;
> +

This looks incomplete. Seems like you should have LRIs around this
guy to enable/disable the L3SQCREG4 coherent line flush bit.

And chv shouldn't do coherent L3, so this might not be needed there.

Also do we need a CS stall here too?
"DC Flush Enable 5 Requires stall bit ([20] of DW) set for all GPGPU and Media Workloads."

Supposedly we should add the DC flush to the normal ring flush hooks
too. But that's a separate issue.

>  	/* padding */
>          while (index < end)
>  		cmd[index++] = MI_NOOP;
> -- 
> 2.3.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-05 10:34 ` [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
  2015-06-05 13:57   ` Arun Siluvery
@ 2015-06-06  8:20   ` shuang.he
  1 sibling, 0 replies; 33+ messages in thread
From: shuang.he @ 2015-06-06  8:20 UTC (permalink / raw)
  To: shuang.he, lei.a.liu, intel-gfx, arun.siluvery

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6545
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  276/276              276/276
ILK                                  303/303              303/303
SNB                                  312/312              312/312
IVB                                  343/343              343/343
BYT                                  287/287              287/287
BDW                                  321/321              321/321
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
Note: You need to pay more attention to line start with '*'
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-05 10:34 ` [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
  2015-06-05 13:55   ` Arun Siluvery
@ 2015-06-09 15:27   ` Dave Gordon
  2015-06-09 15:34     ` Siluvery, Arun
  1 sibling, 1 reply; 33+ messages in thread
From: Dave Gordon @ 2015-06-09 15:27 UTC (permalink / raw)
  To: Arun Siluvery, intel-gfx

On 05/06/15 11:34, Arun Siluvery wrote:
> Some of the WA applied using WA batch buffers perform writes to scratch page.
> In the current flow WA are initialized before scratch obj is allocated.
> This patch reorders intel_init_pipe_control() to have a valid scratch obj
> before we initialize WA.
> 
> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0b3422a..20c56e4 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1562,11 +1562,16 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->emit_bb_start = gen8_emit_bb_start;
>  
>  	ring->dev = dev;
> +
> +	ret = intel_init_pipe_control(ring);
> +	if (ret)
> +		return ret;
> +
>  	ret = logical_ring_init(dev, ring);
>  	if (ret)
>  		return ret;
>  
> -	return intel_init_pipe_control(ring);
> +	return 0;
>  }

You could squash the last several lines down to just:

	return logical_ring_init(dev, ring);

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode
  2015-06-09 15:27   ` Dave Gordon
@ 2015-06-09 15:34     ` Siluvery, Arun
  0 siblings, 0 replies; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-09 15:34 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx

On 09/06/2015 16:27, Dave Gordon wrote:
> On 05/06/15 11:34, Arun Siluvery wrote:
>> Some of the WA applied using WA batch buffers perform writes to scratch page.
>> In the current flow WA are initialized before scratch obj is allocated.
>> This patch reorders intel_init_pipe_control() to have a valid scratch obj
>> before we initialize WA.
>>
>> Signed-off-by: Michel Thierry <michel.thierry@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_lrc.c | 7 ++++++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 0b3422a..20c56e4 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1562,11 +1562,16 @@ static int logical_render_ring_init(struct drm_device *dev)
>>   	ring->emit_bb_start = gen8_emit_bb_start;
>>
>>   	ring->dev = dev;
>> +
>> +	ret = intel_init_pipe_control(ring);
>> +	if (ret)
>> +		return ret;
>> +
>>   	ret = logical_ring_init(dev, ring);
>>   	if (ret)
>>   		return ret;
>>
>> -	return intel_init_pipe_control(ring);
>> +	return 0;
>>   }
>
> You could squash the last several lines down to just:
>
> 	return logical_ring_init(dev, ring);
>
> .Dave.
>

yes but this is updated based on suggestion from Chris to allocate 
wa_ctx page during ring init itself; in the updated version it is not 
possible as we need to free that page also if ring_init fails.
I sent the updated patches in the same series so as to collate all 
reviews instead of resending as a separate series.

regards
Arun

>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-05 10:34 ` [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
  2015-06-05 13:56   ` Arun Siluvery
@ 2015-06-09 17:06   ` Dave Gordon
  1 sibling, 0 replies; 33+ messages in thread
From: Dave Gordon @ 2015-06-09 17:06 UTC (permalink / raw)
  To: Arun Siluvery, intel-gfx

On 05/06/15 11:34, Arun Siluvery wrote:
> In Indirect context w/a batch buffer,
> +WaFlushCoherentL3CacheLinesAtContextSwitch
> 
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 1 +
>  drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 84af255..5203c79 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -426,6 +426,7 @@
>  #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
>  #define   PIPE_CONTROL_NOTIFY				(1<<8)
>  #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
> +#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
>  #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
>  #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
>  #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2fdb3da..5b6c9db 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1102,6 +1102,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
>  	/* WaDisableCtxRestoreArbitration:bdw,chv */
>  	reg_state[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
>  
> +	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
> +	reg_state[index++] = GFX_OP_PIPE_CONTROL(6);
> +	reg_state[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
> +	reg_state[index++] = 0;
> +	reg_state[index++] = 0;
> +	reg_state[index++] = 0;
> +	reg_state[index++] = 0;

Is DC_FLUSH valid without setting the CS_STALL bit?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-05 13:57   ` Arun Siluvery
@ 2015-06-09 18:43     ` Dave Gordon
  2015-06-12 11:58       ` Siluvery, Arun
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Gordon @ 2015-06-09 18:43 UTC (permalink / raw)
  To: Arun Siluvery, intel-gfx

On 05/06/15 14:57, Arun Siluvery wrote:
> In Per context w/a batch buffer,
> WaRsRestoreWithPerCtxtBb
> 
> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> so as to not break any future users of existing definitions (Michel)
> 
> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 33b0ff1..6928162 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
[snip]
>  #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>  #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)

Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
a two-operand instruction, each of which is a one-word MMIO register
address, hence always 3 words total. The length bias is 2, so the
so-called 'flags' field must be 1. The original definition (where the
second argument of the MI_INSTR macro is 0) shouldn't work.

So just correct the original definition of MI_LOAD_REGISTER_REG; this
isn't something that's actually changed on GEN8.

While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.

>  #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>  #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>  #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)

And these are wrong too! In fact all of these instructions have been
added under a comment which says "Commands used only by the command
parser". Looks like they were added as placeholders without the proper
length fields, and then people have started using them as though they
were complete definitions :(

Time update them all, perhaps ...

[snip]

> +	/*
> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
> +	 * in the same cacheline.
> +	 */
> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
> +		cmd[index++] = MI_NOOP;
> +
> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
> +		MI_LRM_USE_GLOBAL_GTT |
> +		MI_LRM_ASYNC_MODE_ENABLE;
> +	cmd[index++] = INSTPM;
> +	cmd[index++] = scratch_addr;
> +	cmd[index++] = 0;
> +
> +	/*
> +	 * BSpec says there should not be any commands programmed
> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
> +	 * do not add any new commands
> +	 */
> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
> +
>  	/* padding */
>          while (index < end)
>  		cmd[index++] = MI_NOOP;
> 

Where's the MI_BATCH_BUFFER_END referred to in the comment?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround
  2015-06-05 14:48     ` Ville Syrjälä
@ 2015-06-12 11:51       ` Siluvery, Arun
  0 siblings, 0 replies; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-12 11:51 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On 05/06/2015 15:48, Ville Syrjälä wrote:
> On Fri, Jun 05, 2015 at 02:56:48PM +0100, Arun Siluvery wrote:
>> In Indirect context w/a batch buffer,
>> +WaFlushCoherentL3CacheLinesAtContextSwitch
>>
>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reg.h  | 1 +
>>   drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 84af255..5203c79 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -426,6 +426,7 @@
>>   #define   PIPE_CONTROL_INDIRECT_STATE_DISABLE		(1<<9)
>>   #define   PIPE_CONTROL_NOTIFY				(1<<8)
>>   #define   PIPE_CONTROL_FLUSH_ENABLE			(1<<7) /* gen7+ */
>> +#define   PIPE_CONTROL_DC_FLUSH_ENABLE			(1<<5)
>>   #define   PIPE_CONTROL_VF_CACHE_INVALIDATE		(1<<4)
>>   #define   PIPE_CONTROL_CONST_CACHE_INVALIDATE		(1<<3)
>>   #define   PIPE_CONTROL_STATE_CACHE_INVALIDATE		(1<<2)
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index a71eb81..9d8cf65c 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1101,6 +1101,14 @@ static int gen8_init_indirectctx_bb(struct intel_engine_cs *ring)
>>   	/* WaDisableCtxRestoreArbitration:bdw,chv */
>>   	cmd[index++] = MI_ARB_ON_OFF | MI_ARB_DISABLE;
>>
>> +	/* WaFlushCoherentL3CacheLinesAtContextSwitch:bdw,chv */
>> +	cmd[index++] = GFX_OP_PIPE_CONTROL(6);
>> +	cmd[index++] = PIPE_CONTROL_DC_FLUSH_ENABLE;
>> +	cmd[index++] = 0;
>> +	cmd[index++] = 0;
>> +	cmd[index++] = 0;
>> +	cmd[index++] = 0;
>> +
>
> This looks incomplete. Seems like you should have LRIs around this
> guy to enable/disable the L3SQCREG4 coherent line flush bit.
>
> And chv shouldn't do coherent L3, so this might not be needed there.
>

I checked with HW team and yes I need to add LRIs to enable/disable 
L3SQCREG4 coherent line flush bit.
As you mentioned, it is not required for CHV.

> Also do we need a CS stall here too?
> "DC Flush Enable 5 Requires stall bit ([20] of DW) set for all GPGPU and Media Workloads."
>
I didn't check the restrictions of this bit, will check again and correc 
this.

regards
Arun

> Supposedly we should add the DC flush to the normal ring flush hooks
> too. But that's a separate issue.
>
>>   	/* padding */
>>           while (index < end)
>>   		cmd[index++] = MI_NOOP;
>> --
>> 2.3.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-09 18:43     ` Dave Gordon
@ 2015-06-12 11:58       ` Siluvery, Arun
  2015-06-12 17:03         ` Dave Gordon
  0 siblings, 1 reply; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-12 11:58 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx

On 09/06/2015 19:43, Dave Gordon wrote:
> On 05/06/15 14:57, Arun Siluvery wrote:
>> In Per context w/a batch buffer,
>> WaRsRestoreWithPerCtxtBb
>>
>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>> so as to not break any future users of existing definitions (Michel)
>>
>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c | 59 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 85 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 33b0ff1..6928162 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> [snip]
>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>
> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> a two-operand instruction, each of which is a one-word MMIO register
> address, hence always 3 words total. The length bias is 2, so the
> so-called 'flags' field must be 1. The original definition (where the
> second argument of the MI_INSTR macro is 0) shouldn't work.
>
> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> isn't something that's actually changed on GEN8.
>
I did notice that the original instructions are odd but thought I might 
be wrong hence I created new ones to not disturb the original ones.
ok I will just correct original one and reuse it.

> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>
ok.
>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>
> And these are wrong too! In fact all of these instructions have been
> added under a comment which says "Commands used only by the command
> parser". Looks like they were added as placeholders without the proper
> length fields, and then people have started using them as though they
> were complete definitions :(
>
> Time update them all, perhaps ...
these are not related to this patch, so it can be taken up as a 
different patch.
>
> [snip]
>
>> +	/*
>> +	 * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>> +	 * MI_BATCH_BUFFER_END instructions in this sequence need to be
>> +	 * in the same cacheline.
>> +	 */
>> +	while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>> +		cmd[index++] = MI_NOOP;
>> +
>> +	cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>> +		MI_LRM_USE_GLOBAL_GTT |
>> +		MI_LRM_ASYNC_MODE_ENABLE;
>> +	cmd[index++] = INSTPM;
>> +	cmd[index++] = scratch_addr;
>> +	cmd[index++] = 0;
>> +
>> +	/*
>> +	 * BSpec says there should not be any commands programmed
>> +	 * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>> +	 * do not add any new commands
>> +	 */
>> +	cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +	cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>> +
>>   	/* padding */
>>           while (index < end)
>>   		cmd[index++] = MI_NOOP;
>>
>
> Where's the MI_BATCH_BUFFER_END referred to in the comment?

MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
Since the diff context is only few lines it didn't showup in the diff.

regards
Arun

>
> .Dave.
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-12 11:58       ` Siluvery, Arun
@ 2015-06-12 17:03         ` Dave Gordon
  2015-06-15 14:10           ` Siluvery, Arun
  2015-06-15 15:27           ` Daniel Vetter
  0 siblings, 2 replies; 33+ messages in thread
From: Dave Gordon @ 2015-06-12 17:03 UTC (permalink / raw)
  To: Siluvery, Arun, intel-gfx

On 12/06/15 12:58, Siluvery, Arun wrote:
> On 09/06/2015 19:43, Dave Gordon wrote:
>> On 05/06/15 14:57, Arun Siluvery wrote:
>>> In Per context w/a batch buffer,
>>> WaRsRestoreWithPerCtxtBb
>>>
>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>> so as to not break any future users of existing definitions (Michel)
>>>
>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_lrc.c | 59
>>> ++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 85 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>> b/drivers/gpu/drm/i915/i915_reg.h
>>> index 33b0ff1..6928162 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> [snip]
>>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>
>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>> a two-operand instruction, each of which is a one-word MMIO register
>> address, hence always 3 words total. The length bias is 2, so the
>> so-called 'flags' field must be 1. The original definition (where the
>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>
>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>> isn't something that's actually changed on GEN8.
>>
> I did notice that the original instructions are odd but thought I might
> be wrong hence I created new ones to not disturb the original ones.
> ok I will just correct original one and reuse it.
> 
>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>
> ok.
>>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>>
>> And these are wrong too! In fact all of these instructions have been
>> added under a comment which says "Commands used only by the command
>> parser". Looks like they were added as placeholders without the proper
>> length fields, and then people have started using them as though they
>> were complete definitions :(
>>
>> Time update them all, perhaps ...
> these are not related to this patch, so it can be taken up as a
> different patch.

As a minimum, you should move these updated #defines out of the section
under the comment "Commands used only by the command parser" and put
them in the appropriate place in the regular list of MI_ commnds,
preferably in numerical order. Then the ones that are genuinely only
used by the command parser could be left for another patch ...

>> [snip]
>>
>>> +    /*
>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>> +     * in the same cacheline.
>>> +     */
>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>> +        cmd[index++] = MI_NOOP;
>>> +
>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>> +        MI_LRM_USE_GLOBAL_GTT |
>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>> +    cmd[index++] = INSTPM;
>>> +    cmd[index++] = scratch_addr;
>>> +    cmd[index++] = 0;
>>> +
>>> +    /*
>>> +     * BSpec says there should not be any commands programmed
>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>> +     * do not add any new commands
>>> +     */
>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>> +
>>>       /* padding */
>>>           while (index < end)
>>>           cmd[index++] = MI_NOOP;
>>>
>>
>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
> 
> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
> Since the diff context is only few lines it didn't showup in the diff.

The second comment above says "no commands between LOAD_REG_REG and
BB_END", so the point of my comment was that the BB_END is *NOT*
immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!

And therefore also, these instructions do *not* all end up in the same
cacheline, thus contradicting the first comment above.

Padding *after* a BB_END would be redundant.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-12 17:03         ` Dave Gordon
@ 2015-06-15 14:10           ` Siluvery, Arun
  2015-06-15 17:29             ` Dave Gordon
  2015-06-15 15:27           ` Daniel Vetter
  1 sibling, 1 reply; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-15 14:10 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx

On 12/06/2015 18:03, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
>> On 09/06/2015 19:43, Dave Gordon wrote:
>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>> In Per context w/a batch buffer,
>>>> WaRsRestoreWithPerCtxtBb
>>>>
>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>> so as to not break any future users of existing definitions (Michel)
>>>>
>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_lrc.c | 59
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 85 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 33b0ff1..6928162 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> [snip]
>>>>    #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
>>>>    #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
>>>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
>>>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
>>>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
>>>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
>>>
>>> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
>>> a two-operand instruction, each of which is a one-word MMIO register
>>> address, hence always 3 words total. The length bias is 2, so the
>>> so-called 'flags' field must be 1. The original definition (where the
>>> second argument of the MI_INSTR macro is 0) shouldn't work.
>>>
>>> So just correct the original definition of MI_LOAD_REGISTER_REG; this
>>> isn't something that's actually changed on GEN8.
>>>
>> I did notice that the original instructions are odd but thought I might
>> be wrong hence I created new ones to not disturb the original ones.
>> ok I will just correct original one and reuse it.
>>
>>> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
>>> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
>>>
>> ok.
>>>>    #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
>>>>    #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
>>>>    #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
>>>
>>> And these are wrong too! In fact all of these instructions have been
>>> added under a comment which says "Commands used only by the command
>>> parser". Looks like they were added as placeholders without the proper
>>> length fields, and then people have started using them as though they
>>> were complete definitions :(
>>>
>>> Time update them all, perhaps ...
>> these are not related to this patch, so it can be taken up as a
>> different patch.
>
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commnds,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...
>
>>> [snip]
>>>
>>>> +    /*
>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>> +     * in the same cacheline.
>>>> +     */
>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>> +        cmd[index++] = MI_NOOP;
>>>> +
>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>> +    cmd[index++] = INSTPM;
>>>> +    cmd[index++] = scratch_addr;
>>>> +    cmd[index++] = 0;
>>>> +
>>>> +    /*
>>>> +     * BSpec says there should not be any commands programmed
>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>> +     * do not add any new commands
>>>> +     */
>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>> +
>>>>        /* padding */
>>>>            while (index < end)
>>>>            cmd[index++] = MI_NOOP;
>>>>
>>>
>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>
>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>> Since the diff context is only few lines it didn't showup in the diff.
>
> The second comment above says "no commands between LOAD_REG_REG and
> BB_END", so the point of my comment was that the BB_END is *NOT*
> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
true, but they are no-ops so they shouldn't really affect anything. I 
guess the spec means no valid commands.

>
> And therefore also, these instructions do *not* all end up in the same
> cacheline, thus contradicting the first comment above.
I don't understand why. As per the requirement the commands from the 
first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be 
part of same cacheline (in this case second line).

>
> Padding *after* a BB_END would be redundant.

yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of 
abruptly terminating the batch which is why I am padding with no-ops, I 
can change this if that is preferred.
>
> .Dave.
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-05 11:00   ` Chris Wilson
@ 2015-06-15 15:22     ` Daniel Vetter
  2015-06-15 15:23       ` Siluvery, Arun
  0 siblings, 1 reply; 33+ messages in thread
From: Daniel Vetter @ 2015-06-15 15:22 UTC (permalink / raw)
  To: Chris Wilson, Arun Siluvery, intel-gfx

On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:
> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
> > +	/* FIXME: fill unused locations with NOOPs.
> > +	 * Replace these instructions with WA
> > +	 */
> > +        while (index < end)
> > +		reg_state[index++] = MI_NOOP;
> 
> I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?

Concurred, reg_state sounds like an mmio dump not a batchbuffer. wa_batch
would be my naming bikeshed, but I'd go with either. If this is all that
needs changing I can do that while applying patches.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize WA batch buffers
  2015-06-15 15:22     ` Daniel Vetter
@ 2015-06-15 15:23       ` Siluvery, Arun
  0 siblings, 0 replies; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-15 15:23 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, intel-gfx

On 15/06/2015 16:22, Daniel Vetter wrote:
> On Fri, Jun 05, 2015 at 12:00:54PM +0100, Chris Wilson wrote:
>> On Fri, Jun 05, 2015 at 11:34:01AM +0100, Arun Siluvery wrote:
>>> +	/* FIXME: fill unused locations with NOOPs.
>>> +	 * Replace these instructions with WA
>>> +	 */
>>> +        while (index < end)
>>> +		reg_state[index++] = MI_NOOP;
>>
>> I found calling it reg_state was very confusing. Maybe batch, bb, data or cmd?
>
> Concurred, reg_state sounds like an mmio dump not a batchbuffer. wa_batch
> would be my naming bikeshed, but I'd go with either. If this is all that
> needs changing I can do that while applying patches.
> -Daniel
>
I have already changed this to cmd.
There are some more comments from Dave which I am addressing now, I will 
send them soon.

regards
Arun

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-12 17:03         ` Dave Gordon
  2015-06-15 14:10           ` Siluvery, Arun
@ 2015-06-15 15:27           ` Daniel Vetter
  1 sibling, 0 replies; 33+ messages in thread
From: Daniel Vetter @ 2015-06-15 15:27 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Fri, Jun 12, 2015 at 06:03:55PM +0100, Dave Gordon wrote:
> On 12/06/15 12:58, Siluvery, Arun wrote:
> > On 09/06/2015 19:43, Dave Gordon wrote:
> >> On 05/06/15 14:57, Arun Siluvery wrote:
> >>> In Per context w/a batch buffer,
> >>> WaRsRestoreWithPerCtxtBb
> >>>
> >>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
> >>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
> >>> so as to not break any future users of existing definitions (Michel)
> >>>
> >>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
> >>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> >>> ---
> >>>   drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
> >>>   drivers/gpu/drm/i915/intel_lrc.c | 59
> >>> ++++++++++++++++++++++++++++++++++++++++
> >>>   2 files changed, 85 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> >>> b/drivers/gpu/drm/i915/i915_reg.h
> >>> index 33b0ff1..6928162 100644
> >>> --- a/drivers/gpu/drm/i915/i915_reg.h
> >>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> >> [snip]
> >>>   #define MI_LOAD_REGISTER_MEM    MI_INSTR(0x29, 0)
> >>>   #define MI_LOAD_REGISTER_REG    MI_INSTR(0x2A, 0)
> >>> +#define MI_LOAD_REGISTER_MEM_GEN8 MI_INSTR(0x29, 2)
> >>> +#define   MI_LRM_USE_GLOBAL_GTT (1<<22)
> >>> +#define   MI_LRM_ASYNC_MODE_ENABLE (1<<21)
> >>> +#define MI_LOAD_REGISTER_REG_GEN8 MI_INSTR(0x2A, 1)
> >>
> >> Isn't the original definition of MI_LOAD_REGISTER_REG wrong anyway? It's
> >> a two-operand instruction, each of which is a one-word MMIO register
> >> address, hence always 3 words total. The length bias is 2, so the
> >> so-called 'flags' field must be 1. The original definition (where the
> >> second argument of the MI_INSTR macro is 0) shouldn't work.
> >>
> >> So just correct the original definition of MI_LOAD_REGISTER_REG; this
> >> isn't something that's actually changed on GEN8.
> >>
> > I did notice that the original instructions are odd but thought I might
> > be wrong hence I created new ones to not disturb the original ones.
> > ok I will just correct original one and reuse it.
> > 
> >> While we're mentioning it, I think the above MI_LOAD_REGISTER_MEM is
> >> wrong too. The length should be 1 pre-GEN8, and 2 in GEN8+.
> >>
> > ok.
> >>>   #define MI_RS_STORE_DATA_IMM    MI_INSTR(0x2B, 0)
> >>>   #define MI_LOAD_URB_MEM         MI_INSTR(0x2C, 0)
> >>>   #define MI_STORE_URB_MEM        MI_INSTR(0x2D, 0)
> >>
> >> And these are wrong too! In fact all of these instructions have been
> >> added under a comment which says "Commands used only by the command
> >> parser". Looks like they were added as placeholders without the proper
> >> length fields, and then people have started using them as though they
> >> were complete definitions :(
> >>
> >> Time update them all, perhaps ...
> > these are not related to this patch, so it can be taken up as a
> > different patch.
> 
> As a minimum, you should move these updated #defines out of the section
> under the comment "Commands used only by the command parser" and put
> them in the appropriate place in the regular list of MI_ commnds,
> preferably in numerical order. Then the ones that are genuinely only
> used by the command parser could be left for another patch ...

Please just correct the #defines while at it, this really is way to
tempting a trap to keep it hot. Can be done in a separate patch ofc, but
imo not fixing an obvious issue when we spot it because its not perfectly
directly related to the feature work at hand is bad practice leading to
piles of technical debt.

And that's the kind of stuff that robs me of my sleep at night ;-)

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-15 14:10           ` Siluvery, Arun
@ 2015-06-15 17:29             ` Dave Gordon
  2015-06-15 18:09               ` Siluvery, Arun
  0 siblings, 1 reply; 33+ messages in thread
From: Dave Gordon @ 2015-06-15 17:29 UTC (permalink / raw)
  To: Siluvery, Arun, intel-gfx

On 15/06/15 15:10, Siluvery, Arun wrote:
> On 12/06/2015 18:03, Dave Gordon wrote:
>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>> In Per context w/a batch buffer,
>>>>> WaRsRestoreWithPerCtxtBb
>>>>>
>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>
>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>    2 files changed, 85 insertions(+)

[snip]

>>>>> +    /*
>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>> +     * in the same cacheline.
>>>>> +     */
>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>> +        cmd[index++] = MI_NOOP;
>>>>> +
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>> +    cmd[index++] = INSTPM;
>>>>> +    cmd[index++] = scratch_addr;
>>>>> +    cmd[index++] = 0;
>>>>> +
>>>>> +    /*
>>>>> +     * BSpec says there should not be any commands programmed
>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>> +     * do not add any new commands
>>>>> +     */
>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>> +
>>>>>        /* padding */
>>>>>            while (index < end)
>>>>>            cmd[index++] = MI_NOOP;
>>>>>
>>>>
>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>
>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>> Since the diff context is only few lines it didn't showup in the diff.
>>
>> The second comment above says "no commands between LOAD_REG_REG and
>> BB_END", so the point of my comment was that the BB_END is *NOT*
>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>
> true, but they are no-ops so they shouldn't really affect anything. I
> guess the spec means no valid commands.

I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
and I've even seen cases where a workaround specifically requires "a
NOOP with the set-no-op-id-register bit set" to fix some particular bug.
The only special thing about NOOP is that it doesn't get captured in IPEHR.

I think the w/a requires this:

0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)

no gaps, no insertions, all together, all on one cacheline. Those
instructions take up 8 DWords (32 bytes) so the sequence doesn't
necessarily have to start on a cacheline boundary, as long as it's
entirely within the same line. But it's simpler to start on a new line.
You seem to have:

0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END

so the condition in the comment is not fulfilled. If this works, maybe
the comment is wrong.

>> And therefore also, these instructions do *not* all end up in the same
>> cacheline, thus contradicting the first comment above.
>
> I don't understand why. As per the requirement the commands from the
> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
> part of same cacheline (in this case second line).

OK, they're all in the same line; I didn't look back at the full context
enough and thought 'end' would point to the end of the buffer, rather
than the end of the cacheline .. because it /does/ point to the end of
the buffer, it just happens to be the end of the very same cacheline as
well.

So I really don't like the way the sizes of the two workaround batches
have been defined in terms of cache lines. Also I'm not keen on one bit
of code allocating the object and defining the sizes of the sub-areas
within it, and then separate functions filling in each of the sequences
within these areas, "knowing" that the areas are /just the right size/.
It would be simpler to maintain if the "size in cachelines" values in
lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
with the number of instructions written by gen8_init_perctx_bb() and
gen8_init_indirectctx_bb().

How about having each of these return the number of bytes they've
appended to the (u32 *)buffer that they've been given, and let the
caller manage mapping/unmapping, alignment, padding, etc, and fill in
the size fields accordingly *after* the content has been defined?

.Dave.

>> Padding *after* a BB_END would be redundant.
> 
> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
> abruptly terminating the batch which is why I am padding with no-ops, I
> can change this if that is preferred.
>>
>> .Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround
  2015-06-15 17:29             ` Dave Gordon
@ 2015-06-15 18:09               ` Siluvery, Arun
  0 siblings, 0 replies; 33+ messages in thread
From: Siluvery, Arun @ 2015-06-15 18:09 UTC (permalink / raw)
  To: Dave Gordon, intel-gfx

On 15/06/2015 18:29, Dave Gordon wrote:
> On 15/06/15 15:10, Siluvery, Arun wrote:
>> On 12/06/2015 18:03, Dave Gordon wrote:
>>> On 12/06/15 12:58, Siluvery, Arun wrote:
>>>> On 09/06/2015 19:43, Dave Gordon wrote:
>>>>> On 05/06/15 14:57, Arun Siluvery wrote:
>>>>>> In Per context w/a batch buffer,
>>>>>> WaRsRestoreWithPerCtxtBb
>>>>>>
>>>>>> v2: This patches modifies definitions of MI_LOAD_REGISTER_MEM and
>>>>>> MI_LOAD_REGISTER_REG; Add GEN8 specific defines for these instructions
>>>>>> so as to not break any future users of existing definitions (Michel)
>>>>>>
>>>>>> Signed-off-by: Rafael Barbalho <rafael.barbalho@intel.com>
>>>>>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/i915_reg.h  | 26 ++++++++++++++++++
>>>>>>     drivers/gpu/drm/i915/intel_lrc.c | 59
>>>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>>>     2 files changed, 85 insertions(+)
>
> [snip]
>
>>>>>> +    /*
>>>>>> +     * BSpec says MI_LOAD_REGISTER_MEM, MI_LOAD_REGISTER_REG and
>>>>>> +     * MI_BATCH_BUFFER_END instructions in this sequence need to be
>>>>>> +     * in the same cacheline.
>>>>>> +     */
>>>>>> +    while (((unsigned long) (cmd + index) % CACHELINE_BYTES) != 0)
>>>>>> +        cmd[index++] = MI_NOOP;
>>>>>> +
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_MEM_GEN8 |
>>>>>> +        MI_LRM_USE_GLOBAL_GTT |
>>>>>> +        MI_LRM_ASYNC_MODE_ENABLE;
>>>>>> +    cmd[index++] = INSTPM;
>>>>>> +    cmd[index++] = scratch_addr;
>>>>>> +    cmd[index++] = 0;
>>>>>> +
>>>>>> +    /*
>>>>>> +     * BSpec says there should not be any commands programmed
>>>>>> +     * between MI_LOAD_REGISTER_REG and MI_BATCH_BUFFER_END so
>>>>>> +     * do not add any new commands
>>>>>> +     */
>>>>>> +    cmd[index++] = MI_LOAD_REGISTER_REG_GEN8;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +    cmd[index++] = GEN8_RS_PREEMPT_STATUS;
>>>>>> +
>>>>>>         /* padding */
>>>>>>             while (index < end)
>>>>>>             cmd[index++] = MI_NOOP;
>>>>>>
>>>>>
>>>>> Where's the MI_BATCH_BUFFER_END referred to in the comment?
>>>>
>>>> MI_BATCH_BUFFER_END is just below while loop, it is in patch [v3 1/6].
>>>> Since the diff context is only few lines it didn't showup in the diff.
>>>
>>> The second comment above says "no commands between LOAD_REG_REG and
>>> BB_END", so the point of my comment was that the BB_END is *NOT*
>>> immediately after the LOAD_REG_REG -- there are a bunch of no-ops there!
>>
>> true, but they are no-ops so they shouldn't really affect anything. I
>> guess the spec means no valid commands.
>
> I guess the spec means "NO COMMANDS". NOOP is a perfectly valid command,
> and I've even seen cases where a workaround specifically requires "a
> NOOP with the set-no-op-id-register bit set" to fix some particular bug.
> The only special thing about NOOP is that it doesn't get captured in IPEHR.
>
> I think the w/a requires this:
>
> 0%CLSIZE: ... LRM (reg, addr, 0) LRR (reg, reg) BB_END ... (63%CLSIZE)
>
> no gaps, no insertions, all together, all on one cacheline. Those
> instructions take up 8 DWords (32 bytes) so the sequence doesn't
> necessarily have to start on a cacheline boundary, as long as it's
> entirely within the same line. But it's simpler to start on a new line.
> You seem to have:
>
> 0%CLSIZE: LRM (reg, mem, 0) LRR (reg, reg) NOP NOP NOP BB_END
>
> so the condition in the comment is not fulfilled. If this works, maybe
> the comment is wrong.
>
>>> And therefore also, these instructions do *not* all end up in the same
>>> cacheline, thus contradicting the first comment above.
>>
>> I don't understand why. As per the requirement the commands from the
>> first MI_LOAD_REGISTER_MEM_GEN8 (after while) through BB_END will be
>> part of same cacheline (in this case second line).
>
> OK, they're all in the same line; I didn't look back at the full context
> enough and thought 'end' would point to the end of the buffer, rather
> than the end of the cacheline .. because it /does/ point to the end of
> the buffer, it just happens to be the end of the very same cacheline as
> well.
>
> So I really don't like the way the sizes of the two workaround batches
> have been defined in terms of cache lines. Also I'm not keen on one bit
> of code allocating the object and defining the sizes of the sub-areas
> within it, and then separate functions filling in each of the sequences
> within these areas, "knowing" that the areas are /just the right size/.
> It would be simpler to maintain if the "size in cachelines" values in
> lrc_setup_ctx_wa_obj() didn't have to be hand-edited to stay in sync
> with the number of instructions written by gen8_init_perctx_bb() and
> gen8_init_indirectctx_bb().
>
> How about having each of these return the number of bytes they've
> appended to the (u32 *)buffer that they've been given, and let the
> caller manage mapping/unmapping, alignment, padding, etc, and fill in
> the size fields accordingly *after* the content has been defined?

This is an issue, editing the size if more WA are added is not good, it 
can be changed as you suggested.

regards
Arun

>
> .Dave.
>
>>> Padding *after* a BB_END would be redundant.
>>
>> yes, I just wanted to keep MI_BATCH_BUFFER_END at the end instead of
>> abruptly terminating the batch which is why I am padding with no-ops, I
>> can change this if that is preferred.
>>>
>>> .Dave.
>
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-06-15 18:11 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-05 10:34 [PATCH v3 0/6] Add Per-context WA using WA batch buffers Arun Siluvery
2015-06-05 10:34 ` [PATCH v3 1/6] drm/i915/gen8: Add infrastructure to initialize " Arun Siluvery
2015-06-05 10:56   ` Chris Wilson
2015-06-05 11:24     ` Siluvery, Arun
2015-06-05 11:36       ` Chris Wilson
2015-06-05 11:56         ` Siluvery, Arun
2015-06-05 11:00   ` Chris Wilson
2015-06-15 15:22     ` Daniel Vetter
2015-06-15 15:23       ` Siluvery, Arun
2015-06-05 13:54   ` Arun Siluvery
2015-06-05 10:34 ` [PATCH v3 2/6] drm/i915/gen8: Re-order init pipe_control in lrc mode Arun Siluvery
2015-06-05 13:55   ` Arun Siluvery
2015-06-09 15:27   ` Dave Gordon
2015-06-09 15:34     ` Siluvery, Arun
2015-06-05 10:34 ` [PATCH v3 3/6] drm/i915/gen8: Add WaDisableCtxRestoreArbitration workaround Arun Siluvery
2015-06-05 13:56   ` Arun Siluvery
2015-06-05 10:34 ` [PATCH v3 4/6] drm/i915/gen8: Add WaFlushCoherentL3CacheLinesAtContextSwitch workaround Arun Siluvery
2015-06-05 13:56   ` Arun Siluvery
2015-06-05 14:48     ` Ville Syrjälä
2015-06-12 11:51       ` Siluvery, Arun
2015-06-09 17:06   ` Dave Gordon
2015-06-05 10:34 ` [PATCH v3 5/6] drm/i915/gen8: Add WaClearSlmSpaceAtContextSwitch workaround Arun Siluvery
2015-06-05 13:57   ` Arun Siluvery
2015-06-05 10:34 ` [PATCH v3 6/6] drm/i915/gen8: Add WaRsRestoreWithPerCtxtBb workaround Arun Siluvery
2015-06-05 13:57   ` Arun Siluvery
2015-06-09 18:43     ` Dave Gordon
2015-06-12 11:58       ` Siluvery, Arun
2015-06-12 17:03         ` Dave Gordon
2015-06-15 14:10           ` Siluvery, Arun
2015-06-15 17:29             ` Dave Gordon
2015-06-15 18:09               ` Siluvery, Arun
2015-06-15 15:27           ` Daniel Vetter
2015-06-06  8:20   ` shuang.he

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.