[CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD

All of lore.kernel.org
 help / color / mirror / Atom feed

* [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
@ 2017-10-03 20:34 Chris Wilson
  2017-10-03 20:34 ` [CI 2/9] drm/i915/preempt: Default to disabled mid-command preemption levels Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

From: Jeff McGee <jeff.mcgee@intel.com>

The WA applies to all production Gen9 and requires both enabling and
whitelisting of the per-context preemption control register.

v2: Extend to Cannonlake.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index a28e2a864cf1..040c85e496fa 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1076,8 +1076,10 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl */
-	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
+	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl,[cnl] */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
+	ret = wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
 
@@ -1139,14 +1141,6 @@ static int skl_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/*
-	 * Actual WA is to disable percontext preemption granularity control
-	 * until D0 which is the default case so this is equivalent to
-	 * !WaDisablePerCtxtPreemptionGranularityControl:skl
-	 */
-	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
-
 	/* WaEnableGapsTsvCreditFix:skl */
 	I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
 				   GEN9_GAPS_TSV_CREDIT_DISABLE));
@@ -1279,6 +1273,8 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine)
 	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
 
 	/* WaEnablePreemptionGranularityControlByUMD:cnl */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
 	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 2/9] drm/i915/preempt: Default to disabled mid-command preemption levels
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 3/9] drm/i915/execlists: Distinguish the incomplete context notifies Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

From: Michał Winiarski <michal.winiarski@intel.com>

Supporting fine-granularity preemption levels may require changes in
userspace batch buffer programming. Therefore, we need to fallback to
safe default values, rather that use hardware defaults. Userspace is
still able to enable fine-granularity, since we're whitelisting the
register controlling it in WaEnablePreemptionGranularityControlByUMD.

v2: Extend w/a to cover Cannonlake
v3: Fix commentary to include both fake w/a names.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_reg.h        |  6 ++++++
 drivers/gpu/drm/i915/intel_engine_cs.c | 25 +++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 39ad9327e2a0..e7dba5539b11 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -7003,6 +7003,12 @@ enum {
 #define GEN9_CS_DEBUG_MODE1		_MMIO(0x20ec)
 #define GEN9_CTX_PREEMPT_REG		_MMIO(0x2248)
 #define GEN8_CS_CHICKEN1		_MMIO(0x2580)
+#define GEN9_PREEMPT_3D_OBJECT_LEVEL		(1<<0)
+#define GEN9_PREEMPT_GPGPU_LEVEL(hi, lo)	(((hi) << 2) | ((lo) << 1))
+#define GEN9_PREEMPT_GPGPU_MID_THREAD_LEVEL	GEN9_PREEMPT_GPGPU_LEVEL(0, 0)
+#define GEN9_PREEMPT_GPGPU_THREAD_GROUP_LEVEL	GEN9_PREEMPT_GPGPU_LEVEL(0, 1)
+#define GEN9_PREEMPT_GPGPU_COMMAND_LEVEL	GEN9_PREEMPT_GPGPU_LEVEL(1, 0)
+#define GEN9_PREEMPT_GPGPU_LEVEL_MASK		GEN9_PREEMPT_GPGPU_LEVEL(1, 1)
 
 /* GEN7 chicken */
 #define GEN7_COMMON_SLICE_CHICKEN1		_MMIO(0x7010)
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 040c85e496fa..5615f033c807 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1071,6 +1071,24 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	I915_WRITE(GEN8_L3SQCREG4, (I915_READ(GEN8_L3SQCREG4) |
 				    GEN8_LQSC_FLUSH_COHERENT_LINES));
 
+	/*
+	 * Supporting preemption with fine-granularity requires changes in the
+	 * batch buffer programming. Since we can't break old userspace, we
+	 * need to set our default preemption level to safe value. Userspace is
+	 * still able to use more fine-grained preemption levels, since in
+	 * WaEnablePreemptionGranularityControlByUMD we're whitelisting the
+	 * per-ctx register. As such, WaDisable{3D,GPGPU}MidCmdPreemption are
+	 * not real HW workarounds, but merely a way to start using preemption
+	 * while maintaining old contract with userspace.
+	 */
+
+	/* WaDisable3DMidCmdPreemption:skl,bxt,glk,cfl,[cnl] */
+	WA_CLR_BIT_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_3D_OBJECT_LEVEL);
+
+	/* WaDisableGPGPUMidCmdPreemption:skl,bxt,blk,cfl,[cnl] */
+	WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_GPGPU_LEVEL_MASK,
+			    GEN9_PREEMPT_GPGPU_COMMAND_LEVEL);
+
 	/* WaVFEStateAfterPipeControlwithMediaStateClear:skl,bxt,glk,cfl */
 	ret = wa_ring_whitelist_reg(engine, GEN9_CTX_PREEMPT_REG);
 	if (ret)
@@ -1272,6 +1290,13 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine)
 	/* FtrEnableFastAnisoL1BankingFix: cnl */
 	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
 
+	/* WaDisable3DMidCmdPreemption:cnl */
+	WA_CLR_BIT_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_3D_OBJECT_LEVEL);
+
+	/* WaDisableGPGPUMidCmdPreemption:cnl */
+	WA_SET_FIELD_MASKED(GEN8_CS_CHICKEN1, GEN9_PREEMPT_GPGPU_LEVEL_MASK,
+			    GEN9_PREEMPT_GPGPU_COMMAND_LEVEL);
+
 	/* WaEnablePreemptionGranularityControlByUMD:cnl */
 	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
 		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 3/9] drm/i915/execlists: Distinguish the incomplete context notifies
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
  2017-10-03 20:34 ` [CI 2/9] drm/i915/preempt: Default to disabled mid-command preemption levels Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 4/9] drm/i915: Introduce a preempt context Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

Let the listener know that the context we just scheduled out was not
complete, and will be scheduled back in at a later point.

v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to
CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference
later.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/scheduler.c | 1 +
 drivers/gpu/drm/i915/intel_lrc.c     | 2 +-
 drivers/gpu/drm/i915/intel_lrc.h     | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index d5892d24f0b6..f6ded475bb2c 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -174,6 +174,7 @@ static int shadow_context_status_change(struct notifier_block *nb,
 		atomic_set(&workload->shadow_ctx_active, 1);
 		break;
 	case INTEL_CONTEXT_SCHEDULE_OUT:
+	case INTEL_CONTEXT_SCHEDULE_PREEMPTED:
 		atomic_set(&workload->shadow_ctx_active, 0);
 		break;
 	default:
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7d6da130b184..0f6c839d65a1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -618,7 +618,7 @@ execlist_cancel_port_requests(struct intel_engine_execlists *execlists)
 	while (num_ports-- && port_isset(port)) {
 		struct drm_i915_gem_request *rq = port_request(port);
 
-		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
+		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
 		i915_gem_request_put(rq);
 
 		memset(port, 0, sizeof(*port));
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 314adee7127a..689fde1a63a9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -61,6 +61,7 @@
 enum {
 	INTEL_CONTEXT_SCHEDULE_IN = 0,
 	INTEL_CONTEXT_SCHEDULE_OUT,
+	INTEL_CONTEXT_SCHEDULE_PREEMPTED,
 };
 
 /* Logical Rings */
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 4/9] drm/i915: Introduce a preempt context
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
  2017-10-03 20:34 ` [CI 2/9] drm/i915/preempt: Default to disabled mid-command preemption levels Chris Wilson
  2017-10-03 20:34 ` [CI 3/9] drm/i915/execlists: Distinguish the incomplete context notifies Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 5/9] drm/i915/execlists: Move bdw GPGPU w/a to emit_bb Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

Add another perma-pinned context for using for preemption at any time.
We cannot just reuse the existing kernel context, as first and foremost
we need to ensure that we can preempt the kernel context itself, so
require a distinct context id. Similar to the kernel context, we may
want to interrupt execution and switch to the preempt context at any
time, and so it needs to be permanently pinned and available.

To compensate for yet another permanent allocation, we shrink the
existing context and the new context by reducing their ringbuffer to the
minimum.

v2: Assert that we never allocate a request from the preemption context.
v3: Limit perma-pin to engines that may preempt.
v4: Onion cleanup for early driver death
v5: Onion ordering in main driver cleanup as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  6 ++-
 drivers/gpu/drm/i915/i915_gem_context.c | 76 ++++++++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_gem_request.c |  7 +++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 22 +++++++++-
 4 files changed, 87 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7ca11318ac69..31292afb961d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -783,6 +783,7 @@ struct intel_csr {
 	func(has_l3_dpf); \
 	func(has_llc); \
 	func(has_logical_ring_contexts); \
+	func(has_logical_ring_preemption); \
 	func(has_overlay); \
 	func(has_pipe_cxsr); \
 	func(has_pooled_eu); \
@@ -2251,8 +2252,11 @@ struct drm_i915_private {
 	wait_queue_head_t gmbus_wait_queue;
 
 	struct pci_dev *bridge_dev;
-	struct i915_gem_context *kernel_context;
 	struct intel_engine_cs *engine[I915_NUM_ENGINES];
+	/* Context used internally to idle the GPU and setup initial state */
+	struct i915_gem_context *kernel_context;
+	/* Context only to be used for injecting preemption commands */
+	struct i915_gem_context *preempt_context;
 	struct i915_vma *semaphore;
 
 	struct drm_dma_handle *status_page_dmah;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 921ee369c74d..2bb8e58706ba 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -416,14 +416,43 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	return ctx;
 }
 
+static struct i915_gem_context *
+create_kernel_context(struct drm_i915_private *i915, int prio)
+{
+	struct i915_gem_context *ctx;
+
+	ctx = i915_gem_create_context(i915, NULL);
+	if (IS_ERR(ctx))
+		return ctx;
+
+	i915_gem_context_clear_bannable(ctx);
+	ctx->priority = prio;
+	ctx->ring_size = PAGE_SIZE;
+
+	GEM_BUG_ON(!i915_gem_context_is_kernel(ctx));
+
+	return ctx;
+}
+
+static void
+destroy_kernel_context(struct i915_gem_context **ctxp)
+{
+	struct i915_gem_context *ctx;
+
+	/* Keep the context ref so that we can free it immediately ourselves */
+	ctx = i915_gem_context_get(fetch_and_zero(ctxp));
+	GEM_BUG_ON(!i915_gem_context_is_kernel(ctx));
+
+	context_close(ctx);
+	i915_gem_context_free(ctx);
+}
+
 int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
 {
 	struct i915_gem_context *ctx;
+	int err;
 
-	/* Init should only be called once per module load. Eventually the
-	 * restriction on the context_disabled check can be loosened. */
-	if (WARN_ON(dev_priv->kernel_context))
-		return 0;
+	GEM_BUG_ON(dev_priv->kernel_context);
 
 	INIT_LIST_HEAD(&dev_priv->contexts.list);
 	INIT_WORK(&dev_priv->contexts.free_work, contexts_free_worker);
@@ -441,28 +470,38 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv)
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > INT_MAX);
 	ida_init(&dev_priv->contexts.hw_ida);
 
-	ctx = i915_gem_create_context(dev_priv, NULL);
+	/* lowest priority; idle task */
+	ctx = create_kernel_context(dev_priv, I915_PRIORITY_MIN);
 	if (IS_ERR(ctx)) {
-		DRM_ERROR("Failed to create default global context (error %ld)\n",
-			  PTR_ERR(ctx));
-		return PTR_ERR(ctx);
+		DRM_ERROR("Failed to create default global context\n");
+		err = PTR_ERR(ctx);
+		goto err;
 	}
-
-	/* For easy recognisablity, we want the kernel context to be 0 and then
+	/*
+	 * For easy recognisablity, we want the kernel context to be 0 and then
 	 * all user contexts will have non-zero hw_id.
 	 */
 	GEM_BUG_ON(ctx->hw_id);
-
-	i915_gem_context_clear_bannable(ctx);
-	ctx->priority = I915_PRIORITY_MIN; /* lowest priority; idle task */
 	dev_priv->kernel_context = ctx;
 
-	GEM_BUG_ON(!i915_gem_context_is_kernel(ctx));
+	/* highest priority; preempting task */
+	ctx = create_kernel_context(dev_priv, INT_MAX);
+	if (IS_ERR(ctx)) {
+		DRM_ERROR("Failed to create default preempt context\n");
+		err = PTR_ERR(ctx);
+		goto err_kernel_context;
+	}
+	dev_priv->preempt_context = ctx;
 
 	DRM_DEBUG_DRIVER("%s context support initialized\n",
 			 dev_priv->engine[RCS]->context_size ? "logical" :
 			 "fake");
 	return 0;
+
+err_kernel_context:
+	destroy_kernel_context(&dev_priv->kernel_context);
+err:
+	return err;
 }
 
 void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
@@ -507,15 +546,10 @@ void i915_gem_contexts_lost(struct drm_i915_private *dev_priv)
 
 void i915_gem_contexts_fini(struct drm_i915_private *i915)
 {
-	struct i915_gem_context *ctx;
-
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	/* Keep the context so that we can free it immediately ourselves */
-	ctx = i915_gem_context_get(fetch_and_zero(&i915->kernel_context));
-	GEM_BUG_ON(!i915_gem_context_is_kernel(ctx));
-	context_close(ctx);
-	i915_gem_context_free(ctx);
+	destroy_kernel_context(&i915->preempt_context);
+	destroy_kernel_context(&i915->kernel_context);
 
 	/* Must free all deferred contexts (via flush_workqueue) first */
 	ida_destroy(&i915->contexts.hw_ida);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 14956d899911..b100b38f1dd2 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -587,6 +587,13 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
+	/*
+	 * Preempt contexts are reserved for exclusive use to inject a
+	 * preemption context switch. They are never to be used for any trivial
+	 * request!
+	 */
+	GEM_BUG_ON(ctx == dev_priv->preempt_context);
+
 	/* ABI: Before userspace accesses the GPU (e.g. execbuffer), report
 	 * EIO if the GPU is already wedged.
 	 */
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 5615f033c807..ae3c4084e40f 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -613,9 +613,22 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	if (IS_ERR(ring))
 		return PTR_ERR(ring);
 
+	/*
+	 * Similarly the preempt context must always be available so that
+	 * we can interrupt the engine at any time.
+	 */
+	if (INTEL_INFO(engine->i915)->has_logical_ring_preemption) {
+		ring = engine->context_pin(engine,
+					   engine->i915->preempt_context);
+		if (IS_ERR(ring)) {
+			ret = PTR_ERR(ring);
+			goto err_unpin_kernel;
+		}
+	}
+
 	ret = intel_engine_init_breadcrumbs(engine);
 	if (ret)
-		goto err_unpin;
+		goto err_unpin_preempt;
 
 	ret = i915_gem_render_state_init(engine);
 	if (ret)
@@ -634,7 +647,10 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	i915_gem_render_state_fini(engine);
 err_breadcrumbs:
 	intel_engine_fini_breadcrumbs(engine);
-err_unpin:
+err_unpin_preempt:
+	if (INTEL_INFO(engine->i915)->has_logical_ring_preemption)
+		engine->context_unpin(engine, engine->i915->preempt_context);
+err_unpin_kernel:
 	engine->context_unpin(engine, engine->i915->kernel_context);
 	return ret;
 }
@@ -660,6 +676,8 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	intel_engine_cleanup_cmd_parser(engine);
 	i915_gem_batch_pool_fini(&engine->batch_pool);
 
+	if (INTEL_INFO(engine->i915)->has_logical_ring_preemption)
+		engine->context_unpin(engine, engine->i915->preempt_context);
 	engine->context_unpin(engine, engine->i915->kernel_context);
 }
 
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 5/9] drm/i915/execlists: Move bdw GPGPU w/a to emit_bb
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (2 preceding siblings ...)
  2017-10-03 20:34 ` [CI 4/9] drm/i915: Introduce a preempt context Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 6/9] drm/i915/execlists: Keep request->priority for its lifetime Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

Move the re-enabling of MI arbitration from a per-bb w/a buffer to the
emission of the batch buffer itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0f6c839d65a1..e5b470ce43e8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1159,24 +1159,6 @@ static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 	return batch;
 }
 
-/*
- *  This batch is started immediately after indirect_ctx batch. Since we ensure
- *  that indirect_ctx ends on a cacheline this batch is aligned automatically.
- *
- *  The number of DWORDS written are returned using this field.
- *
- *  This batch is terminated with MI_BATCH_BUFFER_END and so we need not add padding
- *  to align it with cacheline as padding after MI_BATCH_BUFFER_END is redundant.
- */
-static u32 *gen8_init_perctx_bb(struct intel_engine_cs *engine, u32 *batch)
-{
-	/* WaDisableCtxRestoreArbitration:bdw,chv */
-	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	*batch++ = MI_BATCH_BUFFER_END;
-
-	return batch;
-}
-
 static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 {
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
@@ -1291,7 +1273,7 @@ static int intel_init_workaround_bb(struct intel_engine_cs *engine)
 		break;
 	case 8:
 		wa_bb_fn[0] = gen8_init_indirectctx_bb;
-		wa_bb_fn[1] = gen8_init_perctx_bb;
+		wa_bb_fn[1] = NULL;
 		break;
 	default:
 		MISSING_CASE(INTEL_GEN(engine->i915));
@@ -1535,13 +1517,15 @@ static int gen8_emit_bb_start(struct drm_i915_gem_request *req,
 	if (IS_ERR(cs))
 		return PTR_ERR(cs);
 
+	/* WaDisableCtxRestoreArbitration:bdw,chv */
+	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	/* FIXME(BDW): Address space and security selectors. */
 	*cs++ = MI_BATCH_BUFFER_START_GEN8 |
 		(flags & I915_DISPATCH_SECURE ? 0 : BIT(8)) |
 		(flags & I915_DISPATCH_RS ? MI_BATCH_RESOURCE_STREAMER : 0);
 	*cs++ = lower_32_bits(offset);
 	*cs++ = upper_32_bits(offset);
-	*cs++ = MI_NOOP;
 	intel_ring_advance(req, cs);
 
 	return 0;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 6/9] drm/i915/execlists: Keep request->priority for its lifetime
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (3 preceding siblings ...)
  2017-10-03 20:34 ` [CI 5/9] drm/i915/execlists: Move bdw GPGPU w/a to emit_bb Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 7/9] drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask Chris Wilson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

With preemption, we will want to "unsubmit" a request, taking it back
from the hw and returning it to the priority sorted execution list. In
order to know where to insert it into that list, we need to remember
its adjust priority (which may change even as it was being executed).

This also affects reset for execlists as we are now unsubmitting the
requests following the reset (rather than directly writing the ELSP for
the inflight contexts). This turns reset into an accidental preemption
point, as after the reset we may choose a different pair of contexts to
submit to hw.

GuC is not updated as this series doesn't add preemption to the GuC
submission, and so it can keep benefiting from the early pruning of the
DFS inside execlists_schedule() for a little longer. We also need to
find a way of reducing the cost of that DFS...

v2: Include priority in error-state

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h       |  2 ++
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 ++++++----
 drivers/gpu/drm/i915/intel_lrc.c      | 14 ++++++++++----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 31292afb961d..1fc7080bfa7b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -982,6 +982,7 @@ struct i915_gpu_state {
 			pid_t pid;
 			u32 handle;
 			u32 hw_id;
+			int priority;
 			int ban_score;
 			int active;
 			int guilty;
@@ -1004,6 +1005,7 @@ struct i915_gpu_state {
 			long jiffies;
 			pid_t pid;
 			u32 context;
+			int priority;
 			int ban_score;
 			u32 seqno;
 			u32 head;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index c14552ab270b..dc91b32d699e 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -377,9 +377,9 @@ static void error_print_request(struct drm_i915_error_state_buf *m,
 	if (!erq->seqno)
 		return;
 
-	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x, emitted %dms ago, head %08x, tail %08x\n",
+	err_printf(m, "%s pid %d, ban score %d, seqno %8x:%08x, prio %d, emitted %dms ago, head %08x, tail %08x\n",
 		   prefix, erq->pid, erq->ban_score,
-		   erq->context, erq->seqno,
+		   erq->context, erq->seqno, erq->priority,
 		   jiffies_to_msecs(jiffies - erq->jiffies),
 		   erq->head, erq->tail);
 }
@@ -388,9 +388,9 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct drm_i915_error_context *ctx)
 {
-	err_printf(m, "%s%s[%d] user_handle %d hw_id %d, ban score %d guilty %d active %d\n",
+	err_printf(m, "%s%s[%d] user_handle %d hw_id %d, prio %d, ban score %d guilty %d active %d\n",
 		   header, ctx->comm, ctx->pid, ctx->handle, ctx->hw_id,
-		   ctx->ban_score, ctx->guilty, ctx->active);
+		   ctx->priority, ctx->ban_score, ctx->guilty, ctx->active);
 }
 
 static void error_print_engine(struct drm_i915_error_state_buf *m,
@@ -1271,6 +1271,7 @@ static void record_request(struct drm_i915_gem_request *request,
 			   struct drm_i915_error_request *erq)
 {
 	erq->context = request->ctx->hw_id;
+	erq->priority = request->priotree.priority;
 	erq->ban_score = atomic_read(&request->ctx->ban_score);
 	erq->seqno = request->global_seqno;
 	erq->jiffies = request->emitted_jiffies;
@@ -1364,6 +1365,7 @@ static void record_context(struct drm_i915_error_context *e,
 
 	e->handle = ctx->user_handle;
 	e->hw_id = ctx->hw_id;
+	e->priority = ctx->priority;
 	e->ban_score = atomic_read(&ctx->ban_score);
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e5b470ce43e8..02ea1e4e098b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -585,8 +585,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			}
 
 			INIT_LIST_HEAD(&rq->priotree.link);
-			rq->priotree.priority = INT_MAX;
-
 			__i915_gem_request_submit(rq);
 			trace_i915_gem_request_in(rq, port_index(port, execlists));
 			last = rq;
@@ -794,6 +792,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 				execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 
 				trace_i915_gem_request_out(rq);
+				rq->priotree.priority = INT_MAX;
 				i915_gem_request_put(rq);
 
 				execlists_port_complete(execlists, port);
@@ -846,11 +845,15 @@ static void execlists_submit_request(struct drm_i915_gem_request *request)
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 }
 
+static struct drm_i915_gem_request *pt_to_request(struct i915_priotree *pt)
+{
+	return container_of(pt, struct drm_i915_gem_request, priotree);
+}
+
 static struct intel_engine_cs *
 pt_lock_engine(struct i915_priotree *pt, struct intel_engine_cs *locked)
 {
-	struct intel_engine_cs *engine =
-		container_of(pt, struct drm_i915_gem_request, priotree)->engine;
+	struct intel_engine_cs *engine = pt_to_request(pt)->engine;
 
 	GEM_BUG_ON(!locked);
 
@@ -906,6 +909,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 		 * engines.
 		 */
 		list_for_each_entry(p, &pt->signalers_list, signal_link) {
+			if (i915_gem_request_completed(pt_to_request(p->signaler)))
+				continue;
+
 			GEM_BUG_ON(p->signaler->priority < pt->priority);
 			if (prio > READ_ONCE(p->signaler->priority))
 				list_move_tail(&p->dfs_link, &dfs);
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 7/9] drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (4 preceding siblings ...)
  2017-10-03 20:34 ` [CI 6/9] drm/i915/execlists: Keep request->priority for its lifetime Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 8/9] drm/i915/execlists: Preemption! Chris Wilson
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

In the next few patches, we wish to enable different features for the
scheduler, some which may subtlety change ABI (e.g. allow requests to be
reordered under different circumstances). So we need to make sure
userspace is cognizant of the changes (if they care), by which we employ
the usual method of a GETPARAM. We already have an
I915_PARAM_HAS_SCHEDULER (which notes the existing ability to reorder
requests to avoid bubbles), and now we wish to extend that to be a
bitmask to describe the different capabilities implemented.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 5 +++--
 include/uapi/drm/i915_drm.h     | 9 ++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5cc24344c266..efbe5fc3d044 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -367,8 +367,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = i915_gem_mmap_gtt_version();
 		break;
 	case I915_PARAM_HAS_SCHEDULER:
-		value = dev_priv->engine[RCS] &&
-			dev_priv->engine[RCS]->schedule;
+		value = 0;
+		if (dev_priv->engine[RCS] && dev_priv->engine[RCS]->schedule)
+			value |= I915_SCHEDULER_CAP_ENABLED;
 		break;
 	case I915_PARAM_MMAP_VERSION:
 		/* Remember to bump this if the version changes! */
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fe25a01c81f2..aa4a3b20ef6b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -397,10 +397,17 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
 
-/* Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports user defined execution
  * priorities and the driver will attempt to execute batches in priority order.
+ * The param returns a capability bitmask, nonzero implies that the scheduler
+ * is enabled, with different features present according to the mask.
  */
 #define I915_PARAM_HAS_SCHEDULER	 41
+#define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
+#define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
+#define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
+
 #define I915_PARAM_HUC_STATUS		 42
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 8/9] drm/i915/execlists: Preemption!
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (5 preceding siblings ...)
  2017-10-03 20:34 ` [CI 7/9] drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 20:34 ` [CI 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
  2017-10-03 21:14 ` ✗ Fi.CI.BAT: warning for series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Patchwork
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

When we write to ELSP, it triggers a context preemption at the earliest
arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other
operations and the explicit MI_ARB_CHECK). If this is to the same
context, it triggers a LITE_RESTORE where the RING_TAIL is merely
updated (used currently to chain requests from the same context
together, avoiding bubbles). However, if it is to a different context, a
full context-switch is performed and it will start to execute the new
context saving the image of the old for later execution.

Previously we avoided preemption by only submitting a new context when
the old was idle. But now we wish embrace it, and if the new request has
a higher priority than the currently executing request, we write to the
ELSP regardless, thus triggering preemption, but we tell the GPU to
switch to our special preemption context (not the target). In the
context-switch interrupt handler, we know that the previous contexts
have finished execution and so can unwind all the incomplete requests
and compute the new highest priority request to execute.

It would be feasible to avoid the switch-to-idle intermediate by
programming the ELSP with the target context. The difficulty is in
tracking which request that should be whilst maintaining the dependency
change, the error comes in with coalesced requests. As we only track the
most recent request and its priority, we may run into the issue of being
tricked in preempting a high priority request that was followed by a
low priority request from the same context (e.g. for PI); worse still
that earlier request may be our own dependency and the order then broken
by preemption. By injecting the switch-to-idle and then recomputing the
priority queue, we avoid the issue with tracking in-flight coalesced
requests. Having tried the preempt-to-busy approach, and failed to find
a way around the coalesced priority issue, Michal's original proposal to
inject an idle context (based on handling GuC preemption) succeeds.

The current heuristic for deciding when to preempt are only if the new
request is of higher priority, and has the privileged priority of
greater than 0. Note that the scheduler remains unfair!

v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU.
Since, the feature is now conditional and not always available when we
have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a
capability mask).
v3: Stylistic tweaks.
v4: Appease Joonas with a snippet of kerneldoc, only to fuel to fire of
the preempt vs preempting debate.

Suggested-by: Michal Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         |   9 +-
 drivers/gpu/drm/i915/i915_irq.c         |   6 +-
 drivers/gpu/drm/i915/i915_pci.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        | 157 +++++++++++++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   5 +
 5 files changed, 143 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index efbe5fc3d044..4d5a22485848 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -368,9 +368,16 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		break;
 	case I915_PARAM_HAS_SCHEDULER:
 		value = 0;
-		if (dev_priv->engine[RCS] && dev_priv->engine[RCS]->schedule)
+		if (dev_priv->engine[RCS] && dev_priv->engine[RCS]->schedule) {
 			value |= I915_SCHEDULER_CAP_ENABLED;
+
+			if (INTEL_INFO(dev_priv)->has_logical_ring_preemption &&
+			    i915_modparams.enable_execlists &&
+			    !i915_modparams.enable_guc_submission)
+				value |= I915_SCHEDULER_CAP_PREEMPTION;
+		}
 		break;
+
 	case I915_PARAM_MMAP_VERSION:
 		/* Remember to bump this if the version changes! */
 	case I915_PARAM_HAS_GEM:
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e5997e818673..de777139f6a1 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1382,10 +1382,8 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir, int test_shift)
 	bool tasklet = false;
 
 	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift)) {
-		if (port_count(&execlists->port[0])) {
-			__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
-			tasklet = true;
-		}
+		__set_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
+		tasklet = true;
 	}
 
 	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift)) {
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index d3a6fe5e3c10..745b6a6e0188 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -424,6 +424,7 @@ static const struct intel_device_info intel_cherryview_info __initconst = {
 
 #define GEN9_FEATURES \
 	GEN8_FEATURES, \
+	.has_logical_ring_preemption = 1, \
 	.has_csr = 1, \
 	.has_guc = 1, \
 	.has_ipc = 1, \
@@ -477,6 +478,7 @@ static const struct intel_device_info intel_skylake_gt4_info __initconst = {
 	.has_rc6 = 1, \
 	.has_dp_mst = 1, \
 	.has_logical_ring_contexts = 1, \
+	.has_logical_ring_preemption = 1, \
 	.has_guc = 1, \
 	.has_aliasing_ppgtt = 1, \
 	.has_full_ppgtt = 1, \
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 02ea1e4e098b..cc5574ff56c8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -208,9 +208,9 @@
 
 /* Typical size of the average request (2 pipecontrols and a MI_BB) */
 #define EXECLISTS_REQUEST_SIZE 64 /* bytes */
-
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
+#define PREEMPT_ID 0x1
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine);
@@ -430,6 +430,12 @@ static u64 execlists_update_context(struct drm_i915_gem_request *rq)
 	return ce->lrc_desc;
 }
 
+static inline void elsp_write(u64 desc, u32 __iomem *elsp)
+{
+	writel(upper_32_bits(desc), elsp);
+	writel(lower_32_bits(desc), elsp);
+}
+
 static void execlists_submit_ports(struct intel_engine_cs *engine)
 {
 	struct execlist_port *port = engine->execlists.port;
@@ -455,8 +461,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 			desc = 0;
 		}
 
-		writel(upper_32_bits(desc), elsp);
-		writel(lower_32_bits(desc), elsp);
+		elsp_write(desc, elsp);
 	}
 }
 
@@ -489,26 +494,43 @@ static void port_assign(struct execlist_port *port,
 	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
 }
 
+static void inject_preempt_context(struct intel_engine_cs *engine)
+{
+	struct intel_context *ce =
+		&engine->i915->preempt_context->engine[engine->id];
+	u32 __iomem *elsp =
+		engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));
+	unsigned int n;
+
+	GEM_BUG_ON(engine->i915->preempt_context->hw_id != PREEMPT_ID);
+	GEM_BUG_ON(!IS_ALIGNED(ce->ring->size, WA_TAIL_BYTES));
+
+	memset(ce->ring->vaddr + ce->ring->tail, 0, WA_TAIL_BYTES);
+	ce->ring->tail += WA_TAIL_BYTES;
+	ce->ring->tail &= (ce->ring->size - 1);
+	ce->lrc_reg_state[CTX_RING_TAIL+1] = ce->ring->tail;
+
+	for (n = execlists_num_ports(&engine->execlists); --n; )
+		elsp_write(0, elsp);
+
+	elsp_write(ce->lrc_desc, elsp);
+}
+
+static bool can_preempt(struct intel_engine_cs *engine)
+{
+	return INTEL_INFO(engine->i915)->has_logical_ring_preemption;
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
-	struct drm_i915_gem_request *last;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct execlist_port *port = execlists->port;
 	const struct execlist_port * const last_port =
 		&execlists->port[execlists->port_mask];
+	struct drm_i915_gem_request *last = port_request(port);
 	struct rb_node *rb;
 	bool submit = false;
 
-	last = port_request(port);
-	if (last)
-		/* WaIdleLiteRestore:bdw,skl
-		 * Apply the wa NOOPs to prevent ring:HEAD == req:TAIL
-		 * as we resubmit the request. See gen8_emit_breadcrumb()
-		 * for where we prepare the padding after the end of the
-		 * request.
-		 */
-		last->tail = last->wa_tail;
-
 	/* Hardware submission is through 2 ports. Conceptually each port
 	 * has a (RING_START, RING_HEAD, RING_TAIL) tuple. RING_START is
 	 * static for a context, and unique to each, so we only execute
@@ -533,7 +555,65 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	spin_lock_irq(&engine->timeline->lock);
 	rb = execlists->first;
 	GEM_BUG_ON(rb_first(&execlists->queue) != rb);
-	while (rb) {
+	if (!rb)
+		goto unlock;
+
+	if (last) {
+		/*
+		 * Don't resubmit or switch until all outstanding
+		 * preemptions (lite-restore) are seen. Then we
+		 * know the next preemption status we see corresponds
+		 * to this ELSP update.
+		 */
+		if (port_count(&port[0]) > 1)
+			goto unlock;
+
+		if (can_preempt(engine) &&
+		    rb_entry(rb, struct i915_priolist, node)->priority >
+		    max(last->priotree.priority, 0)) {
+			/*
+			 * Switch to our empty preempt context so
+			 * the state of the GPU is known (idle).
+			 */
+			inject_preempt_context(engine);
+			execlists->preempt = true;
+			goto unlock;
+		} else {
+			/*
+			 * In theory, we could coalesce more requests onto
+			 * the second port (the first port is active, with
+			 * no preemptions pending). However, that means we
+			 * then have to deal with the possible lite-restore
+			 * of the second port (as we submit the ELSP, there
+			 * may be a context-switch) but also we may complete
+			 * the resubmission before the context-switch. Ergo,
+			 * coalescing onto the second port will cause a
+			 * preemption event, but we cannot predict whether
+			 * that will affect port[0] or port[1].
+			 *
+			 * If the second port is already active, we can wait
+			 * until the next context-switch before contemplating
+			 * new requests. The GPU will be busy and we should be
+			 * able to resubmit the new ELSP before it idles,
+			 * avoiding pipeline bubbles (momentary pauses where
+			 * the driver is unable to keep up the supply of new
+			 * work).
+			 */
+			if (port_count(&port[1]))
+				goto unlock;
+
+			/* WaIdleLiteRestore:bdw,skl
+			 * Apply the wa NOOPs to prevent
+			 * ring:HEAD == req:TAIL as we resubmit the
+			 * request. See gen8_emit_breadcrumb() for
+			 * where we prepare the padding after the
+			 * end of the request.
+			 */
+			last->tail = last->wa_tail;
+		}
+	}
+
+	do {
 		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
 		struct drm_i915_gem_request *rq, *rn;
 
@@ -596,11 +676,12 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		INIT_LIST_HEAD(&p->requests);
 		if (p->priority != I915_PRIORITY_NORMAL)
 			kmem_cache_free(engine->i915->priorities, p);
-	}
+	} while (rb);
 done:
 	execlists->first = rb;
 	if (submit)
 		port_assign(port, last);
+unlock:
 	spin_unlock_irq(&engine->timeline->lock);
 
 	if (submit)
@@ -681,13 +762,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&engine->timeline->lock, flags);
 }
 
-static bool execlists_elsp_ready(const struct intel_engine_cs *engine)
-{
-	const struct execlist_port *port = engine->execlists.port;
-
-	return port_count(&port[0]) + port_count(&port[1]) < 2;
-}
-
 /*
  * Check the unread Context Status Buffers and manage the submission of new
  * contexts to the ELSP accordingly.
@@ -696,7 +770,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct execlist_port *port = execlists->port;
+	struct execlist_port * const port = execlists->port;
 	struct drm_i915_private *dev_priv = engine->i915;
 
 	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
@@ -781,6 +855,23 @@ static void intel_lrc_irq_handler(unsigned long data)
 			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
 				continue;
 
+			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE &&
+			    buf[2*head + 1] == PREEMPT_ID) {
+				execlist_cancel_port_requests(execlists);
+
+				spin_lock_irq(&engine->timeline->lock);
+				unwind_incomplete_requests(engine);
+				spin_unlock_irq(&engine->timeline->lock);
+
+				GEM_BUG_ON(!execlists->preempt);
+				execlists->preempt = false;
+				continue;
+			}
+
+			if (status & GEN8_CTX_STATUS_PREEMPTED &&
+			    execlists->preempt)
+				continue;
+
 			/* Check the context/desc id for this event matches */
 			GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id);
 
@@ -812,7 +903,7 @@ static void intel_lrc_irq_handler(unsigned long data)
 		}
 	}
 
-	if (execlists_elsp_ready(engine))
+	if (!execlists->preempt)
 		execlists_dequeue(engine);
 
 	intel_uncore_forcewake_put(dev_priv, execlists->fw_domains);
@@ -825,7 +916,7 @@ static void insert_request(struct intel_engine_cs *engine,
 	struct i915_priolist *p = lookup_priolist(engine, pt, prio);
 
 	list_add_tail(&pt->link, &ptr_mask_bits(p, 1)->requests);
-	if (ptr_unmask_bits(p, 1) && execlists_elsp_ready(engine))
+	if (ptr_unmask_bits(p, 1))
 		tasklet_hi_schedule(&engine->execlists.irq_tasklet);
 }
 
@@ -955,8 +1046,6 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
 	}
 
 	spin_unlock_irq(&engine->timeline->lock);
-
-	/* XXX Do we need to preempt to make room for us and our deps? */
 }
 
 static struct intel_ring *
@@ -1152,6 +1241,8 @@ static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 				       i915_ggtt_offset(engine->scratch) +
 				       2 * CACHELINE_BYTES);
 
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	/* Pad to end of cacheline */
 	while ((unsigned long)batch % CACHELINE_BYTES)
 		*batch++ = MI_NOOP;
@@ -1167,6 +1258,8 @@ static u32 *gen8_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 
 static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 {
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
+
 	/* WaFlushCoherentL3CacheLinesAtContextSwitch:skl,bxt,glk */
 	batch = gen8_emit_flush_coherentl3_wa(engine, batch);
 
@@ -1212,6 +1305,8 @@ static u32 *gen9_init_indirectctx_bb(struct intel_engine_cs *engine, u32 *batch)
 		*batch++ = 0;
 	}
 
+	*batch++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+
 	/* Pad to end of cacheline */
 	while ((unsigned long)batch % CACHELINE_BYTES)
 		*batch++ = MI_NOOP;
@@ -1365,6 +1460,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
 		   GT_CONTEXT_SWITCH_INTERRUPT << engine->irq_shift);
 	clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
 	execlists->csb_head = -1;
+	execlists->preempt = false;
 
 	/* After a GPU reset, we may have requests to replay */
 	if (!i915_modparams.enable_guc_submission && execlists->first)
@@ -1660,7 +1756,8 @@ static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
  */
 static void gen8_emit_wa_tail(struct drm_i915_gem_request *request, u32 *cs)
 {
-	*cs++ = MI_NOOP;
+	/* Ensure there's always at least one preemption point per-request. */
+	*cs++ = MI_ARB_CHECK;
 	*cs++ = MI_NOOP;
 	request->wa_tail = intel_ring_offset(request, cs);
 }
@@ -1681,7 +1778,6 @@ static void gen8_emit_breadcrumb(struct drm_i915_gem_request *request, u32 *cs)
 
 	gen8_emit_wa_tail(request, cs);
 }
-
 static const int gen8_emit_breadcrumb_sz = 6 + WA_TAIL_DWORDS;
 
 static void gen8_emit_breadcrumb_render(struct drm_i915_gem_request *request,
@@ -1709,7 +1805,6 @@ static void gen8_emit_breadcrumb_render(struct drm_i915_gem_request *request,
 
 	gen8_emit_wa_tail(request, cs);
 }
-
 static const int gen8_emit_breadcrumb_render_sz = 8 + WA_TAIL_DWORDS;
 
 static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 56d7ae9f298b..0fedda17488c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -238,6 +238,11 @@ struct intel_engine_execlists {
 #define EXECLIST_MAX_PORTS 2
 	} port[EXECLIST_MAX_PORTS];
 
+	/**
+	 * @preempt: are we currently handling a preempting context switch?
+	 */
+	bool preempt;
+
 	/**
 	 * @port_mask: number of execlist ports - 1
 	 */
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 9/9] drm/i915/scheduler: Support user-defined priorities
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (6 preceding siblings ...)
  2017-10-03 20:34 ` [CI 8/9] drm/i915/execlists: Preemption! Chris Wilson
@ 2017-10-03 20:34 ` Chris Wilson
  2017-10-03 21:14 ` ✗ Fi.CI.BAT: warning for series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Patchwork
  8 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-03 20:34 UTC (permalink / raw)
  To: intel-gfx

Use a priority stored in the context as the initial value when
submitting a request. This allows us to change the default priority on a
per-context basis, allowing different contexts to be favoured with GPU
time at the expense of lower importance work. The user can adjust the
context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive
values being higher priority (they will be serviced earlier, after their
dependencies have been resolved). Any prerequisite work for an execbuf
will have its priority raised to match the new request as required.

Normal users can specify any value in the range of -1023 to 0 [default],
i.e. they can reduce the priority of their workloads (and temporarily
boost it back to normal if so desired).

Privileged users can specify any value in the range of -1023 to 1023,
[default is 0], i.e. they can raise their priority above all overs and
so potentially starve the system.

Note that the existing schedulers are not fair, nor load balancing, the
execution is strictly by priority on a first-come, first-served basis,
and the driver may choose to boost some requests above the range
available to users.

This priority was originally based around nice(2), but evolved to allow
clients to adjust their priority within a small range, and allow for a
privileged high priority range.

For example, this can be used to implement EGL_IMG_context_priority
https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt

	EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of
        the context to be created. This attribute is a hint, as an
        implementation may not support multiple contexts at some
        priority levels and system policy may limit access to high
        priority contexts to appropriate system privilege level. The
        default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is
        EGL_CONTEXT_PRIORITY_MEDIUM_IMG."

so we can map

	PRIORITY_HIGH -> 1023 [privileged, will failback to 0]
	PRIORITY_MED -> 0 [default]
	PRIORITY_LOW -> -1023

They also map onto the priorities used by VkQueue (and a VkQueue is
essentially a timeline, our i915_gem_context under full-ppgtt).

v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/
v3: Report min/max user priorities as defines in the uapi, and rebase
internal priorities on the exposed values.

Testcase: igt/gem_exec_schedule
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         |  1 +
 drivers/gpu/drm/i915/i915_gem_context.c | 23 +++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_request.h | 14 ++++++++++----
 include/uapi/drm/i915_drm.h             |  7 +++++++
 4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4d5a22485848..4f9a3d84e194 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -370,6 +370,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 		value = 0;
 		if (dev_priv->engine[RCS] && dev_priv->engine[RCS]->schedule) {
 			value |= I915_SCHEDULER_CAP_ENABLED;
+			value |= I915_SCHEDULER_CAP_PRIORITY;
 
 			if (INTEL_INFO(dev_priv)->has_logical_ring_preemption &&
 			    i915_modparams.enable_execlists &&
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2bb8e58706ba..5bf96a258509 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1070,6 +1070,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->value = ctx->priority;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1125,6 +1128,26 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		{
+			int priority = args->value;
+
+			if (args->size)
+				ret = -EINVAL;
+			else if (!to_i915(dev)->engine[RCS]->schedule)
+				ret = -ENODEV;
+			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
+				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
+				ret = -EINVAL;
+			else if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
+				 !capable(CAP_SYS_NICE))
+				ret = -EPERM;
+			else
+				ctx->priority = priority;
+		}
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6b9e992d01de..26249f39de67 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -30,6 +30,8 @@
 #include "i915_gem.h"
 #include "i915_sw_fence.h"
 
+#include <uapi/drm/i915_drm.h>
+
 struct drm_file;
 struct drm_i915_gem_object;
 struct drm_i915_gem_request;
@@ -69,10 +71,14 @@ struct i915_priotree {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	int priority;
-#define I915_PRIORITY_MAX 1024
-#define I915_PRIORITY_NORMAL 0
-#define I915_PRIORITY_MIN (-I915_PRIORITY_MAX)
-#define I915_PRIORITY_INVALID INT_MIN
+};
+
+enum {
+	I915_PRIORITY_MIN = I915_CONTEXT_MIN_USER_PRIORITY - 1,
+	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
+	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
+
+	I915_PRIORITY_INVALID = INT_MIN
 };
 
 struct i915_gem_capture_list {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index aa4a3b20ef6b..7266b53191ee 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -402,6 +402,9 @@ typedef struct drm_i915_irq_wait {
  * priorities and the driver will attempt to execute batches in priority order.
  * The param returns a capability bitmask, nonzero implies that the scheduler
  * is enabled, with different features present according to the mask.
+ *
+ * The initial priority for each batch is supplied by the context and is
+ * controlled via I915_CONTEXT_PARAM_PRIORITY.
  */
 #define I915_PARAM_HAS_SCHEDULER	 41
 #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
@@ -1367,6 +1370,10 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_PRIORITY	0x6
+#define   I915_CONTEXT_MAX_USER_PRIORITY	1023 /* inclusive */
+#define   I915_CONTEXT_DEFAULT_PRIORITY		0
+#define   I915_CONTEXT_MIN_USER_PRIORITY	-1023 /* inclusive */
 	__u64 value;
 };
 
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* ✗ Fi.CI.BAT: warning for series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
  2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
                   ` (7 preceding siblings ...)
  2017-10-03 20:34 ` [CI 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
@ 2017-10-03 21:14 ` Patchwork
  8 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-10-03 21:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
URL   : https://patchwork.freedesktop.org/series/31350/
State : warning

== Summary ==

Series 31350v1 series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
https://patchwork.freedesktop.org/api/1.0/series/31350/revisions/1/mbox/

Test kms_cursor_legacy:
        Subgroup basic-flip-after-cursor-legacy:
                pass       -> DMESG-WARN (fi-bxt-dsi)

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:467s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:482s
fi-blb-e6850     total:289  pass:224  dwarn:1   dfail:0   fail:0   skip:64  time:399s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:569s
fi-bwr-2160      total:289  pass:184  dwarn:0   dfail:0   fail:0   skip:105 time:289s
fi-bxt-dsi       total:289  pass:258  dwarn:1   dfail:0   fail:0   skip:30  time:536s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:537s
fi-byt-j1900     total:289  pass:254  dwarn:1   dfail:0   fail:0   skip:34  time:546s
fi-byt-n2820     total:289  pass:250  dwarn:1   dfail:0   fail:0   skip:38  time:538s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:571s
fi-cnl-y         total:289  pass:261  dwarn:1   dfail:0   fail:0   skip:27  time:634s
fi-elk-e7500     total:289  pass:230  dwarn:0   dfail:0   fail:0   skip:59  time:438s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:598s
fi-hsw-4770      total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:444s
fi-hsw-4770r     total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:417s
fi-ilk-650       total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:470s
fi-ivb-3520m     total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:498s
fi-ivb-3770      total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:480s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:503s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:583s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:492s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:590s
fi-pnv-d510      total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:669s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:479s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:537s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:524s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:474s
fi-snb-2520m     total:289  pass:251  dwarn:0   dfail:0   fail:0   skip:38  time:584s
fi-snb-2600      total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:437s

654360cf73feea27f7ed5bfa2e5b2fa5ede2e8ec drm-tip: 2017y-10m-03d-17h-55m-08s UTC integration manifest
476fa7061ed8 drm/i915/scheduler: Support user-defined priorities
fcef1b9da6f5 drm/i915/execlists: Preemption!
80fd5334c8e1 drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask
ca9d057a5788 drm/i915/execlists: Keep request->priority for its lifetime
95c67d878d4f drm/i915/execlists: Move bdw GPGPU w/a to emit_bb
3b9484d04dd0 drm/i915: Introduce a preempt context
6f00d4774d40 drm/i915/execlists: Distinguish the incomplete context notifies
27f4e56e982f drm/i915/preempt: Default to disabled mid-command preemption levels
2fe1ac1a5b7c drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5887/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
@ 2017-10-04 15:52 Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-04 15:52 UTC (permalink / raw)
  To: intel-gfx

From: Jeff McGee <jeff.mcgee@intel.com>

The WA applies to all production Gen9 and requires both enabling and
whitelisting of the per-context preemption control register.

v2: Extend to Cannonlake.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e804a9b816f1..6245970eb3ab 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1075,8 +1075,10 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl */
-	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
+	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl,[cnl] */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
+	ret = wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
 
@@ -1138,14 +1140,6 @@ static int skl_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/*
-	 * Actual WA is to disable percontext preemption granularity control
-	 * until D0 which is the default case so this is equivalent to
-	 * !WaDisablePerCtxtPreemptionGranularityControl:skl
-	 */
-	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
-
 	/* WaEnableGapsTsvCreditFix:skl */
 	I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
 				   GEN9_GAPS_TSV_CREDIT_DISABLE));
@@ -1278,6 +1272,8 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine)
 	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
 
 	/* WaEnablePreemptionGranularityControlByUMD:cnl */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
 	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD
@ 2017-10-04 14:33 Chris Wilson
  0 siblings, 0 replies; 12+ messages in thread
From: Chris Wilson @ 2017-10-04 14:33 UTC (permalink / raw)
  To: intel-gfx

From: Jeff McGee <jeff.mcgee@intel.com>

The WA applies to all production Gen9 and requires both enabling and
whitelisting of the per-context preemption control register.

v2: Extend to Cannonlake.

Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_engine_cs.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e804a9b816f1..6245970eb3ab 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1075,8 +1075,10 @@ static int gen9_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl */
-	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
+	/* WaEnablePreemptionGranularityControlByUMD:skl,bxt,kbl,cfl,[cnl] */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
+	ret = wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
 
@@ -1138,14 +1140,6 @@ static int skl_init_workarounds(struct intel_engine_cs *engine)
 	if (ret)
 		return ret;
 
-	/*
-	 * Actual WA is to disable percontext preemption granularity control
-	 * until D0 which is the default case so this is equivalent to
-	 * !WaDisablePerCtxtPreemptionGranularityControl:skl
-	 */
-	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
-		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
-
 	/* WaEnableGapsTsvCreditFix:skl */
 	I915_WRITE(GEN8_GARBCNTL, (I915_READ(GEN8_GARBCNTL) |
 				   GEN9_GAPS_TSV_CREDIT_DISABLE));
@@ -1278,6 +1272,8 @@ static int cnl_init_workarounds(struct intel_engine_cs *engine)
 	WA_SET_BIT_MASKED(HALF_SLICE_CHICKEN3, CNL_FAST_ANISO_L1_BANKING_FIX);
 
 	/* WaEnablePreemptionGranularityControlByUMD:cnl */
+	I915_WRITE(GEN7_FF_SLICE_CS_CHICKEN1,
+		   _MASKED_BIT_ENABLE(GEN9_FFSC_PERCTX_PREEMPT_CTRL));
 	ret= wa_ring_whitelist_reg(engine, GEN8_CS_CHICKEN1);
 	if (ret)
 		return ret;
-- 
2.14.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-10-04 15:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-03 20:34 [CI 1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Chris Wilson
2017-10-03 20:34 ` [CI 2/9] drm/i915/preempt: Default to disabled mid-command preemption levels Chris Wilson
2017-10-03 20:34 ` [CI 3/9] drm/i915/execlists: Distinguish the incomplete context notifies Chris Wilson
2017-10-03 20:34 ` [CI 4/9] drm/i915: Introduce a preempt context Chris Wilson
2017-10-03 20:34 ` [CI 5/9] drm/i915/execlists: Move bdw GPGPU w/a to emit_bb Chris Wilson
2017-10-03 20:34 ` [CI 6/9] drm/i915/execlists: Keep request->priority for its lifetime Chris Wilson
2017-10-03 20:34 ` [CI 7/9] drm/i915: Expand I915_PARAM_HAS_SCHEDULER into a capability bitmask Chris Wilson
2017-10-03 20:34 ` [CI 8/9] drm/i915/execlists: Preemption! Chris Wilson
2017-10-03 20:34 ` [CI 9/9] drm/i915/scheduler: Support user-defined priorities Chris Wilson
2017-10-03 21:14 ` ✗ Fi.CI.BAT: warning for series starting with [CI,1/9] drm/i915/preempt: Fix WaEnablePreemptionGranularityControlByUMD Patchwork
2017-10-04 14:33 [CI 1/9] " Chris Wilson
2017-10-04 15:52 Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.