All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Fix up request cancel
@ 2022-01-24 15:01 ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, tvrtko.ursulin

Fix request cancellation + add request cancel low level trace point.

v2:
  - Update cancel reset selftest preemption timeout value to zero
  - Fix bug in execlists cancel code

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Matthew Brost (4):
  drm/i915: Add request cancel low level trace point
  drm/i915/guc: Cancel requests immediately
  drm/i915/execlists: Fix execlists request cancellation corner case
  drm/i915/selftests: Set preemption timeout to zero in cancel reset
    test

 drivers/gpu/drm/i915/gt/intel_context.h       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 ++++++++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
 .../drm/i915/gt/intel_execlists_submission.c  | 18 +++++---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
 drivers/gpu/drm/i915/i915_request.h           |  6 +++
 drivers/gpu/drm/i915/i915_trace.h             | 10 ++++
 drivers/gpu/drm/i915/selftests/i915_request.c |  7 +--
 9 files changed, 84 insertions(+), 33 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH 0/4] Fix up request cancel
@ 2022-01-24 15:01 ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Fix request cancellation + add request cancel low level trace point.

v2:
  - Update cancel reset selftest preemption timeout value to zero
  - Fix bug in execlists cancel code

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Matthew Brost (4):
  drm/i915: Add request cancel low level trace point
  drm/i915/guc: Cancel requests immediately
  drm/i915/execlists: Fix execlists request cancellation corner case
  drm/i915/selftests: Set preemption timeout to zero in cancel reset
    test

 drivers/gpu/drm/i915/gt/intel_context.h       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 ++++++++--
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
 .../drm/i915/gt/intel_execlists_submission.c  | 18 +++++---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
 drivers/gpu/drm/i915/i915_request.h           |  6 +++
 drivers/gpu/drm/i915/i915_trace.h             | 10 ++++
 drivers/gpu/drm/i915/selftests/i915_request.c |  7 +--
 9 files changed, 84 insertions(+), 33 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 1/4] drm/i915: Add request cancel low level trace point
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
@ 2022-01-24 15:01   ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, tvrtko.ursulin

Add request cancel trace point guarded by
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINT.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.h |  1 +
 drivers/gpu/drm/i915/i915_trace.h       | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index d8c74bbf9aae2..3aed4d77f116c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -124,6 +124,7 @@ intel_context_is_pinned(struct intel_context *ce)
 static inline void intel_context_cancel_request(struct intel_context *ce,
 						struct i915_request *rq)
 {
+	trace_i915_request_cancel(rq);
 	GEM_BUG_ON(!ce->ops->cancel_request);
 	return ce->ops->cancel_request(ce, rq);
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 37b5c9e9d260e..d0a11a8bb0ca3 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -324,6 +324,11 @@ DEFINE_EVENT(i915_request, i915_request_add,
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_cancel,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -497,6 +502,11 @@ DEFINE_EVENT(intel_context, intel_context_do_unpin,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_cancel(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_guc_submit(struct i915_request *rq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH 1/4] drm/i915: Add request cancel low level trace point
@ 2022-01-24 15:01   ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add request cancel trace point guarded by
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINT.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.h |  1 +
 drivers/gpu/drm/i915/i915_trace.h       | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index d8c74bbf9aae2..3aed4d77f116c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -124,6 +124,7 @@ intel_context_is_pinned(struct intel_context *ce)
 static inline void intel_context_cancel_request(struct intel_context *ce,
 						struct i915_request *rq)
 {
+	trace_i915_request_cancel(rq);
 	GEM_BUG_ON(!ce->ops->cancel_request);
 	return ce->ops->cancel_request(ce, rq);
 }
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 37b5c9e9d260e..d0a11a8bb0ca3 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -324,6 +324,11 @@ DEFINE_EVENT(i915_request, i915_request_add,
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_cancel,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_guc_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -497,6 +502,11 @@ DEFINE_EVENT(intel_context, intel_context_do_unpin,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_cancel(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_guc_submit(struct i915_request *rq)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/4] drm/i915/guc: Cancel requests immediately
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
@ 2022-01-24 15:01   ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, tvrtko.ursulin

Change the preemption timeout to the smallest possible value (1 us) when
disabling scheduling to cancel a request and restore it after
cancellation. This not only cancels the request as fast as possible, it
fixes a bug where the preemption timeout is 0 which results in the
schedule disable hanging forever.

Reported-by: Jani Saarinen <jani.saarinen@intel.com>
Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 30cd81ad8911a..730998823dbea 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -198,6 +198,11 @@ struct intel_context {
 		 * each priority bucket
 		 */
 		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
+		/**
+		 * @preemption_timeout: preemption timeout of the context, used
+		 * to restore this value after request cancellation
+		 */
+		u32 preemption_timeout;
 	} guc_state;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 3918f1be114fa..966947c450253 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
 	return __get_parent_scratch(ce)->join[child_index].semaphore;
 }
 
-static void guc_context_policy_init(struct intel_engine_cs *engine,
+static void guc_context_policy_init(struct intel_context *ce,
+				    struct intel_engine_cs *engine,
 				    struct guc_lrc_desc *desc)
 {
 	desc->policy_flags = 0;
@@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 
 	/* NB: For both of these, zero means disabled. */
 	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
-	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
+	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
+	desc->preemption_timeout = ce->guc_state.preemption_timeout;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	desc->hw_context_desc = ce->lrc.lrca;
 	desc->priority = ce->guc_state.prio;
 	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
-	guc_context_policy_init(engine, desc);
+	guc_context_policy_init(ce, engine, desc);
 
 	/*
 	 * If context is a parent, we need to register a process descriptor
@@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			desc->hw_context_desc = child->lrc.lrca;
 			desc->priority = ce->guc_state.prio;
 			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
-			guc_context_policy_init(engine, desc);
+			guc_context_policy_init(child, engine, desc);
 		}
 
 		clear_children_join_go_memory(ce);
@@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 	return ce->guc_id.id;
 }
 
+static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
+						 u16 guc_id,
+						 u32 preemption_timeout)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
+		guc_id,
+		preemption_timeout
+	};
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+}
+
 static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
@@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
-	with_intel_runtime_pm(runtime_pm, wakeref)
+	with_intel_runtime_pm(runtime_pm, wakeref) {
+		__guc_context_set_preemption_timeout(guc, guc_id, 1);
 		__guc_context_sched_disable(guc, ce, guc_id);
+	}
 
 	return &ce->guc_state.blocked;
 }
@@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
-	if (enable) {
-		with_intel_runtime_pm(runtime_pm, wakeref)
+	with_intel_runtime_pm(runtime_pm, wakeref) {
+		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
+						     ce->guc_state.preemption_timeout);
+		if (enable)
 			__guc_context_sched_enable(guc, ce);
 	}
 }
@@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
 	}
 }
 
-static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
-						 u16 guc_id,
-						 u32 preemption_timeout)
-{
-	u32 action[] = {
-		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
-		guc_id,
-		preemption_timeout
-	};
-
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
-}
-
 static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH 2/4] drm/i915/guc: Cancel requests immediately
@ 2022-01-24 15:01   ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Change the preemption timeout to the smallest possible value (1 us) when
disabling scheduling to cancel a request and restore it after
cancellation. This not only cancels the request as fast as possible, it
fixes a bug where the preemption timeout is 0 which results in the
schedule disable hanging forever.

Reported-by: Jani Saarinen <jani.saarinen@intel.com>
Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 30cd81ad8911a..730998823dbea 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -198,6 +198,11 @@ struct intel_context {
 		 * each priority bucket
 		 */
 		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
+		/**
+		 * @preemption_timeout: preemption timeout of the context, used
+		 * to restore this value after request cancellation
+		 */
+		u32 preemption_timeout;
 	} guc_state;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 3918f1be114fa..966947c450253 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
 	return __get_parent_scratch(ce)->join[child_index].semaphore;
 }
 
-static void guc_context_policy_init(struct intel_engine_cs *engine,
+static void guc_context_policy_init(struct intel_context *ce,
+				    struct intel_engine_cs *engine,
 				    struct guc_lrc_desc *desc)
 {
 	desc->policy_flags = 0;
@@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 
 	/* NB: For both of these, zero means disabled. */
 	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
-	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
+	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
+	desc->preemption_timeout = ce->guc_state.preemption_timeout;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	desc->hw_context_desc = ce->lrc.lrca;
 	desc->priority = ce->guc_state.prio;
 	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
-	guc_context_policy_init(engine, desc);
+	guc_context_policy_init(ce, engine, desc);
 
 	/*
 	 * If context is a parent, we need to register a process descriptor
@@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 			desc->hw_context_desc = child->lrc.lrca;
 			desc->priority = ce->guc_state.prio;
 			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
-			guc_context_policy_init(engine, desc);
+			guc_context_policy_init(child, engine, desc);
 		}
 
 		clear_children_join_go_memory(ce);
@@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 	return ce->guc_id.id;
 }
 
+static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
+						 u16 guc_id,
+						 u32 preemption_timeout)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
+		guc_id,
+		preemption_timeout
+	};
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+}
+
 static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
@@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
-	with_intel_runtime_pm(runtime_pm, wakeref)
+	with_intel_runtime_pm(runtime_pm, wakeref) {
+		__guc_context_set_preemption_timeout(guc, guc_id, 1);
 		__guc_context_sched_disable(guc, ce, guc_id);
+	}
 
 	return &ce->guc_state.blocked;
 }
@@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
 
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
-	if (enable) {
-		with_intel_runtime_pm(runtime_pm, wakeref)
+	with_intel_runtime_pm(runtime_pm, wakeref) {
+		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
+						     ce->guc_state.preemption_timeout);
+		if (enable)
 			__guc_context_sched_enable(guc, ce);
 	}
 }
@@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
 	}
 }
 
-static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
-						 u16 guc_id,
-						 u32 preemption_timeout)
-{
-	u32 action[] = {
-		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
-		guc_id,
-		preemption_timeout
-	};
-
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
-}
-
 static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
@ 2022-01-24 15:01   ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, tvrtko.ursulin

More than 1 request can be submitted to a single ELSP at a time if
multiple requests are ready run to on the same context. When a request
is canceled it is marked bad, an idle pulse is triggered to the engine
(high priority kernel request), the execlists scheduler sees that
running request is bad and sets preemption timeout to minimum value (1
ms). This fails to work if multiple requests are combined on the ELSP as
only the most recent request is stored in the execlists schedule (the
request stored in the ELSP isn't marked bad, thus preemption timeout
isn't set to the minimum value). If the preempt timeout is configured to
zero, the engine is permanently hung. This is shown by an upcoming
selftest.

To work around this, mark the idle pulse with a flag to force a preempt
with the minimum value.

Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
 .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
 drivers/gpu/drm/i915/i915_request.h           |  6 +++++
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a3698f611f457..efd1c719b4072 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
 	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
 }
 
-static int __intel_engine_pulse(struct intel_engine_cs *engine)
+static int __intel_engine_pulse(struct intel_engine_cs *engine,
+				bool force_preempt)
 {
 	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
 	struct intel_context *ce = engine->kernel_context;
@@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
 		return PTR_ERR(rq);
 
 	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
+	if (force_preempt)
+		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
 
 	heartbeat_commit(rq, &attr);
 	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
@@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 
 		/* recheck current execution */
 		if (intel_engine_has_preemption(engine)) {
-			err = __intel_engine_pulse(engine);
+			err = __intel_engine_pulse(engine, false);
 			if (err)
 				set_heartbeat(engine, saved);
 		}
@@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 	return err;
 }
 
-int intel_engine_pulse(struct intel_engine_cs *engine)
+static int _intel_engine_pulse(struct intel_engine_cs *engine,
+			       bool force_preempt)
 {
 	struct intel_context *ce = engine->kernel_context;
 	int err;
@@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 
 	err = -EINTR;
 	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
-		err = __intel_engine_pulse(engine);
+		err = __intel_engine_pulse(engine, force_preempt);
 		mutex_unlock(&ce->timeline->mutex);
 	}
 
@@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 	return err;
 }
 
+int intel_engine_pulse(struct intel_engine_cs *engine)
+{
+	return _intel_engine_pulse(engine, false);
+}
+
+
+int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
+{
+	return _intel_engine_pulse(engine, true);
+}
+
 int intel_engine_flush_barriers(struct intel_engine_cs *engine)
 {
 	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index 5da6d809a87a2..d9c8386754cb3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
 void intel_gt_unpark_heartbeats(struct intel_gt *gt);
 
 int intel_engine_pulse(struct intel_engine_cs *engine);
+int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
 int intel_engine_flush_barriers(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_HEARTBEAT_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 960a9aaf4f3a3..f0c2024058731 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
 }
 
 static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
-					    const struct i915_request *rq)
+					    const struct i915_request *rq,
+					    bool force_preempt)
 {
 	if (!rq)
 		return 0;
 
 	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
-	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
+	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
+		     force_preempt))
 		return 1;
 
 	return READ_ONCE(engine->props.preempt_timeout_ms);
 }
 
 static void set_preempt_timeout(struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				bool force_preempt)
 {
 	if (!intel_engine_has_preempt_reset(engine))
 		return;
 
 	set_timer_ms(&engine->execlists.preempt,
-		     active_preempt_timeout(engine, rq));
+		     active_preempt_timeout(engine, rq, force_preempt));
 }
 
 static bool completed(const struct i915_request *rq)
@@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	    memcmp(active,
 		   execlists->pending,
 		   (port - execlists->pending) * sizeof(*port))) {
+		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
+					      &last->fence.flags);
+
 		*port = NULL;
 		while (port-- != execlists->pending)
 			execlists_schedule_in(*port, port - execlists->pending);
 
 		WRITE_ONCE(execlists->yield, -1);
-		set_preempt_timeout(engine, *active);
+		set_preempt_timeout(engine, *active, force_preempt);
 		execlists_submit_ports(engine);
 	} else {
 		ring_set_paused(engine, 0);
@@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
 
 	i915_request_active_engine(rq, &engine);
 
-	if (engine && intel_engine_pulse(engine))
+	if (engine && intel_engine_pulse_force_preempt(engine))
 		intel_gt_handle_error(engine->gt, engine->mask, 0,
 				      "request cancellation by %s",
 				      current->comm);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 28b1f9db54875..7e6312233d4c7 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -170,6 +170,12 @@ enum {
 	 * fence (dma_fence_array) and i915 generated for parallel submission.
 	 */
 	I915_FENCE_FLAG_COMPOSITE,
+
+	/*
+	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
+	 * of preempt timeout configuration
+	 */
+	I915_FENCE_FLAG_FORCE_PREEMPT,
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
@ 2022-01-24 15:01   ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel

More than 1 request can be submitted to a single ELSP at a time if
multiple requests are ready run to on the same context. When a request
is canceled it is marked bad, an idle pulse is triggered to the engine
(high priority kernel request), the execlists scheduler sees that
running request is bad and sets preemption timeout to minimum value (1
ms). This fails to work if multiple requests are combined on the ELSP as
only the most recent request is stored in the execlists schedule (the
request stored in the ELSP isn't marked bad, thus preemption timeout
isn't set to the minimum value). If the preempt timeout is configured to
zero, the engine is permanently hung. This is shown by an upcoming
selftest.

To work around this, mark the idle pulse with a flag to force a preempt
with the minimum value.

Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
 .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
 drivers/gpu/drm/i915/i915_request.h           |  6 +++++
 4 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a3698f611f457..efd1c719b4072 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
 	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
 }
 
-static int __intel_engine_pulse(struct intel_engine_cs *engine)
+static int __intel_engine_pulse(struct intel_engine_cs *engine,
+				bool force_preempt)
 {
 	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
 	struct intel_context *ce = engine->kernel_context;
@@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
 		return PTR_ERR(rq);
 
 	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
+	if (force_preempt)
+		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
 
 	heartbeat_commit(rq, &attr);
 	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
@@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 
 		/* recheck current execution */
 		if (intel_engine_has_preemption(engine)) {
-			err = __intel_engine_pulse(engine);
+			err = __intel_engine_pulse(engine, false);
 			if (err)
 				set_heartbeat(engine, saved);
 		}
@@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 	return err;
 }
 
-int intel_engine_pulse(struct intel_engine_cs *engine)
+static int _intel_engine_pulse(struct intel_engine_cs *engine,
+			       bool force_preempt)
 {
 	struct intel_context *ce = engine->kernel_context;
 	int err;
@@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 
 	err = -EINTR;
 	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
-		err = __intel_engine_pulse(engine);
+		err = __intel_engine_pulse(engine, force_preempt);
 		mutex_unlock(&ce->timeline->mutex);
 	}
 
@@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 	return err;
 }
 
+int intel_engine_pulse(struct intel_engine_cs *engine)
+{
+	return _intel_engine_pulse(engine, false);
+}
+
+
+int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
+{
+	return _intel_engine_pulse(engine, true);
+}
+
 int intel_engine_flush_barriers(struct intel_engine_cs *engine)
 {
 	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index 5da6d809a87a2..d9c8386754cb3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
 void intel_gt_unpark_heartbeats(struct intel_gt *gt);
 
 int intel_engine_pulse(struct intel_engine_cs *engine);
+int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
 int intel_engine_flush_barriers(struct intel_engine_cs *engine);
 
 #endif /* INTEL_ENGINE_HEARTBEAT_H */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 960a9aaf4f3a3..f0c2024058731 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
 }
 
 static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
-					    const struct i915_request *rq)
+					    const struct i915_request *rq,
+					    bool force_preempt)
 {
 	if (!rq)
 		return 0;
 
 	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
-	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
+	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
+		     force_preempt))
 		return 1;
 
 	return READ_ONCE(engine->props.preempt_timeout_ms);
 }
 
 static void set_preempt_timeout(struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				bool force_preempt)
 {
 	if (!intel_engine_has_preempt_reset(engine))
 		return;
 
 	set_timer_ms(&engine->execlists.preempt,
-		     active_preempt_timeout(engine, rq));
+		     active_preempt_timeout(engine, rq, force_preempt));
 }
 
 static bool completed(const struct i915_request *rq)
@@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	    memcmp(active,
 		   execlists->pending,
 		   (port - execlists->pending) * sizeof(*port))) {
+		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
+					      &last->fence.flags);
+
 		*port = NULL;
 		while (port-- != execlists->pending)
 			execlists_schedule_in(*port, port - execlists->pending);
 
 		WRITE_ONCE(execlists->yield, -1);
-		set_preempt_timeout(engine, *active);
+		set_preempt_timeout(engine, *active, force_preempt);
 		execlists_submit_ports(engine);
 	} else {
 		ring_set_paused(engine, 0);
@@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
 
 	i915_request_active_engine(rq, &engine);
 
-	if (engine && intel_engine_pulse(engine))
+	if (engine && intel_engine_pulse_force_preempt(engine))
 		intel_gt_handle_error(engine->gt, engine->mask, 0,
 				      "request cancellation by %s",
 				      current->comm);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 28b1f9db54875..7e6312233d4c7 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -170,6 +170,12 @@ enum {
 	 * fence (dma_fence_array) and i915 generated for parallel submission.
 	 */
 	I915_FENCE_FLAG_COMPOSITE,
+
+	/*
+	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
+	 * of preempt timeout configuration
+	 */
+	I915_FENCE_FLAG_FORCE_PREEMPT,
 };
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/4] drm/i915/selftests: Set preemption timeout to zero in cancel reset test
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
@ 2022-01-24 15:01   ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel
  Cc: daniele.ceraolospurio, john.c.harrison, tvrtko.ursulin

Set the preemption timeout to zero to prove that request cancellation
with preemption disabled works. Also this seals a race between a
possible preemption and request cancellation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 2a99dd7c2fe8a..e522e24129f9b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -790,8 +790,9 @@ static int __cancel_completed(struct intel_engine_cs *engine)
  * wait for spinner to start, create a NOP request and submit it, cancel the
  * spinner, wait for spinner to complete and verify it failed with an error,
  * finally wait for NOP request to complete verify it succeeded without an
- * error. Preemption timeout also reduced / restored so test runs in a timely
- * maner.
+ * error. Preemption timeout also set to zero to ensure cancellation works with
+ * preemption disabled and to ensure the NOP request doesn't trigger a
+ * preemption on the spinner sealing a race between a preemption and the cancel.
  */
 static int __cancel_reset(struct drm_i915_private *i915,
 			  struct intel_engine_cs *engine)
@@ -807,7 +808,7 @@ static int __cancel_reset(struct drm_i915_private *i915,
 		return 0;
 
 	preempt_timeout_ms = engine->props.preempt_timeout_ms;
-	engine->props.preempt_timeout_ms = 100;
+	engine->props.preempt_timeout_ms = 0;
 
 	if (igt_spinner_init(&spin, engine->gt))
 		goto out_restore;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] [PATCH 4/4] drm/i915/selftests: Set preemption timeout to zero in cancel reset test
@ 2022-01-24 15:01   ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-24 15:01 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Set the preemption timeout to zero to prove that request cancellation
with preemption disabled works. Also this seals a race between a
possible preemption and request cancellation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 2a99dd7c2fe8a..e522e24129f9b 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -790,8 +790,9 @@ static int __cancel_completed(struct intel_engine_cs *engine)
  * wait for spinner to start, create a NOP request and submit it, cancel the
  * spinner, wait for spinner to complete and verify it failed with an error,
  * finally wait for NOP request to complete verify it succeeded without an
- * error. Preemption timeout also reduced / restored so test runs in a timely
- * maner.
+ * error. Preemption timeout also set to zero to ensure cancellation works with
+ * preemption disabled and to ensure the NOP request doesn't trigger a
+ * preemption on the spinner sealing a race between a preemption and the cancel.
  */
 static int __cancel_reset(struct drm_i915_private *i915,
 			  struct intel_engine_cs *engine)
@@ -807,7 +808,7 @@ static int __cancel_reset(struct drm_i915_private *i915,
 		return 0;
 
 	preempt_timeout_ms = engine->props.preempt_timeout_ms;
-	engine->props.preempt_timeout_ms = 100;
+	engine->props.preempt_timeout_ms = 0;
 
 	if (igt_spinner_init(&spin, engine->gt))
 		goto out_restore;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix up request cancel (rev2)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (4 preceding siblings ...)
  (?)
@ 2022-01-24 22:52 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-24 22:52 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix up request cancel (rev2)
URL   : https://patchwork.freedesktop.org/series/99173/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
9d38a07fb177 drm/i915: Add request cancel low level trace point
f478dcc4fbab drm/i915/guc: Cancel requests immediately
bebfc9866392 drm/i915/execlists: Fix execlists request cancellation corner case
-:85: CHECK:LINE_SPACING: Please don't use multiple blank lines
#85: FILE: drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:346:
+
+

total: 0 errors, 0 warnings, 1 checks, 135 lines checked
8a5938d99c50 drm/i915/selftests: Set preemption timeout to zero in cancel reset test



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix up request cancel (rev2)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (5 preceding siblings ...)
  (?)
@ 2022-01-24 22:54 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-24 22:54 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix up request cancel (rev2)
URL   : https://patchwork.freedesktop.org/series/99173/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Fix up request cancel (rev2)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (6 preceding siblings ...)
  (?)
@ 2022-01-24 23:27 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-24 23:27 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 5136 bytes --]

== Series Details ==

Series: Fix up request cancel (rev2)
URL   : https://patchwork.freedesktop.org/series/99173/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11129 -> Patchwork_22087
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/index.html

Participating hosts (48 -> 43)
------------------------------

  Missing    (5): fi-hsw-4200u fi-bsw-cyan fi-icl-u2 fi-ctg-p8600 fi-bdw-samus 

Known issues
------------

  Here are the changes found in Patchwork_22087 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - fi-snb-2600:        [PASS][1] -> [DMESG-WARN][2] ([i915#4913])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/fi-snb-2600/igt@gem_exec_suspend@basic-s3@smem.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-snb-2600/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@kms_psr@primary_page_flip:
    - fi-skl-6600u:       [PASS][3] -> [FAIL][4] ([i915#4547])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/fi-skl-6600u/igt@kms_psr@primary_page_flip.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-skl-6600u/igt@kms_psr@primary_page_flip.html

  * igt@runner@aborted:
    - fi-skl-6600u:       NOTRUN -> [FAIL][5] ([i915#4312])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-skl-6600u/igt@runner@aborted.html
    - fi-bdw-5557u:       NOTRUN -> [FAIL][6] ([i915#2426] / [i915#4312])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-bdw-5557u/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gt_heartbeat:
    - {fi-tgl-dsi}:       [DMESG-FAIL][7] ([i915#541]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-tgl-dsi/igt@i915_selftest@live@gt_heartbeat.html
    - fi-bsw-kefka:       [DMESG-FAIL][9] ([i915#541]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/fi-bsw-kefka/igt@i915_selftest@live@gt_heartbeat.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-bsw-kefka/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-5:          [DMESG-FAIL][11] ([i915#4494]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/bat-dg1-5/igt@i915_selftest@live@hangcheck.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/bat-dg1-5/igt@i915_selftest@live@hangcheck.html

  
#### Warnings ####

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          [DMESG-FAIL][13] -> [DMESG-FAIL][14] ([i915#4494])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/bat-dg1-6/igt@i915_selftest@live@hangcheck.html
    - fi-hsw-4770:        [INCOMPLETE][15] ([i915#3303]) -> [INCOMPLETE][16] ([i915#4785])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4547]: https://gitlab.freedesktop.org/drm/intel/issues/4547
  [i915#4562]: https://gitlab.freedesktop.org/drm/intel/issues/4562
  [i915#4785]: https://gitlab.freedesktop.org/drm/intel/issues/4785
  [i915#4898]: https://gitlab.freedesktop.org/drm/intel/issues/4898
  [i915#4913]: https://gitlab.freedesktop.org/drm/intel/issues/4913
  [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541


Build changes
-------------

  * Linux: CI_DRM_11129 -> Patchwork_22087

  CI-20190529: 20190529
  CI_DRM_11129: 0b83d3cf9f9eab03ec804d56ac2686320a64f3ee @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6330: f73008bac9a8db0779264b170f630483e9165764 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22087: 8a5938d99c50adeced0139269015a91bc9ec126b @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8a5938d99c50 drm/i915/selftests: Set preemption timeout to zero in cancel reset test
bebfc9866392 drm/i915/execlists: Fix execlists request cancellation corner case
f478dcc4fbab drm/i915/guc: Cancel requests immediately
9d38a07fb177 drm/i915: Add request cancel low level trace point

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/index.html

[-- Attachment #2: Type: text/html, Size: 5970 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for Fix up request cancel (rev2)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (7 preceding siblings ...)
  (?)
@ 2022-01-25  4:54 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-25  4:54 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 30251 bytes --]

== Series Details ==

Series: Fix up request cancel (rev2)
URL   : https://patchwork.freedesktop.org/series/99173/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11129_full -> Patchwork_22087_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22087_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22087_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (11 -> 10)
------------------------------

  Missing    (1): shard-tglu 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22087_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_create@create-massive:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][1] +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl7/igt@gem_create@create-massive.html
    - shard-apl:          NOTRUN -> [DMESG-WARN][2]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl1/igt@gem_create@create-massive.html

  * igt@gem_linear_blits@interruptible:
    - shard-glk:          [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk4/igt@gem_linear_blits@interruptible.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk5/igt@gem_linear_blits@interruptible.html

  * igt@gem_userptr_blits@input-checking:
    - shard-skl:          NOTRUN -> [DMESG-WARN][5] +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl2/igt@gem_userptr_blits@input-checking.html

  
Known issues
------------

  Here are the changes found in Patchwork_22087_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_isolation@preservation-s3@rcs0:
    - shard-kbl:          [PASS][6] -> [INCOMPLETE][7] ([i915#794])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl6/igt@gem_ctx_isolation@preservation-s3@rcs0.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl4/igt@gem_ctx_isolation@preservation-s3@rcs0.html

  * igt@gem_ctx_sseu@mmap-args:
    - shard-apl:          NOTRUN -> [SKIP][8] ([fdo#109271]) +59 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@gem_ctx_sseu@mmap-args.html

  * igt@gem_eio@in-flight-contexts-10ms:
    - shard-skl:          [PASS][9] -> [TIMEOUT][10] ([i915#3063])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl7/igt@gem_eio@in-flight-contexts-10ms.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl9/igt@gem_eio@in-flight-contexts-10ms.html

  * igt@gem_eio@in-flight-contexts-immediate:
    - shard-tglb:         [PASS][11] -> [TIMEOUT][12] ([i915#3063])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-tglb6/igt@gem_eio@in-flight-contexts-immediate.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb2/igt@gem_eio@in-flight-contexts-immediate.html

  * igt@gem_eio@unwedge-stress:
    - shard-iclb:         [PASS][13] -> [TIMEOUT][14] ([i915#2481] / [i915#3070])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb7/igt@gem_eio@unwedge-stress.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@parallel:
    - shard-iclb:         [PASS][15] -> [SKIP][16] ([i915#4525])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb4/igt@gem_exec_balancer@parallel.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb5/igt@gem_exec_balancer@parallel.html

  * igt@gem_exec_capture@pi@rcs0:
    - shard-skl:          [PASS][17] -> [INCOMPLETE][18] ([i915#4547])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl7/igt@gem_exec_capture@pi@rcs0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl4/igt@gem_exec_capture@pi@rcs0.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl:          NOTRUN -> [FAIL][19] ([i915#2846])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl7/igt@gem_exec_fair@basic-deadline.html
    - shard-glk:          [PASS][20] -> [FAIL][21] ([i915#2846])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk9/igt@gem_exec_fair@basic-deadline.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk7/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-skl:          NOTRUN -> [SKIP][22] ([fdo#109271]) +125 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl2/igt@gem_exec_fair@basic-flow@rcs0.html
    - shard-tglb:         [PASS][23] -> [FAIL][24] ([i915#2842]) +2 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-tglb1/igt@gem_exec_fair@basic-flow@rcs0.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb6/igt@gem_exec_fair@basic-flow@rcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-glk:          [PASS][25] -> [FAIL][26] ([i915#2842])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk8/igt@gem_exec_fair@basic-none-share@rcs0.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk4/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-kbl:          NOTRUN -> [FAIL][27] ([i915#2842])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@gem_exec_fair@basic-none-solo@rcs0.html
    - shard-tglb:         NOTRUN -> [FAIL][28] ([i915#2842])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@gem_exec_fair@basic-none-solo@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl:          [PASS][29] -> [FAIL][30] ([i915#2842]) +3 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl3/igt@gem_exec_fair@basic-none@vcs0.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl6/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_lmem_swapping@parallel-random-verify:
    - shard-kbl:          NOTRUN -> [SKIP][31] ([fdo#109271] / [i915#4613]) +2 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@gem_lmem_swapping@parallel-random-verify.html
    - shard-skl:          NOTRUN -> [SKIP][32] ([fdo#109271] / [i915#4613])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl7/igt@gem_lmem_swapping@parallel-random-verify.html

  * igt@gem_pread@exhaustion:
    - shard-kbl:          NOTRUN -> [WARN][33] ([i915#2658])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@gem_pread@exhaustion.html

  * igt@gem_pxp@reject-modify-context-protection-off-1:
    - shard-tglb:         NOTRUN -> [SKIP][34] ([i915#4270])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@gem_pxp@reject-modify-context-protection-off-1.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-kbl:          NOTRUN -> [SKIP][35] ([fdo#109271] / [i915#3323])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@gem_userptr_blits@dmabuf-sync.html
    - shard-skl:          NOTRUN -> [SKIP][36] ([fdo#109271] / [i915#3323])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl2/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@unsync-unmap-cycles:
    - shard-tglb:         NOTRUN -> [SKIP][37] ([i915#3297])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@gem_userptr_blits@unsync-unmap-cycles.html
    - shard-iclb:         NOTRUN -> [SKIP][38] ([i915#3297])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@gem_userptr_blits@unsync-unmap-cycles.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-tglb:         NOTRUN -> [SKIP][39] ([i915#2527] / [i915#2856])
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [PASS][40] -> [FAIL][41] ([i915#454])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb5/igt@i915_pm_dc@dc6-psr.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb4/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp:
    - shard-kbl:          NOTRUN -> [SKIP][42] ([fdo#109271] / [i915#1937])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-dp.html

  * igt@kms_big_fb@linear-8bpp-rotate-90:
    - shard-tglb:         NOTRUN -> [SKIP][43] ([fdo#111614]) +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_big_fb@linear-8bpp-rotate-90.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip:
    - shard-apl:          NOTRUN -> [SKIP][44] ([fdo#109271] / [i915#3777])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-0-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][45] ([i915#3763])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl2/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-async-flip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-kbl:          NOTRUN -> [SKIP][46] ([fdo#109271] / [i915#3777]) +2 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_big_fb@y-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-16bpp-rotate-180:
    - shard-tglb:         NOTRUN -> [SKIP][47] ([fdo#111615])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_big_fb@yf-tiled-16bpp-rotate-180.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip:
    - shard-skl:          NOTRUN -> [SKIP][48] ([fdo#109271] / [i915#3777]) +1 similar issue
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl7/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip.html

  * igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][49] ([fdo#109271] / [i915#3886]) +1 similar issue
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@kms_ccs@pipe-a-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][50] ([fdo#109271] / [i915#3886]) +10 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc.html
    - shard-skl:          NOTRUN -> [SKIP][51] ([fdo#109271] / [i915#3886]) +8 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][52] ([i915#3689] / [i915#3886])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_ccs@pipe-b-bad-rotation-90-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][53] ([i915#3689])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_ccs.html

  * igt@kms_chamelium@dp-audio-edid:
    - shard-tglb:         NOTRUN -> [SKIP][54] ([fdo#109284] / [fdo#111827]) +3 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_chamelium@dp-audio-edid.html

  * igt@kms_chamelium@dp-edid-change-during-suspend:
    - shard-apl:          NOTRUN -> [SKIP][55] ([fdo#109271] / [fdo#111827]) +3 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@kms_chamelium@dp-edid-change-during-suspend.html

  * igt@kms_chamelium@hdmi-hpd-storm-disable:
    - shard-skl:          NOTRUN -> [SKIP][56] ([fdo#109271] / [fdo#111827]) +9 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@kms_chamelium@hdmi-hpd-storm-disable.html

  * igt@kms_color_chamelium@pipe-a-ctm-0-75:
    - shard-kbl:          NOTRUN -> [SKIP][57] ([fdo#109271] / [fdo#111827]) +18 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_color_chamelium@pipe-a-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-a-ctm-red-to-blue:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109284] / [fdo#111827])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_color_chamelium@pipe-a-ctm-red-to-blue.html

  * igt@kms_content_protection@atomic:
    - shard-kbl:          NOTRUN -> [TIMEOUT][59] ([i915#1319])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@uevent:
    - shard-kbl:          NOTRUN -> [FAIL][60] ([i915#2105])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@kms_content_protection@uevent.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
    - shard-skl:          [PASS][61] -> [INCOMPLETE][62] ([i915#300])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl4/igt@kms_cursor_crc@pipe-b-cursor-suspend.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl10/igt@kms_cursor_crc@pipe-b-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-c-cursor-32x32-offscreen:
    - shard-tglb:         NOTRUN -> [SKIP][63] ([i915#3319]) +1 similar issue
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_cursor_crc@pipe-c-cursor-32x32-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x512-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][64] ([fdo#109279] / [i915#3359])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_cursor_crc@pipe-c-cursor-512x512-rapid-movement.html
    - shard-iclb:         NOTRUN -> [SKIP][65] ([fdo#109278] / [fdo#109279])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_cursor_crc@pipe-c-cursor-512x512-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-128x128-random:
    - shard-iclb:         NOTRUN -> [SKIP][66] ([fdo#109278]) +4 similar issues
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_cursor_crc@pipe-d-cursor-128x128-random.html

  * igt@kms_cursor_edge_walk@pipe-d-64x64-left-edge:
    - shard-kbl:          NOTRUN -> [SKIP][67] ([fdo#109271]) +226 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@kms_cursor_edge_walk@pipe-d-64x64-left-edge.html

  * igt@kms_flip@2x-plain-flip-interruptible:
    - shard-tglb:         NOTRUN -> [SKIP][68] ([fdo#109274] / [fdo#111825]) +2 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_flip@2x-plain-flip-interruptible.html

  * igt@kms_flip@flip-vs-expired-vblank@b-edp1:
    - shard-skl:          [PASS][69] -> [FAIL][70] ([i915#79])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl9/igt@kms_flip@flip-vs-expired-vblank@b-edp1.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl4/igt@kms_flip@flip-vs-expired-vblank@b-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-apl:          [PASS][71] -> [DMESG-WARN][72] ([i915#180]) +3 similar issues
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-apl1/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl2/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@b-vga1:
    - shard-snb:          [PASS][73] -> [FAIL][74] ([fdo#103375])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-snb4/igt@kms_flip@flip-vs-suspend-interruptible@b-vga1.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-snb7/igt@kms_flip@flip-vs-suspend-interruptible@b-vga1.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl:          [PASS][75] -> [DMESG-WARN][76] ([i915#180]) +2 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl1/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl6/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-mmap-wc:
    - shard-tglb:         NOTRUN -> [SKIP][77] ([fdo#109280] / [fdo#111825]) +6 similar issues
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-cur-indfb-draw-mmap-wc.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-shrfb-draw-mmap-gtt:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#109280]) +1 similar issue
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-pri-shrfb-draw-mmap-gtt.html

  * igt@kms_hdr@bpc-switch:
    - shard-skl:          [PASS][79] -> [FAIL][80] ([i915#1188])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl8/igt@kms_hdr@bpc-switch.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl1/igt@kms_hdr@bpc-switch.html

  * igt@kms_hdr@static-toggle-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][81] ([i915#1187])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_hdr@static-toggle-dpms.html
    - shard-iclb:         NOTRUN -> [SKIP][82] ([i915#1187])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_hdr@static-toggle-dpms.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence:
    - shard-kbl:          NOTRUN -> [SKIP][83] ([fdo#109271] / [i915#533]) +3 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-d-frame-sequence.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - shard-skl:          NOTRUN -> [FAIL][84] ([fdo#108145] / [i915#265]) +1 similar issue
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl7/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    - shard-apl:          NOTRUN -> [FAIL][85] ([i915#265])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl1/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-kbl:          NOTRUN -> [FAIL][86] ([fdo#108145] / [i915#265]) +3 similar issues
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][87] ([i915#265]) +1 similar issue
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb.html
    - shard-skl:          NOTRUN -> [FAIL][88] ([i915#265])
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb.html

  * igt@kms_plane_lowres@pipe-d-tiling-x:
    - shard-tglb:         NOTRUN -> [SKIP][89] ([i915#3536])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_plane_lowres@pipe-d-tiling-x.html

  * igt@kms_psr2_sf@overlay-plane-update-continuous-sf:
    - shard-tglb:         NOTRUN -> [SKIP][90] ([i915#2920])
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_psr2_sf@overlay-plane-update-continuous-sf.html
    - shard-skl:          NOTRUN -> [SKIP][91] ([fdo#109271] / [i915#658])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@kms_psr2_sf@overlay-plane-update-continuous-sf.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area:
    - shard-kbl:          NOTRUN -> [SKIP][92] ([fdo#109271] / [i915#658]) +1 similar issue
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_psr2_sf@plane-move-sf-dmg-area.html

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-skl:          [PASS][93] -> [INCOMPLETE][94] ([i915#2828])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl10/igt@kms_vblank@pipe-c-ts-continuation-suspend.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl5/igt@kms_vblank@pipe-c-ts-continuation-suspend.html

  * igt@kms_writeback@writeback-fb-id:
    - shard-kbl:          NOTRUN -> [SKIP][95] ([fdo#109271] / [i915#2437]) +2 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_writeback@writeback-fb-id.html
    - shard-tglb:         NOTRUN -> [SKIP][96] ([i915#2437])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@kms_writeback@writeback-fb-id.html

  * igt@kms_writeback@writeback-invalid-parameters:
    - shard-skl:          NOTRUN -> [SKIP][97] ([fdo#109271] / [i915#2437]) +1 similar issue
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl2/igt@kms_writeback@writeback-invalid-parameters.html

  * igt@perf@polling-parameterized:
    - shard-glk:          [PASS][98] -> [FAIL][99] ([i915#1542])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk3/igt@perf@polling-parameterized.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk9/igt@perf@polling-parameterized.html

  * igt@prime_nv_api@nv_i915_import_twice_check_flink_name:
    - shard-tglb:         NOTRUN -> [SKIP][100] ([fdo#109291])
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@prime_nv_api@nv_i915_import_twice_check_flink_name.html

  * igt@sysfs_clients@sema-10:
    - shard-skl:          NOTRUN -> [SKIP][101] ([fdo#109271] / [i915#2994]) +1 similar issue
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@sysfs_clients@sema-10.html
    - shard-tglb:         NOTRUN -> [SKIP][102] ([i915#2994])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb5/igt@sysfs_clients@sema-10.html

  * igt@sysfs_clients@sema-50:
    - shard-apl:          NOTRUN -> [SKIP][103] ([fdo#109271] / [i915#2994]) +1 similar issue
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@sysfs_clients@sema-50.html

  * igt@sysfs_clients@split-50:
    - shard-kbl:          NOTRUN -> [SKIP][104] ([fdo#109271] / [i915#2994]) +2 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@sysfs_clients@split-50.html

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-1us:
    - shard-tglb:         [TIMEOUT][105] ([i915#3063]) -> [PASS][106]
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-tglb1/igt@gem_eio@in-flight-1us.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb1/igt@gem_eio@in-flight-1us.html

  * igt@gem_eio@kms:
    - shard-tglb:         [FAIL][107] ([i915#232]) -> [PASS][108]
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-tglb1/igt@gem_eio@kms.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-tglb6/igt@gem_eio@kms.html

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-iclb:         [SKIP][109] ([i915#4525]) -> [PASS][110]
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb6/igt@gem_exec_balancer@parallel-contexts.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb4/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-glk:          [FAIL][111] ([i915#2842]) -> [PASS][112] +1 similar issue
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk4/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk5/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@gem_exec_fair@basic-none@vecs0:
    - shard-kbl:          [FAIL][113] ([i915#2842]) -> [PASS][114]
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl3/igt@gem_exec_fair@basic-none@vecs0.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl6/igt@gem_exec_fair@basic-none@vecs0.html
    - shard-apl:          [FAIL][115] ([i915#2842]) -> [PASS][116]
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-apl4/igt@gem_exec_fair@basic-none@vecs0.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl3/igt@gem_exec_fair@basic-none@vecs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-iclb:         [FAIL][117] ([i915#2842]) -> [PASS][118]
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb8/igt@gem_exec_fair@basic-pace@bcs0.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb6/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_suspend@basic-s3@smem:
    - shard-skl:          [INCOMPLETE][119] ([i915#4547]) -> [PASS][120]
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl10/igt@gem_exec_suspend@basic-s3@smem.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl8/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@gem_softpin@allocator-evict-all-engines:
    - shard-glk:          [FAIL][121] ([i915#4171]) -> [PASS][122]
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk6/igt@gem_softpin@allocator-evict-all-engines.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk8/igt@gem_softpin@allocator-evict-all-engines.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-apl:          [DMESG-WARN][123] ([i915#180]) -> [PASS][124] +3 similar issues
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-apl3/igt@i915_suspend@fence-restore-tiled2untiled.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-apl7/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@kms_big_fb@y-tiled-32bpp-rotate-0:
    - shard-glk:          [DMESG-WARN][125] ([i915#118]) -> [PASS][126] +1 similar issue
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk2/igt@kms_big_fb@y-tiled-32bpp-rotate-0.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk9/igt@kms_big_fb@y-tiled-32bpp-rotate-0.html

  * igt@kms_cursor_legacy@flip-vs-cursor-toggle:
    - shard-iclb:         [FAIL][127] ([i915#2346]) -> [PASS][128]
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-iclb7/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-iclb3/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          [INCOMPLETE][129] ([i915#180] / [i915#636]) -> [PASS][130]
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl4/igt@kms_fbcon_fbt@fbc-suspend.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl3/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2:
    - shard-glk:          [FAIL][131] ([i915#79]) -> [PASS][132]
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk9/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk7/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a2.html

  * igt@kms_flip@flip-vs-expired-vblank@a-edp1:
    - shard-skl:          [FAIL][133] ([i915#79]) -> [PASS][134]
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl9/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl4/igt@kms_flip@flip-vs-expired-vblank@a-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [DMESG-WARN][135] ([i915#180]) -> [PASS][136] +9 similar issues
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling:
    - shard-glk:          [FAIL][137] ([i915#4911]) -> [PASS][138]
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-glk8/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-glk3/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-upscaling.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][139] ([fdo#108145] / [i915#265]) -> [PASS][140] +1 similar issue
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11129/shard-skl8/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/shard-skl5/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.ht

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22087/index.html

[-- Attachment #2: Type: text/html, Size: 33566 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix up request cancel (rev3)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (8 preceding siblings ...)
  (?)
@ 2022-01-25  6:45 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-25  6:45 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix up request cancel (rev3)
URL   : https://patchwork.freedesktop.org/series/99173/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
b01d46d0bed9 drm/i915: Add request cancel low level trace point
bf264f6cff19 drm/i915/guc: Cancel requests immediately
0902137e128a drm/i915/execlists: Fix execlists request cancellation corner case
-:85: CHECK:LINE_SPACING: Please don't use multiple blank lines
#85: FILE: drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c:346:
+
+

total: 0 errors, 0 warnings, 1 checks, 135 lines checked
f3a864288387 drm/i915/selftests: Set preemption timeout to zero in cancel reset test



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix up request cancel (rev3)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (9 preceding siblings ...)
  (?)
@ 2022-01-25  6:46 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-25  6:46 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

== Series Details ==

Series: Fix up request cancel (rev3)
URL   : https://patchwork.freedesktop.org/series/99173/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for Fix up request cancel (rev3)
  2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
                   ` (10 preceding siblings ...)
  (?)
@ 2022-01-25  7:17 ` Patchwork
  -1 siblings, 0 replies; 31+ messages in thread
From: Patchwork @ 2022-01-25  7:17 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 15087 bytes --]

== Series Details ==

Series: Fix up request cancel (rev3)
URL   : https://patchwork.freedesktop.org/series/99173/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11130 -> Patchwork_22095
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_22095 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_22095, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/index.html

Participating hosts (39 -> 42)
------------------------------

  Additional (7): fi-kbl-soraka bat-dg1-6 bat-dg1-5 bat-adlp-4 bat-rpls-1 bat-jsl-2 bat-jsl-1 
  Missing    (4): fi-ctg-p8600 fi-bsw-cyan fi-hsw-4200u fi-pnv-d510 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22095:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@gtt:
    - fi-bdw-5557u:       [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11130/fi-bdw-5557u/igt@i915_selftest@live@gtt.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-bdw-5557u/igt@i915_selftest@live@gtt.html

  * igt@kms_addfb_basic@addfb25-framebuffer-vs-set-tiling:
    - bat-adlp-4:         NOTRUN -> [DMESG-WARN][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-adlp-4/igt@kms_addfb_basic@addfb25-framebuffer-vs-set-tiling.html

  
Known issues
------------

  Here are the changes found in Patchwork_22095 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@fbdev@info:
    - bat-dg1-6:          NOTRUN -> [SKIP][4] ([i915#2582]) +4 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@fbdev@info.html

  * igt@gem_exec_fence@basic-busy@bcs0:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][5] ([fdo#109271]) +8 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_exec_gttfill@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][6] ([i915#4086])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@gem_exec_gttfill@basic.html
    - bat-dg1-5:          NOTRUN -> [SKIP][7] ([i915#4086])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@gem_exec_gttfill@basic.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#2190])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@basic:
    - bat-adlp-4:         NOTRUN -> [SKIP][9] ([i915#4613]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-adlp-4/igt@gem_lmem_swapping@basic.html
    - fi-kbl-soraka:      NOTRUN -> [SKIP][10] ([fdo#109271] / [i915#4613]) +3 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@gem_lmem_swapping@basic.html

  * igt@gem_mmap@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][11] ([i915#4083])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@gem_mmap@basic.html
    - bat-dg1-5:          NOTRUN -> [SKIP][12] ([i915#4083])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@gem_mmap@basic.html

  * igt@gem_tiled_blits@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][13] ([i915#4077]) +2 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@gem_tiled_blits@basic.html

  * igt@gem_tiled_fence_blits@basic:
    - bat-dg1-5:          NOTRUN -> [SKIP][14] ([i915#4077]) +2 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@gem_tiled_fence_blits@basic.html

  * igt@gem_tiled_pread_basic:
    - bat-dg1-5:          NOTRUN -> [SKIP][15] ([i915#4079]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@gem_tiled_pread_basic.html
    - bat-adlp-4:         NOTRUN -> [SKIP][16] ([i915#3282])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-adlp-4/igt@gem_tiled_pread_basic.html
    - bat-dg1-6:          NOTRUN -> [SKIP][17] ([i915#4079]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@gem_tiled_pread_basic.html

  * igt@i915_pm_backlight@basic-brightness:
    - bat-dg1-5:          NOTRUN -> [SKIP][18] ([i915#1155])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@i915_pm_backlight@basic-brightness.html
    - bat-dg1-6:          NOTRUN -> [SKIP][19] ([i915#1155])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][20] ([i915#1886] / [i915#2291])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-5:          NOTRUN -> [DMESG-FAIL][21] ([i915#4494] / [i915#4957])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@i915_selftest@live@hangcheck.html
    - fi-hsw-4770:        [PASS][22] -> [INCOMPLETE][23] ([i915#3303])
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11130/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-hsw-4770/igt@i915_selftest@live@hangcheck.html
    - bat-dg1-6:          NOTRUN -> [DMESG-FAIL][24] ([i915#4494])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@kms_addfb_basic@addfb25-x-tiled-legacy:
    - bat-dg1-6:          NOTRUN -> [SKIP][25] ([i915#4212]) +7 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_addfb_basic@addfb25-x-tiled-legacy.html

  * igt@kms_addfb_basic@basic-y-tiled-legacy:
    - bat-dg1-5:          NOTRUN -> [SKIP][26] ([i915#4215])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_addfb_basic@basic-y-tiled-legacy.html
    - bat-dg1-6:          NOTRUN -> [SKIP][27] ([i915#4215])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_addfb_basic@basic-y-tiled-legacy.html

  * igt@kms_addfb_basic@tile-pitch-mismatch:
    - bat-dg1-5:          NOTRUN -> [SKIP][28] ([i915#4212]) +7 similar issues
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_addfb_basic@tile-pitch-mismatch.html

  * igt@kms_busy@basic:
    - bat-dg1-6:          NOTRUN -> [SKIP][29] ([i915#4303])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_busy@basic.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][30] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@hdmi-edid-read:
    - bat-dg1-6:          NOTRUN -> [SKIP][31] ([fdo#111827]) +8 similar issues
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - bat-dg1-5:          NOTRUN -> [SKIP][32] ([fdo#111827]) +8 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - bat-dg1-6:          NOTRUN -> [SKIP][33] ([i915#4103] / [i915#4213]) +1 similar issue
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - bat-dg1-5:          NOTRUN -> [SKIP][34] ([i915#4103] / [i915#4213]) +1 similar issue
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-legacy:
    - bat-dg1-6:          NOTRUN -> [SKIP][35] ([i915#4078]) +24 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - bat-dg1-6:          NOTRUN -> [SKIP][36] ([fdo#109285])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_force_connector_basic@force-load-detect.html
    - bat-dg1-5:          NOTRUN -> [SKIP][37] ([fdo#109285])
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_force_connector_basic@force-load-detect.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [PASS][38] -> [DMESG-WARN][39] ([i915#4269])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11130/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][40] ([fdo#109271] / [i915#533])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_psr@primary_page_flip:
    - bat-dg1-5:          NOTRUN -> [SKIP][41] ([i915#1072] / [i915#4078]) +3 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@kms_psr@primary_page_flip.html

  * igt@kms_psr@sprite_plane_onoff:
    - bat-dg1-6:          NOTRUN -> [SKIP][42] ([i915#1072] / [i915#4078]) +3 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@kms_psr@sprite_plane_onoff.html

  * igt@prime_vgem@basic-fence-mmap:
    - bat-dg1-5:          NOTRUN -> [SKIP][43] ([i915#3708] / [i915#4077]) +1 similar issue
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@prime_vgem@basic-fence-mmap.html

  * igt@prime_vgem@basic-gtt:
    - bat-dg1-6:          NOTRUN -> [SKIP][44] ([i915#3708] / [i915#4077]) +1 similar issue
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@prime_vgem@basic-gtt.html

  * igt@prime_vgem@basic-userptr:
    - bat-dg1-6:          NOTRUN -> [SKIP][45] ([i915#3708] / [i915#4873])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@prime_vgem@basic-userptr.html
    - bat-dg1-5:          NOTRUN -> [SKIP][46] ([i915#3708] / [i915#4873])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@prime_vgem@basic-userptr.html

  * igt@prime_vgem@basic-write:
    - bat-dg1-5:          NOTRUN -> [SKIP][47] ([i915#3708]) +3 similar issues
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-5/igt@prime_vgem@basic-write.html
    - bat-dg1-6:          NOTRUN -> [SKIP][48] ([i915#3708]) +3 similar issues
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-dg1-6/igt@prime_vgem@basic-write.html

  * igt@runner@aborted:
    - fi-hsw-4770:        NOTRUN -> [FAIL][49] ([fdo#109271] / [i915#1436] / [i915#4312])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/fi-hsw-4770/igt@runner@aborted.html
    - bat-adlp-4:         NOTRUN -> [FAIL][50] ([i915#4312])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/bat-adlp-4/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1155]: https://gitlab.freedesktop.org/drm/intel/issues/1155
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#3003]: https://gitlab.freedesktop.org/drm/intel/issues/3003
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4086]: https://gitlab.freedesktop.org/drm/intel/issues/4086
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4212]: https://gitlab.freedesktop.org/drm/intel/issues/4212
  [i915#4213]: https://gitlab.freedesktop.org/drm/intel/issues/4213
  [i915#4215]: https://gitlab.freedesktop.org/drm/intel/issues/4215
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
  [i915#4303]: https://gitlab.freedesktop.org/drm/intel/issues/4303
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4873]: https://gitlab.freedesktop.org/drm/intel/issues/4873
  [i915#4898]: https://gitlab.freedesktop.org/drm/intel/issues/4898
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533


Build changes
-------------

  * Linux: CI_DRM_11130 -> Patchwork_22095

  CI-20190529: 20190529
  CI_DRM_11130: 6499314d1291ec97b2dd610fe981cccfeb2f743e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6331: 74fc362b425cf92cf41a29210980b25966dd5951 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22095: f3a864288387e6d1394ae14cf08da2da08ea3974 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

f3a864288387 drm/i915/selftests: Set preemption timeout to zero in cancel reset test
0902137e128a drm/i915/execlists: Fix execlists request cancellation corner case
bf264f6cff19 drm/i915/guc: Cancel requests immediately
b01d46d0bed9 drm/i915: Add request cancel low level trace point

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22095/index.html

[-- Attachment #2: Type: text/html, Size: 18705 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add request cancel low level trace point
  2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
  (?)
@ 2022-01-25 12:27   ` Tvrtko Ursulin
  2022-01-25 16:39     ` Matthew Brost
  -1 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2022-01-25 12:27 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 24/01/2022 15:01, Matthew Brost wrote:
> Add request cancel trace point guarded by
> CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINT.

Okay-ish I guess (There is pr_notice with the only real caller, but I 
suppose you want it for selftests? Oh yes, why is missing from the 
commit message.), but I'd emit it from i915_request_cancel since that's 
what the tracepoint is called so it fits.

Regards,

Tvrtko

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.h |  1 +
>   drivers/gpu/drm/i915/i915_trace.h       | 10 ++++++++++
>   2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index d8c74bbf9aae2..3aed4d77f116c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -124,6 +124,7 @@ intel_context_is_pinned(struct intel_context *ce)
>   static inline void intel_context_cancel_request(struct intel_context *ce,
>   						struct i915_request *rq)
>   {
> +	trace_i915_request_cancel(rq);
>   	GEM_BUG_ON(!ce->ops->cancel_request);
>   	return ce->ops->cancel_request(ce, rq);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 37b5c9e9d260e..d0a11a8bb0ca3 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -324,6 +324,11 @@ DEFINE_EVENT(i915_request, i915_request_add,
>   );
>   
>   #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
> +DEFINE_EVENT(i915_request, i915_request_cancel,
> +	     TP_PROTO(struct i915_request *rq),
> +	     TP_ARGS(rq)
> +);
> +
>   DEFINE_EVENT(i915_request, i915_request_guc_submit,
>   	     TP_PROTO(struct i915_request *rq),
>   	     TP_ARGS(rq)
> @@ -497,6 +502,11 @@ DEFINE_EVENT(intel_context, intel_context_do_unpin,
>   
>   #else
>   #if !defined(TRACE_HEADER_MULTI_READ)
> +static inline void
> +trace_i915_request_cancel(struct i915_request *rq)
> +{
> +}
> +
>   static inline void
>   trace_i915_request_guc_submit(struct i915_request *rq)
>   {
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
  (?)
@ 2022-01-25 15:27   ` Tvrtko Ursulin
  2022-01-25 16:32     ` Matthew Brost
  -1 siblings, 1 reply; 31+ messages in thread
From: Tvrtko Ursulin @ 2022-01-25 15:27 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 24/01/2022 15:01, Matthew Brost wrote:
> More than 1 request can be submitted to a single ELSP at a time if
> multiple requests are ready run to on the same context. When a request
> is canceled it is marked bad, an idle pulse is triggered to the engine
> (high priority kernel request), the execlists scheduler sees that
> running request is bad and sets preemption timeout to minimum value (1
> ms). This fails to work if multiple requests are combined on the ELSP as
> only the most recent request is stored in the execlists schedule (the
> request stored in the ELSP isn't marked bad, thus preemption timeout
> isn't set to the minimum value). If the preempt timeout is configured to
> zero, the engine is permanently hung. This is shown by an upcoming
> selftest.
> 
> To work around this, mark the idle pulse with a flag to force a preempt
> with the minimum value.

A couple of quick questions first before I find time to dig deeper.

First about the "permanently hung" statement. How permanent? Does the 
heartbeat eventually resolve it and if not why not? Naive view is that 
missed heartbeats would identify the stuck non-preemptible request and 
then engine reset would skip over it.

If it does resolve, then the problem is only that request timeout works 
less well if someone set preempt timeout to zero? Which may not be as 
bad, since request timeout was never about any time guarantees.

Regards,

Tvrtko

> 
> Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
>   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
>   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
>   4 files changed, 38 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index a3698f611f457..efd1c719b4072 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
>   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
>   }
>   
> -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> +				bool force_preempt)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
>   	struct intel_context *ce = engine->kernel_context;
> @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
>   		return PTR_ERR(rq);
>   
>   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +	if (force_preempt)
> +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
>   
>   	heartbeat_commit(rq, &attr);
>   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   
>   		/* recheck current execution */
>   		if (intel_engine_has_preemption(engine)) {
> -			err = __intel_engine_pulse(engine);
> +			err = __intel_engine_pulse(engine, false);
>   			if (err)
>   				set_heartbeat(engine, saved);
>   		}
> @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   	return err;
>   }
>   
> -int intel_engine_pulse(struct intel_engine_cs *engine)
> +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> +			       bool force_preempt)
>   {
>   	struct intel_context *ce = engine->kernel_context;
>   	int err;
> @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   
>   	err = -EINTR;
>   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> -		err = __intel_engine_pulse(engine);
> +		err = __intel_engine_pulse(engine, force_preempt);
>   		mutex_unlock(&ce->timeline->mutex);
>   	}
>   
> @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   	return err;
>   }
>   
> +int intel_engine_pulse(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, false);
> +}
> +
> +
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, true);
> +}
> +
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> index 5da6d809a87a2..d9c8386754cb3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
>   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
>   
>   int intel_engine_pulse(struct intel_engine_cs *engine);
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
>   
>   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 960a9aaf4f3a3..f0c2024058731 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
>   }
>   
>   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> -					    const struct i915_request *rq)
> +					    const struct i915_request *rq,
> +					    bool force_preempt)
>   {
>   	if (!rq)
>   		return 0;
>   
>   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> +		     force_preempt))
>   		return 1;
>   
>   	return READ_ONCE(engine->props.preempt_timeout_ms);
>   }
>   
>   static void set_preempt_timeout(struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				bool force_preempt)
>   {
>   	if (!intel_engine_has_preempt_reset(engine))
>   		return;
>   
>   	set_timer_ms(&engine->execlists.preempt,
> -		     active_preempt_timeout(engine, rq));
> +		     active_preempt_timeout(engine, rq, force_preempt));
>   }
>   
>   static bool completed(const struct i915_request *rq)
> @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	    memcmp(active,
>   		   execlists->pending,
>   		   (port - execlists->pending) * sizeof(*port))) {
> +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> +					      &last->fence.flags);
> +
>   		*port = NULL;
>   		while (port-- != execlists->pending)
>   			execlists_schedule_in(*port, port - execlists->pending);
>   
>   		WRITE_ONCE(execlists->yield, -1);
> -		set_preempt_timeout(engine, *active);
> +		set_preempt_timeout(engine, *active, force_preempt);
>   		execlists_submit_ports(engine);
>   	} else {
>   		ring_set_paused(engine, 0);
> @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
>   
>   	i915_request_active_engine(rq, &engine);
>   
> -	if (engine && intel_engine_pulse(engine))
> +	if (engine && intel_engine_pulse_force_preempt(engine))
>   		intel_gt_handle_error(engine->gt, engine->mask, 0,
>   				      "request cancellation by %s",
>   				      current->comm);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 28b1f9db54875..7e6312233d4c7 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -170,6 +170,12 @@ enum {
>   	 * fence (dma_fence_array) and i915 generated for parallel submission.
>   	 */
>   	I915_FENCE_FLAG_COMPOSITE,
> +
> +	/*
> +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> +	 * of preempt timeout configuration
> +	 */
> +	I915_FENCE_FLAG_FORCE_PREEMPT,
>   };
>   
>   /**
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-25 15:27   ` Tvrtko Ursulin
@ 2022-01-25 16:32     ` Matthew Brost
  2022-01-26 10:38       ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2022-01-25 16:32 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Tue, Jan 25, 2022 at 03:27:31PM +0000, Tvrtko Ursulin wrote:
> 
> On 24/01/2022 15:01, Matthew Brost wrote:
> > More than 1 request can be submitted to a single ELSP at a time if
> > multiple requests are ready run to on the same context. When a request
> > is canceled it is marked bad, an idle pulse is triggered to the engine
> > (high priority kernel request), the execlists scheduler sees that
> > running request is bad and sets preemption timeout to minimum value (1
> > ms). This fails to work if multiple requests are combined on the ELSP as
> > only the most recent request is stored in the execlists schedule (the
> > request stored in the ELSP isn't marked bad, thus preemption timeout
> > isn't set to the minimum value). If the preempt timeout is configured to
> > zero, the engine is permanently hung. This is shown by an upcoming
> > selftest.
> > 
> > To work around this, mark the idle pulse with a flag to force a preempt
> > with the minimum value.
> 
> A couple of quick questions first before I find time to dig deeper.
> 
> First about the "permanently hung" statement. How permanent? Does the
> heartbeat eventually resolve it and if not why not? Naive view is that
> missed heartbeats would identify the stuck non-preemptible request and then
> engine reset would skip over it.
> 

Yes, if the heartbeat is enabled then the heartbeat would eventually
recover the engine. It is not always enabled though...

> If it does resolve, then the problem is only that request timeout works less
> well if someone set preempt timeout to zero? Which may not be as bad, since
> request timeout was never about any time guarantees.
>

Yes, if the heartbeat is enabled the problem isn't as bad.

Matt

> Regards,
> 
> Tvrtko
> 
> > 
> > Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
> >   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
> >   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
> >   4 files changed, 38 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > index a3698f611f457..efd1c719b4072 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
> >   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
> >   }
> > -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> > +				bool force_preempt)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
> >   	struct intel_context *ce = engine->kernel_context;
> > @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
> >   		return PTR_ERR(rq);
> >   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> > +	if (force_preempt)
> > +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
> >   	heartbeat_commit(rq, &attr);
> >   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> > @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   		/* recheck current execution */
> >   		if (intel_engine_has_preemption(engine)) {
> > -			err = __intel_engine_pulse(engine);
> > +			err = __intel_engine_pulse(engine, false);
> >   			if (err)
> >   				set_heartbeat(engine, saved);
> >   		}
> > @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   	return err;
> >   }
> > -int intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> > +			       bool force_preempt)
> >   {
> >   	struct intel_context *ce = engine->kernel_context;
> >   	int err;
> > @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	err = -EINTR;
> >   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> > -		err = __intel_engine_pulse(engine);
> > +		err = __intel_engine_pulse(engine, force_preempt);
> >   		mutex_unlock(&ce->timeline->mutex);
> >   	}
> > @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	return err;
> >   }
> > +int intel_engine_pulse(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, false);
> > +}
> > +
> > +
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, true);
> > +}
> > +
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > index 5da6d809a87a2..d9c8386754cb3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
> >   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
> >   int intel_engine_pulse(struct intel_engine_cs *engine);
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
> >   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 960a9aaf4f3a3..f0c2024058731 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
> >   }
> >   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> > -					    const struct i915_request *rq)
> > +					    const struct i915_request *rq,
> > +					    bool force_preempt)
> >   {
> >   	if (!rq)
> >   		return 0;
> >   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> > -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> > +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> > +		     force_preempt))
> >   		return 1;
> >   	return READ_ONCE(engine->props.preempt_timeout_ms);
> >   }
> >   static void set_preempt_timeout(struct intel_engine_cs *engine,
> > -				const struct i915_request *rq)
> > +				const struct i915_request *rq,
> > +				bool force_preempt)
> >   {
> >   	if (!intel_engine_has_preempt_reset(engine))
> >   		return;
> >   	set_timer_ms(&engine->execlists.preempt,
> > -		     active_preempt_timeout(engine, rq));
> > +		     active_preempt_timeout(engine, rq, force_preempt));
> >   }
> >   static bool completed(const struct i915_request *rq)
> > @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >   	    memcmp(active,
> >   		   execlists->pending,
> >   		   (port - execlists->pending) * sizeof(*port))) {
> > +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> > +					      &last->fence.flags);
> > +
> >   		*port = NULL;
> >   		while (port-- != execlists->pending)
> >   			execlists_schedule_in(*port, port - execlists->pending);
> >   		WRITE_ONCE(execlists->yield, -1);
> > -		set_preempt_timeout(engine, *active);
> > +		set_preempt_timeout(engine, *active, force_preempt);
> >   		execlists_submit_ports(engine);
> >   	} else {
> >   		ring_set_paused(engine, 0);
> > @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
> >   	i915_request_active_engine(rq, &engine);
> > -	if (engine && intel_engine_pulse(engine))
> > +	if (engine && intel_engine_pulse_force_preempt(engine))
> >   		intel_gt_handle_error(engine->gt, engine->mask, 0,
> >   				      "request cancellation by %s",
> >   				      current->comm);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index 28b1f9db54875..7e6312233d4c7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -170,6 +170,12 @@ enum {
> >   	 * fence (dma_fence_array) and i915 generated for parallel submission.
> >   	 */
> >   	I915_FENCE_FLAG_COMPOSITE,
> > +
> > +	/*
> > +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> > +	 * of preempt timeout configuration
> > +	 */
> > +	I915_FENCE_FLAG_FORCE_PREEMPT,
> >   };
> >   /**
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add request cancel low level trace point
  2022-01-25 12:27   ` Tvrtko Ursulin
@ 2022-01-25 16:39     ` Matthew Brost
  2022-01-26 10:29       ` Tvrtko Ursulin
  0 siblings, 1 reply; 31+ messages in thread
From: Matthew Brost @ 2022-01-25 16:39 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Tue, Jan 25, 2022 at 12:27:43PM +0000, Tvrtko Ursulin wrote:
> 
> On 24/01/2022 15:01, Matthew Brost wrote:
> > Add request cancel trace point guarded by
> > CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINT.
> 
> Okay-ish I guess (There is pr_notice with the only real caller, but I
> suppose you want it for selftests? Oh yes, why is missing from the commit
> message.), but I'd emit it from i915_request_cancel since that's what the
> tracepoint is called so it fits.
> 

I had this tracepoint at one point but somehow during the upstreaming it
got lost. Noticed when debugging the below issue this tracepoint wasn't
present, so I brought it back in.

I generally use low level tracepoints for debug, so a pr_notice doesn't
really help there.

Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960

Matt

> Regards,
> 
> Tvrtko
> 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.h |  1 +
> >   drivers/gpu/drm/i915/i915_trace.h       | 10 ++++++++++
> >   2 files changed, 11 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > index d8c74bbf9aae2..3aed4d77f116c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -124,6 +124,7 @@ intel_context_is_pinned(struct intel_context *ce)
> >   static inline void intel_context_cancel_request(struct intel_context *ce,
> >   						struct i915_request *rq)
> >   {
> > +	trace_i915_request_cancel(rq);
> >   	GEM_BUG_ON(!ce->ops->cancel_request);
> >   	return ce->ops->cancel_request(ce, rq);
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 37b5c9e9d260e..d0a11a8bb0ca3 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -324,6 +324,11 @@ DEFINE_EVENT(i915_request, i915_request_add,
> >   );
> >   #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
> > +DEFINE_EVENT(i915_request, i915_request_cancel,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> >   DEFINE_EVENT(i915_request, i915_request_guc_submit,
> >   	     TP_PROTO(struct i915_request *rq),
> >   	     TP_ARGS(rq)
> > @@ -497,6 +502,11 @@ DEFINE_EVENT(intel_context, intel_context_do_unpin,
> >   #else
> >   #if !defined(TRACE_HEADER_MULTI_READ)
> > +static inline void
> > +trace_i915_request_cancel(struct i915_request *rq)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_i915_request_guc_submit(struct i915_request *rq)
> >   {
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add request cancel low level trace point
  2022-01-25 16:39     ` Matthew Brost
@ 2022-01-26 10:29       ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2022-01-26 10:29 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel


On 25/01/2022 16:39, Matthew Brost wrote:
> On Tue, Jan 25, 2022 at 12:27:43PM +0000, Tvrtko Ursulin wrote:
>>
>> On 24/01/2022 15:01, Matthew Brost wrote:
>>> Add request cancel trace point guarded by
>>> CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINT.
>>
>> Okay-ish I guess (There is pr_notice with the only real caller, but I
>> suppose you want it for selftests? Oh yes, why is missing from the commit
>> message.), but I'd emit it from i915_request_cancel since that's what the
>> tracepoint is called so it fits.
>>
> 
> I had this tracepoint at one point but somehow during the upstreaming it
> got lost. Noticed when debugging the below issue this tracepoint wasn't
> present, so I brought it back in.
> 
> I generally use low level tracepoints for debug, so a pr_notice doesn't
> really help there.
> 
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960

This was a GuC backend bug right? And log shows this:

<7> [275.431050] [drm:eb_lookup_vmas [i915]] EINVAL at eb_validate_vma:514
<5> [295.433920] Fence expiration time out i915-0000:03:00.0:kms_vblank[1038]:2!
<3> [332.736763] INFO: task kworker/2:1:55 blocked for more than 30 seconds.

So pr_notice worked. I am not opposed to the tracepoint just put a solid why in the commit message and if the tracepoint is called i915_request_cancel it should be emitted from i915_request_cancel().

Regards,

Tvrtko

> 
> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_context.h |  1 +
>>>    drivers/gpu/drm/i915/i915_trace.h       | 10 ++++++++++
>>>    2 files changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
>>> index d8c74bbf9aae2..3aed4d77f116c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
>>> @@ -124,6 +124,7 @@ intel_context_is_pinned(struct intel_context *ce)
>>>    static inline void intel_context_cancel_request(struct intel_context *ce,
>>>    						struct i915_request *rq)
>>>    {
>>> +	trace_i915_request_cancel(rq);
>>>    	GEM_BUG_ON(!ce->ops->cancel_request);
>>>    	return ce->ops->cancel_request(ce, rq);
>>>    }
>>> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
>>> index 37b5c9e9d260e..d0a11a8bb0ca3 100644
>>> --- a/drivers/gpu/drm/i915/i915_trace.h
>>> +++ b/drivers/gpu/drm/i915/i915_trace.h
>>> @@ -324,6 +324,11 @@ DEFINE_EVENT(i915_request, i915_request_add,
>>>    );
>>>    #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
>>> +DEFINE_EVENT(i915_request, i915_request_cancel,
>>> +	     TP_PROTO(struct i915_request *rq),
>>> +	     TP_ARGS(rq)
>>> +);
>>> +
>>>    DEFINE_EVENT(i915_request, i915_request_guc_submit,
>>>    	     TP_PROTO(struct i915_request *rq),
>>>    	     TP_ARGS(rq)
>>> @@ -497,6 +502,11 @@ DEFINE_EVENT(intel_context, intel_context_do_unpin,
>>>    #else
>>>    #if !defined(TRACE_HEADER_MULTI_READ)
>>> +static inline void
>>> +trace_i915_request_cancel(struct i915_request *rq)
>>> +{
>>> +}
>>> +
>>>    static inline void
>>>    trace_i915_request_guc_submit(struct i915_request *rq)
>>>    {
>>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-25 16:32     ` Matthew Brost
@ 2022-01-26 10:38       ` Tvrtko Ursulin
  0 siblings, 0 replies; 31+ messages in thread
From: Tvrtko Ursulin @ 2022-01-26 10:38 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel


On 25/01/2022 16:32, Matthew Brost wrote:
> On Tue, Jan 25, 2022 at 03:27:31PM +0000, Tvrtko Ursulin wrote:
>>
>> On 24/01/2022 15:01, Matthew Brost wrote:
>>> More than 1 request can be submitted to a single ELSP at a time if
>>> multiple requests are ready run to on the same context. When a request
>>> is canceled it is marked bad, an idle pulse is triggered to the engine
>>> (high priority kernel request), the execlists scheduler sees that
>>> running request is bad and sets preemption timeout to minimum value (1
>>> ms). This fails to work if multiple requests are combined on the ELSP as
>>> only the most recent request is stored in the execlists schedule (the
>>> request stored in the ELSP isn't marked bad, thus preemption timeout
>>> isn't set to the minimum value). If the preempt timeout is configured to
>>> zero, the engine is permanently hung. This is shown by an upcoming
>>> selftest.
>>>
>>> To work around this, mark the idle pulse with a flag to force a preempt
>>> with the minimum value.
>>
>> A couple of quick questions first before I find time to dig deeper.
>>
>> First about the "permanently hung" statement. How permanent? Does the
>> heartbeat eventually resolve it and if not why not? Naive view is that
>> missed heartbeats would identify the stuck non-preemptible request and then
>> engine reset would skip over it.
>>
> 
> Yes, if the heartbeat is enabled then the heartbeat would eventually
> recover the engine. It is not always enabled though...
> 
>> If it does resolve, then the problem is only that request timeout works less
>> well if someone set preempt timeout to zero? Which may not be as bad, since
>> request timeout was never about any time guarantees.
>>
> 
> Yes, if the heartbeat is enabled the problem isn't as bad.

Good, so commit message needs some work to be accurate.

On the technical side of the patch it looks reasonable to me. And the 
idea that cancellation pulse is made special also sounds plausible. 
Question is whether we want to add code to support this considering the 
opens I have:

1)
Do we care about request cancellation working for non-preemptible 
batches, *if* someone explicitly disabled both preemption timeout and 
hearbteats?

2)
Do we care to improve the responsiveness of request cancellation if only 
preempt timeout was disabled?

Conclusions here will also dictate whether Fixes: tag is warranted. Best 
to avoid hairy backports if we decide it is not really needed.

Also, in the next patch you change one selftest to only run with preempt 
timeout disabled. I think it makes sense to have this test run in the 
default config (preempt timeout enabled) to reflect the typical 
configuration. You may add a second pass with it disabled to execise the 
corner case, again, depending on conclusions after above questions.

Regards,

Tvrtko

> 
> Matt
> 
>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
>>>    .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
>>>    .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
>>>    drivers/gpu/drm/i915/i915_request.h           |  6 +++++
>>>    4 files changed, 38 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> index a3698f611f457..efd1c719b4072 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
>>> @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
>>>    	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
>>>    }
>>> -static int __intel_engine_pulse(struct intel_engine_cs *engine)
>>> +static int __intel_engine_pulse(struct intel_engine_cs *engine,
>>> +				bool force_preempt)
>>>    {
>>>    	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
>>>    	struct intel_context *ce = engine->kernel_context;
>>> @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
>>>    		return PTR_ERR(rq);
>>>    	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
>>> +	if (force_preempt)
>>> +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
>>>    	heartbeat_commit(rq, &attr);
>>>    	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
>>> @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>>>    		/* recheck current execution */
>>>    		if (intel_engine_has_preemption(engine)) {
>>> -			err = __intel_engine_pulse(engine);
>>> +			err = __intel_engine_pulse(engine, false);
>>>    			if (err)
>>>    				set_heartbeat(engine, saved);
>>>    		}
>>> @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>>>    	return err;
>>>    }
>>> -int intel_engine_pulse(struct intel_engine_cs *engine)
>>> +static int _intel_engine_pulse(struct intel_engine_cs *engine,
>>> +			       bool force_preempt)
>>>    {
>>>    	struct intel_context *ce = engine->kernel_context;
>>>    	int err;
>>> @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>>>    	err = -EINTR;
>>>    	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
>>> -		err = __intel_engine_pulse(engine);
>>> +		err = __intel_engine_pulse(engine, force_preempt);
>>>    		mutex_unlock(&ce->timeline->mutex);
>>>    	}
>>> @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>>>    	return err;
>>>    }
>>> +int intel_engine_pulse(struct intel_engine_cs *engine)
>>> +{
>>> +	return _intel_engine_pulse(engine, false);
>>> +}
>>> +
>>> +
>>> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
>>> +{
>>> +	return _intel_engine_pulse(engine, true);
>>> +}
>>> +
>>>    int intel_engine_flush_barriers(struct intel_engine_cs *engine)
>>>    {
>>>    	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
>>> index 5da6d809a87a2..d9c8386754cb3 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
>>> @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
>>>    void intel_gt_unpark_heartbeats(struct intel_gt *gt);
>>>    int intel_engine_pulse(struct intel_engine_cs *engine);
>>> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
>>>    int intel_engine_flush_barriers(struct intel_engine_cs *engine);
>>>    #endif /* INTEL_ENGINE_HEARTBEAT_H */
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index 960a9aaf4f3a3..f0c2024058731 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
>>>    }
>>>    static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
>>> -					    const struct i915_request *rq)
>>> +					    const struct i915_request *rq,
>>> +					    bool force_preempt)
>>>    {
>>>    	if (!rq)
>>>    		return 0;
>>>    	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
>>> -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
>>> +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
>>> +		     force_preempt))
>>>    		return 1;
>>>    	return READ_ONCE(engine->props.preempt_timeout_ms);
>>>    }
>>>    static void set_preempt_timeout(struct intel_engine_cs *engine,
>>> -				const struct i915_request *rq)
>>> +				const struct i915_request *rq,
>>> +				bool force_preempt)
>>>    {
>>>    	if (!intel_engine_has_preempt_reset(engine))
>>>    		return;
>>>    	set_timer_ms(&engine->execlists.preempt,
>>> -		     active_preempt_timeout(engine, rq));
>>> +		     active_preempt_timeout(engine, rq, force_preempt));
>>>    }
>>>    static bool completed(const struct i915_request *rq)
>>> @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>    	    memcmp(active,
>>>    		   execlists->pending,
>>>    		   (port - execlists->pending) * sizeof(*port))) {
>>> +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
>>> +					      &last->fence.flags);
>>> +
>>>    		*port = NULL;
>>>    		while (port-- != execlists->pending)
>>>    			execlists_schedule_in(*port, port - execlists->pending);
>>>    		WRITE_ONCE(execlists->yield, -1);
>>> -		set_preempt_timeout(engine, *active);
>>> +		set_preempt_timeout(engine, *active, force_preempt);
>>>    		execlists_submit_ports(engine);
>>>    	} else {
>>>    		ring_set_paused(engine, 0);
>>> @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
>>>    	i915_request_active_engine(rq, &engine);
>>> -	if (engine && intel_engine_pulse(engine))
>>> +	if (engine && intel_engine_pulse_force_preempt(engine))
>>>    		intel_gt_handle_error(engine->gt, engine->mask, 0,
>>>    				      "request cancellation by %s",
>>>    				      current->comm);
>>> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
>>> index 28b1f9db54875..7e6312233d4c7 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.h
>>> +++ b/drivers/gpu/drm/i915/i915_request.h
>>> @@ -170,6 +170,12 @@ enum {
>>>    	 * fence (dma_fence_array) and i915 generated for parallel submission.
>>>    	 */
>>>    	I915_FENCE_FLAG_COMPOSITE,
>>> +
>>> +	/*
>>> +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
>>> +	 * of preempt timeout configuration
>>> +	 */
>>> +	I915_FENCE_FLAG_FORCE_PREEMPT,
>>>    };
>>>    /**
>>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/4] drm/i915/guc: Cancel requests immediately
  2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
@ 2022-01-26 18:58     ` John Harrison
  -1 siblings, 0 replies; 31+ messages in thread
From: John Harrison @ 2022-01-26 18:58 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, tvrtko.ursulin

On 1/24/2022 07:01, Matthew Brost wrote:
> Change the preemption timeout to the smallest possible value (1 us) when
> disabling scheduling to cancel a request and restore it after
> cancellation. This not only cancels the request as fast as possible, it
> fixes a bug where the preemption timeout is 0 which results in the
> schedule disable hanging forever.
Shouldn't there be an 'if' in the above statement? The pre-emption 
timeout is not normally zero.

>
> Reported-by: Jani Saarinen <jani.saarinen@intel.com>
> Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
>   2 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 30cd81ad8911a..730998823dbea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -198,6 +198,11 @@ struct intel_context {
>   		 * each priority bucket
>   		 */
>   		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
> +		/**
> +		 * @preemption_timeout: preemption timeout of the context, used
> +		 * to restore this value after request cancellation
> +		 */
> +		u32 preemption_timeout;
>   	} guc_state;
>   
>   	struct {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 3918f1be114fa..966947c450253 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
>   	return __get_parent_scratch(ce)->join[child_index].semaphore;
>   }
>   
> -static void guc_context_policy_init(struct intel_engine_cs *engine,
> +static void guc_context_policy_init(struct intel_context *ce,
> +				    struct intel_engine_cs *engine,
>   				    struct guc_lrc_desc *desc)
Shouldn't engine be before ce? The more general structure usually goes 
first.

John.

>   {
>   	desc->policy_flags = 0;
> @@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   
>   	/* NB: For both of these, zero means disabled. */
>   	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
> -	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> +	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> +	desc->preemption_timeout = ce->guc_state.preemption_timeout;
>   }
>   
>   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> @@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	desc->hw_context_desc = ce->lrc.lrca;
>   	desc->priority = ce->guc_state.prio;
>   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> -	guc_context_policy_init(engine, desc);
> +	guc_context_policy_init(ce, engine, desc);
>   
>   	/*
>   	 * If context is a parent, we need to register a process descriptor
> @@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   			desc->hw_context_desc = child->lrc.lrca;
>   			desc->priority = ce->guc_state.prio;
>   			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> -			guc_context_policy_init(engine, desc);
> +			guc_context_policy_init(child, engine, desc);
>   		}
>   
>   		clear_children_join_go_memory(ce);
> @@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   	return ce->guc_id.id;
>   }
>   
> +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> +						 u16 guc_id,
> +						 u32 preemption_timeout)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> +		guc_id,
> +		preemption_timeout
> +	};
> +
> +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +}
> +
>   static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
> @@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
>   
>   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   
> -	with_intel_runtime_pm(runtime_pm, wakeref)
> +	with_intel_runtime_pm(runtime_pm, wakeref) {
> +		__guc_context_set_preemption_timeout(guc, guc_id, 1);
>   		__guc_context_sched_disable(guc, ce, guc_id);
> +	}
>   
>   	return &ce->guc_state.blocked;
>   }
> @@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
>   
>   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   
> -	if (enable) {
> -		with_intel_runtime_pm(runtime_pm, wakeref)
> +	with_intel_runtime_pm(runtime_pm, wakeref) {
> +		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
> +						     ce->guc_state.preemption_timeout);
> +		if (enable)
>   			__guc_context_sched_enable(guc, ce);
>   	}
>   }
> @@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
>   	}
>   }
>   
> -static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> -						 u16 guc_id,
> -						 u32 preemption_timeout)
> -{
> -	u32 action[] = {
> -		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> -		guc_id,
> -		preemption_timeout
> -	};
> -
> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> -}
> -
>   static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 2/4] drm/i915/guc: Cancel requests immediately
@ 2022-01-26 18:58     ` John Harrison
  0 siblings, 0 replies; 31+ messages in thread
From: John Harrison @ 2022-01-26 18:58 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 1/24/2022 07:01, Matthew Brost wrote:
> Change the preemption timeout to the smallest possible value (1 us) when
> disabling scheduling to cancel a request and restore it after
> cancellation. This not only cancels the request as fast as possible, it
> fixes a bug where the preemption timeout is 0 which results in the
> schedule disable hanging forever.
Shouldn't there be an 'if' in the above statement? The pre-emption 
timeout is not normally zero.

>
> Reported-by: Jani Saarinen <jani.saarinen@intel.com>
> Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
> Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
>   2 files changed, 31 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 30cd81ad8911a..730998823dbea 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -198,6 +198,11 @@ struct intel_context {
>   		 * each priority bucket
>   		 */
>   		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
> +		/**
> +		 * @preemption_timeout: preemption timeout of the context, used
> +		 * to restore this value after request cancellation
> +		 */
> +		u32 preemption_timeout;
>   	} guc_state;
>   
>   	struct {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 3918f1be114fa..966947c450253 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
>   	return __get_parent_scratch(ce)->join[child_index].semaphore;
>   }
>   
> -static void guc_context_policy_init(struct intel_engine_cs *engine,
> +static void guc_context_policy_init(struct intel_context *ce,
> +				    struct intel_engine_cs *engine,
>   				    struct guc_lrc_desc *desc)
Shouldn't engine be before ce? The more general structure usually goes 
first.

John.

>   {
>   	desc->policy_flags = 0;
> @@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   
>   	/* NB: For both of these, zero means disabled. */
>   	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
> -	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> +	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> +	desc->preemption_timeout = ce->guc_state.preemption_timeout;
>   }
>   
>   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> @@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	desc->hw_context_desc = ce->lrc.lrca;
>   	desc->priority = ce->guc_state.prio;
>   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> -	guc_context_policy_init(engine, desc);
> +	guc_context_policy_init(ce, engine, desc);
>   
>   	/*
>   	 * If context is a parent, we need to register a process descriptor
> @@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   			desc->hw_context_desc = child->lrc.lrca;
>   			desc->priority = ce->guc_state.prio;
>   			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> -			guc_context_policy_init(engine, desc);
> +			guc_context_policy_init(child, engine, desc);
>   		}
>   
>   		clear_children_join_go_memory(ce);
> @@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   	return ce->guc_id.id;
>   }
>   
> +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> +						 u16 guc_id,
> +						 u32 preemption_timeout)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> +		guc_id,
> +		preemption_timeout
> +	};
> +
> +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +}
> +
>   static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
> @@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
>   
>   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   
> -	with_intel_runtime_pm(runtime_pm, wakeref)
> +	with_intel_runtime_pm(runtime_pm, wakeref) {
> +		__guc_context_set_preemption_timeout(guc, guc_id, 1);
>   		__guc_context_sched_disable(guc, ce, guc_id);
> +	}
>   
>   	return &ce->guc_state.blocked;
>   }
> @@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
>   
>   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   
> -	if (enable) {
> -		with_intel_runtime_pm(runtime_pm, wakeref)
> +	with_intel_runtime_pm(runtime_pm, wakeref) {
> +		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
> +						     ce->guc_state.preemption_timeout);
> +		if (enable)
>   			__guc_context_sched_enable(guc, ce);
>   	}
>   }
> @@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
>   	}
>   }
>   
> -static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> -						 u16 guc_id,
> -						 u32 preemption_timeout)
> -{
> -	u32 action[] = {
> -		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> -		guc_id,
> -		preemption_timeout
> -	};
> -
> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> -}
> -
>   static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
@ 2022-01-26 19:03     ` John Harrison
  -1 siblings, 0 replies; 31+ messages in thread
From: John Harrison @ 2022-01-26 19:03 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, tvrtko.ursulin

On 1/24/2022 07:01, Matthew Brost wrote:
> More than 1 request can be submitted to a single ELSP at a time if
> multiple requests are ready run to on the same context. When a request
> is canceled it is marked bad, an idle pulse is triggered to the engine
> (high priority kernel request), the execlists scheduler sees that
> running request is bad and sets preemption timeout to minimum value (1
> ms). This fails to work if multiple requests are combined on the ELSP as
> only the most recent request is stored in the execlists schedule (the
> request stored in the ELSP isn't marked bad, thus preemption timeout
> isn't set to the minimum value). If the preempt timeout is configured to
> zero, the engine is permanently hung. This is shown by an upcoming
> selftest.
>
> To work around this, mark the idle pulse with a flag to force a preempt
> with the minimum value.
>
> Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
>   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
>   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
>   4 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index a3698f611f457..efd1c719b4072 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
>   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
>   }
>   
> -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> +				bool force_preempt)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
>   	struct intel_context *ce = engine->kernel_context;
> @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
>   		return PTR_ERR(rq);
>   
>   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +	if (force_preempt)
> +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
>   
>   	heartbeat_commit(rq, &attr);
>   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   
>   		/* recheck current execution */
>   		if (intel_engine_has_preemption(engine)) {
> -			err = __intel_engine_pulse(engine);
> +			err = __intel_engine_pulse(engine, false);
>   			if (err)
>   				set_heartbeat(engine, saved);
>   		}
> @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   	return err;
>   }
>   
> -int intel_engine_pulse(struct intel_engine_cs *engine)
> +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> +			       bool force_preempt)
>   {
>   	struct intel_context *ce = engine->kernel_context;
>   	int err;
> @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   
>   	err = -EINTR;
>   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> -		err = __intel_engine_pulse(engine);
> +		err = __intel_engine_pulse(engine, force_preempt);
>   		mutex_unlock(&ce->timeline->mutex);
>   	}
>   
> @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   	return err;
>   }
>   
> +int intel_engine_pulse(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, false);
> +}
> +
> +
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, true);
> +}
> +
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> index 5da6d809a87a2..d9c8386754cb3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
>   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
>   
>   int intel_engine_pulse(struct intel_engine_cs *engine);
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
>   
>   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 960a9aaf4f3a3..f0c2024058731 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
>   }
>   
>   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> -					    const struct i915_request *rq)
> +					    const struct i915_request *rq,
> +					    bool force_preempt)
>   {
>   	if (!rq)
>   		return 0;
>   
>   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> +		     force_preempt))
>   		return 1;
>   
>   	return READ_ONCE(engine->props.preempt_timeout_ms);
>   }
>   
>   static void set_preempt_timeout(struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				bool force_preempt)
>   {
>   	if (!intel_engine_has_preempt_reset(engine))
>   		return;
>   
>   	set_timer_ms(&engine->execlists.preempt,
> -		     active_preempt_timeout(engine, rq));
> +		     active_preempt_timeout(engine, rq, force_preempt));
>   }
>   
>   static bool completed(const struct i915_request *rq)
> @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	    memcmp(active,
>   		   execlists->pending,
>   		   (port - execlists->pending) * sizeof(*port))) {
> +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> +					      &last->fence.flags);
> +
>   		*port = NULL;
>   		while (port-- != execlists->pending)
>   			execlists_schedule_in(*port, port - execlists->pending);
>   
>   		WRITE_ONCE(execlists->yield, -1);
> -		set_preempt_timeout(engine, *active);
> +		set_preempt_timeout(engine, *active, force_preempt);
>   		execlists_submit_ports(engine);
>   	} else {
>   		ring_set_paused(engine, 0);
> @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
>   
>   	i915_request_active_engine(rq, &engine);
>   
> -	if (engine && intel_engine_pulse(engine))
> +	if (engine && intel_engine_pulse_force_preempt(engine))
>   		intel_gt_handle_error(engine->gt, engine->mask, 0,
>   				      "request cancellation by %s",
>   				      current->comm);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 28b1f9db54875..7e6312233d4c7 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -170,6 +170,12 @@ enum {
>   	 * fence (dma_fence_array) and i915 generated for parallel submission.
>   	 */
>   	I915_FENCE_FLAG_COMPOSITE,
> +
> +	/*
> +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> +	 * of preempt timeout configuration
> +	 */
> +	I915_FENCE_FLAG_FORCE_PREEMPT,
This would be execlist only? I'm a bit concerned about adding a global 
flag that cannot be implemented on current and future hardware.

John.

>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
@ 2022-01-26 19:03     ` John Harrison
  0 siblings, 0 replies; 31+ messages in thread
From: John Harrison @ 2022-01-26 19:03 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 1/24/2022 07:01, Matthew Brost wrote:
> More than 1 request can be submitted to a single ELSP at a time if
> multiple requests are ready run to on the same context. When a request
> is canceled it is marked bad, an idle pulse is triggered to the engine
> (high priority kernel request), the execlists scheduler sees that
> running request is bad and sets preemption timeout to minimum value (1
> ms). This fails to work if multiple requests are combined on the ELSP as
> only the most recent request is stored in the execlists schedule (the
> request stored in the ELSP isn't marked bad, thus preemption timeout
> isn't set to the minimum value). If the preempt timeout is configured to
> zero, the engine is permanently hung. This is shown by an upcoming
> selftest.
>
> To work around this, mark the idle pulse with a flag to force a preempt
> with the minimum value.
>
> Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
>   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
>   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
>   4 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index a3698f611f457..efd1c719b4072 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
>   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
>   }
>   
> -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> +				bool force_preempt)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
>   	struct intel_context *ce = engine->kernel_context;
> @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
>   		return PTR_ERR(rq);
>   
>   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> +	if (force_preempt)
> +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
>   
>   	heartbeat_commit(rq, &attr);
>   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   
>   		/* recheck current execution */
>   		if (intel_engine_has_preemption(engine)) {
> -			err = __intel_engine_pulse(engine);
> +			err = __intel_engine_pulse(engine, false);
>   			if (err)
>   				set_heartbeat(engine, saved);
>   		}
> @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
>   	return err;
>   }
>   
> -int intel_engine_pulse(struct intel_engine_cs *engine)
> +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> +			       bool force_preempt)
>   {
>   	struct intel_context *ce = engine->kernel_context;
>   	int err;
> @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   
>   	err = -EINTR;
>   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> -		err = __intel_engine_pulse(engine);
> +		err = __intel_engine_pulse(engine, force_preempt);
>   		mutex_unlock(&ce->timeline->mutex);
>   	}
>   
> @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
>   	return err;
>   }
>   
> +int intel_engine_pulse(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, false);
> +}
> +
> +
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> +{
> +	return _intel_engine_pulse(engine, true);
> +}
> +
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
>   {
>   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> index 5da6d809a87a2..d9c8386754cb3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
>   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
>   
>   int intel_engine_pulse(struct intel_engine_cs *engine);
> +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
>   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
>   
>   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 960a9aaf4f3a3..f0c2024058731 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
>   }
>   
>   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> -					    const struct i915_request *rq)
> +					    const struct i915_request *rq,
> +					    bool force_preempt)
>   {
>   	if (!rq)
>   		return 0;
>   
>   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> +		     force_preempt))
>   		return 1;
>   
>   	return READ_ONCE(engine->props.preempt_timeout_ms);
>   }
>   
>   static void set_preempt_timeout(struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				bool force_preempt)
>   {
>   	if (!intel_engine_has_preempt_reset(engine))
>   		return;
>   
>   	set_timer_ms(&engine->execlists.preempt,
> -		     active_preempt_timeout(engine, rq));
> +		     active_preempt_timeout(engine, rq, force_preempt));
>   }
>   
>   static bool completed(const struct i915_request *rq)
> @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	    memcmp(active,
>   		   execlists->pending,
>   		   (port - execlists->pending) * sizeof(*port))) {
> +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> +					      &last->fence.flags);
> +
>   		*port = NULL;
>   		while (port-- != execlists->pending)
>   			execlists_schedule_in(*port, port - execlists->pending);
>   
>   		WRITE_ONCE(execlists->yield, -1);
> -		set_preempt_timeout(engine, *active);
> +		set_preempt_timeout(engine, *active, force_preempt);
>   		execlists_submit_ports(engine);
>   	} else {
>   		ring_set_paused(engine, 0);
> @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
>   
>   	i915_request_active_engine(rq, &engine);
>   
> -	if (engine && intel_engine_pulse(engine))
> +	if (engine && intel_engine_pulse_force_preempt(engine))
>   		intel_gt_handle_error(engine->gt, engine->mask, 0,
>   				      "request cancellation by %s",
>   				      current->comm);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 28b1f9db54875..7e6312233d4c7 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -170,6 +170,12 @@ enum {
>   	 * fence (dma_fence_array) and i915 generated for parallel submission.
>   	 */
>   	I915_FENCE_FLAG_COMPOSITE,
> +
> +	/*
> +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> +	 * of preempt timeout configuration
> +	 */
> +	I915_FENCE_FLAG_FORCE_PREEMPT,
This would be execlist only? I'm a bit concerned about adding a global 
flag that cannot be implemented on current and future hardware.

John.

>   };
>   
>   /**


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
  2022-01-26 19:03     ` [Intel-gfx] " John Harrison
@ 2022-01-26 20:10       ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-26 20:10 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel, tvrtko.ursulin

On Wed, Jan 26, 2022 at 11:03:24AM -0800, John Harrison wrote:
> On 1/24/2022 07:01, Matthew Brost wrote:
> > More than 1 request can be submitted to a single ELSP at a time if
> > multiple requests are ready run to on the same context. When a request
> > is canceled it is marked bad, an idle pulse is triggered to the engine
> > (high priority kernel request), the execlists scheduler sees that
> > running request is bad and sets preemption timeout to minimum value (1
> > ms). This fails to work if multiple requests are combined on the ELSP as
> > only the most recent request is stored in the execlists schedule (the
> > request stored in the ELSP isn't marked bad, thus preemption timeout
> > isn't set to the minimum value). If the preempt timeout is configured to
> > zero, the engine is permanently hung. This is shown by an upcoming
> > selftest.
> > 
> > To work around this, mark the idle pulse with a flag to force a preempt
> > with the minimum value.
> > 
> > Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
> >   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
> >   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
> >   4 files changed, 38 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > index a3698f611f457..efd1c719b4072 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
> >   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
> >   }
> > -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> > +				bool force_preempt)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
> >   	struct intel_context *ce = engine->kernel_context;
> > @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
> >   		return PTR_ERR(rq);
> >   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> > +	if (force_preempt)
> > +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
> >   	heartbeat_commit(rq, &attr);
> >   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> > @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   		/* recheck current execution */
> >   		if (intel_engine_has_preemption(engine)) {
> > -			err = __intel_engine_pulse(engine);
> > +			err = __intel_engine_pulse(engine, false);
> >   			if (err)
> >   				set_heartbeat(engine, saved);
> >   		}
> > @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   	return err;
> >   }
> > -int intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> > +			       bool force_preempt)
> >   {
> >   	struct intel_context *ce = engine->kernel_context;
> >   	int err;
> > @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	err = -EINTR;
> >   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> > -		err = __intel_engine_pulse(engine);
> > +		err = __intel_engine_pulse(engine, force_preempt);
> >   		mutex_unlock(&ce->timeline->mutex);
> >   	}
> > @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	return err;
> >   }
> > +int intel_engine_pulse(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, false);
> > +}
> > +
> > +
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, true);
> > +}
> > +
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > index 5da6d809a87a2..d9c8386754cb3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
> >   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
> >   int intel_engine_pulse(struct intel_engine_cs *engine);
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
> >   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 960a9aaf4f3a3..f0c2024058731 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
> >   }
> >   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> > -					    const struct i915_request *rq)
> > +					    const struct i915_request *rq,
> > +					    bool force_preempt)
> >   {
> >   	if (!rq)
> >   		return 0;
> >   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> > -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> > +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> > +		     force_preempt))
> >   		return 1;
> >   	return READ_ONCE(engine->props.preempt_timeout_ms);
> >   }
> >   static void set_preempt_timeout(struct intel_engine_cs *engine,
> > -				const struct i915_request *rq)
> > +				const struct i915_request *rq,
> > +				bool force_preempt)
> >   {
> >   	if (!intel_engine_has_preempt_reset(engine))
> >   		return;
> >   	set_timer_ms(&engine->execlists.preempt,
> > -		     active_preempt_timeout(engine, rq));
> > +		     active_preempt_timeout(engine, rq, force_preempt));
> >   }
> >   static bool completed(const struct i915_request *rq)
> > @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >   	    memcmp(active,
> >   		   execlists->pending,
> >   		   (port - execlists->pending) * sizeof(*port))) {
> > +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> > +					      &last->fence.flags);
> > +
> >   		*port = NULL;
> >   		while (port-- != execlists->pending)
> >   			execlists_schedule_in(*port, port - execlists->pending);
> >   		WRITE_ONCE(execlists->yield, -1);
> > -		set_preempt_timeout(engine, *active);
> > +		set_preempt_timeout(engine, *active, force_preempt);
> >   		execlists_submit_ports(engine);
> >   	} else {
> >   		ring_set_paused(engine, 0);
> > @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
> >   	i915_request_active_engine(rq, &engine);
> > -	if (engine && intel_engine_pulse(engine))
> > +	if (engine && intel_engine_pulse_force_preempt(engine))
> >   		intel_gt_handle_error(engine->gt, engine->mask, 0,
> >   				      "request cancellation by %s",
> >   				      current->comm);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index 28b1f9db54875..7e6312233d4c7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -170,6 +170,12 @@ enum {
> >   	 * fence (dma_fence_array) and i915 generated for parallel submission.
> >   	 */
> >   	I915_FENCE_FLAG_COMPOSITE,
> > +
> > +	/*
> > +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> > +	 * of preempt timeout configuration
> > +	 */
> > +	I915_FENCE_FLAG_FORCE_PREEMPT,
> This would be execlist only? I'm a bit concerned about adding a global flag
> that cannot be implemented on current and future hardware.
> 

That ship has sailed... A lot of flags defined here are backend specific.

Matt

> John.
> 
> >   };
> >   /**
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case
@ 2022-01-26 20:10       ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-26 20:10 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Wed, Jan 26, 2022 at 11:03:24AM -0800, John Harrison wrote:
> On 1/24/2022 07:01, Matthew Brost wrote:
> > More than 1 request can be submitted to a single ELSP at a time if
> > multiple requests are ready run to on the same context. When a request
> > is canceled it is marked bad, an idle pulse is triggered to the engine
> > (high priority kernel request), the execlists scheduler sees that
> > running request is bad and sets preemption timeout to minimum value (1
> > ms). This fails to work if multiple requests are combined on the ELSP as
> > only the most recent request is stored in the execlists schedule (the
> > request stored in the ELSP isn't marked bad, thus preemption timeout
> > isn't set to the minimum value). If the preempt timeout is configured to
> > zero, the engine is permanently hung. This is shown by an upcoming
> > selftest.
> > 
> > To work around this, mark the idle pulse with a flag to force a preempt
> > with the minimum value.
> > 
> > Fixes: 38b237eab2bc7 ("drm/i915: Individual request cancellation")
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 23 +++++++++++++++----
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  1 +
> >   .../drm/i915/gt/intel_execlists_submission.c  | 18 ++++++++++-----
> >   drivers/gpu/drm/i915/i915_request.h           |  6 +++++
> >   4 files changed, 38 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > index a3698f611f457..efd1c719b4072 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> > @@ -243,7 +243,8 @@ void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
> >   	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
> >   }
> > -static int __intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int __intel_engine_pulse(struct intel_engine_cs *engine,
> > +				bool force_preempt)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_BARRIER };
> >   	struct intel_context *ce = engine->kernel_context;
> > @@ -258,6 +259,8 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
> >   		return PTR_ERR(rq);
> >   	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> > +	if (force_preempt)
> > +		__set_bit(I915_FENCE_FLAG_FORCE_PREEMPT, &rq->fence.flags);
> >   	heartbeat_commit(rq, &attr);
> >   	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
> > @@ -299,7 +302,7 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   		/* recheck current execution */
> >   		if (intel_engine_has_preemption(engine)) {
> > -			err = __intel_engine_pulse(engine);
> > +			err = __intel_engine_pulse(engine, false);
> >   			if (err)
> >   				set_heartbeat(engine, saved);
> >   		}
> > @@ -312,7 +315,8 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
> >   	return err;
> >   }
> > -int intel_engine_pulse(struct intel_engine_cs *engine)
> > +static int _intel_engine_pulse(struct intel_engine_cs *engine,
> > +			       bool force_preempt)
> >   {
> >   	struct intel_context *ce = engine->kernel_context;
> >   	int err;
> > @@ -325,7 +329,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	err = -EINTR;
> >   	if (!mutex_lock_interruptible(&ce->timeline->mutex)) {
> > -		err = __intel_engine_pulse(engine);
> > +		err = __intel_engine_pulse(engine, force_preempt);
> >   		mutex_unlock(&ce->timeline->mutex);
> >   	}
> > @@ -334,6 +338,17 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
> >   	return err;
> >   }
> > +int intel_engine_pulse(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, false);
> > +}
> > +
> > +
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine)
> > +{
> > +	return _intel_engine_pulse(engine, true);
> > +}
> > +
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine)
> >   {
> >   	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > index 5da6d809a87a2..d9c8386754cb3 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
> > @@ -21,6 +21,7 @@ void intel_gt_park_heartbeats(struct intel_gt *gt);
> >   void intel_gt_unpark_heartbeats(struct intel_gt *gt);
> >   int intel_engine_pulse(struct intel_engine_cs *engine);
> > +int intel_engine_pulse_force_preempt(struct intel_engine_cs *engine);
> >   int intel_engine_flush_barriers(struct intel_engine_cs *engine);
> >   #endif /* INTEL_ENGINE_HEARTBEAT_H */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 960a9aaf4f3a3..f0c2024058731 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -1222,26 +1222,29 @@ static void record_preemption(struct intel_engine_execlists *execlists)
> >   }
> >   static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
> > -					    const struct i915_request *rq)
> > +					    const struct i915_request *rq,
> > +					    bool force_preempt)
> >   {
> >   	if (!rq)
> >   		return 0;
> >   	/* Force a fast reset for terminated contexts (ignoring sysfs!) */
> > -	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
> > +	if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq) ||
> > +		     force_preempt))
> >   		return 1;
> >   	return READ_ONCE(engine->props.preempt_timeout_ms);
> >   }
> >   static void set_preempt_timeout(struct intel_engine_cs *engine,
> > -				const struct i915_request *rq)
> > +				const struct i915_request *rq,
> > +				bool force_preempt)
> >   {
> >   	if (!intel_engine_has_preempt_reset(engine))
> >   		return;
> >   	set_timer_ms(&engine->execlists.preempt,
> > -		     active_preempt_timeout(engine, rq));
> > +		     active_preempt_timeout(engine, rq, force_preempt));
> >   }
> >   static bool completed(const struct i915_request *rq)
> > @@ -1584,12 +1587,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >   	    memcmp(active,
> >   		   execlists->pending,
> >   		   (port - execlists->pending) * sizeof(*port))) {
> > +		bool force_preempt = test_bit(I915_FENCE_FLAG_FORCE_PREEMPT,
> > +					      &last->fence.flags);
> > +
> >   		*port = NULL;
> >   		while (port-- != execlists->pending)
> >   			execlists_schedule_in(*port, port - execlists->pending);
> >   		WRITE_ONCE(execlists->yield, -1);
> > -		set_preempt_timeout(engine, *active);
> > +		set_preempt_timeout(engine, *active, force_preempt);
> >   		execlists_submit_ports(engine);
> >   	} else {
> >   		ring_set_paused(engine, 0);
> > @@ -2594,7 +2600,7 @@ static void execlists_context_cancel_request(struct intel_context *ce,
> >   	i915_request_active_engine(rq, &engine);
> > -	if (engine && intel_engine_pulse(engine))
> > +	if (engine && intel_engine_pulse_force_preempt(engine))
> >   		intel_gt_handle_error(engine->gt, engine->mask, 0,
> >   				      "request cancellation by %s",
> >   				      current->comm);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index 28b1f9db54875..7e6312233d4c7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -170,6 +170,12 @@ enum {
> >   	 * fence (dma_fence_array) and i915 generated for parallel submission.
> >   	 */
> >   	I915_FENCE_FLAG_COMPOSITE,
> > +
> > +	/*
> > +	 * I915_FENCE_FLAG_FORCE_PREEMPT - Force preempt immediately regardless
> > +	 * of preempt timeout configuration
> > +	 */
> > +	I915_FENCE_FLAG_FORCE_PREEMPT,
> This would be execlist only? I'm a bit concerned about adding a global flag
> that cannot be implemented on current and future hardware.
> 

That ship has sailed... A lot of flags defined here are backend specific.

Matt

> John.
> 
> >   };
> >   /**
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/4] drm/i915/guc: Cancel requests immediately
  2022-01-26 18:58     ` [Intel-gfx] " John Harrison
@ 2022-01-26 20:12       ` Matthew Brost
  -1 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-26 20:12 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel, tvrtko.ursulin

On Wed, Jan 26, 2022 at 10:58:46AM -0800, John Harrison wrote:
> On 1/24/2022 07:01, Matthew Brost wrote:
> > Change the preemption timeout to the smallest possible value (1 us) when
> > disabling scheduling to cancel a request and restore it after
> > cancellation. This not only cancels the request as fast as possible, it
> > fixes a bug where the preemption timeout is 0 which results in the
> > schedule disable hanging forever.
> Shouldn't there be an 'if' in the above statement? The pre-emption timeout
> is not normally zero.
>

Yes. Will reword.
 
> > 
> > Reported-by: Jani Saarinen <jani.saarinen@intel.com>
> > Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
> > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
> >   2 files changed, 31 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 30cd81ad8911a..730998823dbea 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -198,6 +198,11 @@ struct intel_context {
> >   		 * each priority bucket
> >   		 */
> >   		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
> > +		/**
> > +		 * @preemption_timeout: preemption timeout of the context, used
> > +		 * to restore this value after request cancellation
> > +		 */
> > +		u32 preemption_timeout;
> >   	} guc_state;
> >   	struct {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 3918f1be114fa..966947c450253 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
> >   	return __get_parent_scratch(ce)->join[child_index].semaphore;
> >   }
> > -static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +static void guc_context_policy_init(struct intel_context *ce,
> > +				    struct intel_engine_cs *engine,
> >   				    struct guc_lrc_desc *desc)
> Shouldn't engine be before ce? The more general structure usually goes
> first.
> 

Sure. Fix fix this in the next rev.

Matt

> John.
> 
> >   {
> >   	desc->policy_flags = 0;
> > @@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	/* NB: For both of these, zero means disabled. */
> >   	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
> > -	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> > +	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> > +	desc->preemption_timeout = ce->guc_state.preemption_timeout;
> >   }
> >   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > @@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	desc->hw_context_desc = ce->lrc.lrca;
> >   	desc->priority = ce->guc_state.prio;
> >   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > -	guc_context_policy_init(engine, desc);
> > +	guc_context_policy_init(ce, engine, desc);
> >   	/*
> >   	 * If context is a parent, we need to register a process descriptor
> > @@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   			desc->hw_context_desc = child->lrc.lrca;
> >   			desc->priority = ce->guc_state.prio;
> >   			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > -			guc_context_policy_init(engine, desc);
> > +			guc_context_policy_init(child, engine, desc);
> >   		}
> >   		clear_children_join_go_memory(ce);
> > @@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> >   	return ce->guc_id.id;
> >   }
> > +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> > +						 u16 guc_id,
> > +						 u32 preemption_timeout)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> > +		guc_id,
> > +		preemption_timeout
> > +	};
> > +
> > +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +}
> > +
> >   static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> > @@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> >   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -	with_intel_runtime_pm(runtime_pm, wakeref)
> > +	with_intel_runtime_pm(runtime_pm, wakeref) {
> > +		__guc_context_set_preemption_timeout(guc, guc_id, 1);
> >   		__guc_context_sched_disable(guc, ce, guc_id);
> > +	}
> >   	return &ce->guc_state.blocked;
> >   }
> > @@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
> >   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -	if (enable) {
> > -		with_intel_runtime_pm(runtime_pm, wakeref)
> > +	with_intel_runtime_pm(runtime_pm, wakeref) {
> > +		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
> > +						     ce->guc_state.preemption_timeout);
> > +		if (enable)
> >   			__guc_context_sched_enable(guc, ce);
> >   	}
> >   }
> > @@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
> >   	}
> >   }
> > -static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> > -						 u16 guc_id,
> > -						 u32 preemption_timeout)
> > -{
> > -	u32 action[] = {
> > -		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> > -		guc_id,
> > -		preemption_timeout
> > -	};
> > -
> > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > -}
> > -
> >   static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-gfx] [PATCH 2/4] drm/i915/guc: Cancel requests immediately
@ 2022-01-26 20:12       ` Matthew Brost
  0 siblings, 0 replies; 31+ messages in thread
From: Matthew Brost @ 2022-01-26 20:12 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Wed, Jan 26, 2022 at 10:58:46AM -0800, John Harrison wrote:
> On 1/24/2022 07:01, Matthew Brost wrote:
> > Change the preemption timeout to the smallest possible value (1 us) when
> > disabling scheduling to cancel a request and restore it after
> > cancellation. This not only cancels the request as fast as possible, it
> > fixes a bug where the preemption timeout is 0 which results in the
> > schedule disable hanging forever.
> Shouldn't there be an 'if' in the above statement? The pre-emption timeout
> is not normally zero.
>

Yes. Will reword.
 
> > 
> > Reported-by: Jani Saarinen <jani.saarinen@intel.com>
> > Fixes: 62eaf0ae217d4 ("drm/i915/guc: Support request cancellation")
> > Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4960
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 46 +++++++++++--------
> >   2 files changed, 31 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 30cd81ad8911a..730998823dbea 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -198,6 +198,11 @@ struct intel_context {
> >   		 * each priority bucket
> >   		 */
> >   		u32 prio_count[GUC_CLIENT_PRIORITY_NUM];
> > +		/**
> > +		 * @preemption_timeout: preemption timeout of the context, used
> > +		 * to restore this value after request cancellation
> > +		 */
> > +		u32 preemption_timeout;
> >   	} guc_state;
> >   	struct {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 3918f1be114fa..966947c450253 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2147,7 +2147,8 @@ static inline u32 get_children_join_value(struct intel_context *ce,
> >   	return __get_parent_scratch(ce)->join[child_index].semaphore;
> >   }
> > -static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +static void guc_context_policy_init(struct intel_context *ce,
> > +				    struct intel_engine_cs *engine,
> >   				    struct guc_lrc_desc *desc)
> Shouldn't engine be before ce? The more general structure usually goes
> first.
> 

Sure. Fix fix this in the next rev.

Matt

> John.
> 
> >   {
> >   	desc->policy_flags = 0;
> > @@ -2157,7 +2158,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	/* NB: For both of these, zero means disabled. */
> >   	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
> > -	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> > +	ce->guc_state.preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> > +	desc->preemption_timeout = ce->guc_state.preemption_timeout;
> >   }
> >   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> > @@ -2193,7 +2195,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	desc->hw_context_desc = ce->lrc.lrca;
> >   	desc->priority = ce->guc_state.prio;
> >   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > -	guc_context_policy_init(engine, desc);
> > +	guc_context_policy_init(ce, engine, desc);
> >   	/*
> >   	 * If context is a parent, we need to register a process descriptor
> > @@ -2226,7 +2228,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   			desc->hw_context_desc = child->lrc.lrca;
> >   			desc->priority = ce->guc_state.prio;
> >   			desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > -			guc_context_policy_init(engine, desc);
> > +			guc_context_policy_init(child, engine, desc);
> >   		}
> >   		clear_children_join_go_memory(ce);
> > @@ -2409,6 +2411,19 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> >   	return ce->guc_id.id;
> >   }
> > +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> > +						 u16 guc_id,
> > +						 u32 preemption_timeout)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> > +		guc_id,
> > +		preemption_timeout
> > +	};
> > +
> > +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +}
> > +
> >   static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> > @@ -2442,8 +2457,10 @@ static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> >   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -	with_intel_runtime_pm(runtime_pm, wakeref)
> > +	with_intel_runtime_pm(runtime_pm, wakeref) {
> > +		__guc_context_set_preemption_timeout(guc, guc_id, 1);
> >   		__guc_context_sched_disable(guc, ce, guc_id);
> > +	}
> >   	return &ce->guc_state.blocked;
> >   }
> > @@ -2492,8 +2509,10 @@ static void guc_context_unblock(struct intel_context *ce)
> >   	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -	if (enable) {
> > -		with_intel_runtime_pm(runtime_pm, wakeref)
> > +	with_intel_runtime_pm(runtime_pm, wakeref) {
> > +		__guc_context_set_preemption_timeout(guc, ce->guc_id.id,
> > +						     ce->guc_state.preemption_timeout);
> > +		if (enable)
> >   			__guc_context_sched_enable(guc, ce);
> >   	}
> >   }
> > @@ -2521,19 +2540,6 @@ static void guc_context_cancel_request(struct intel_context *ce,
> >   	}
> >   }
> > -static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> > -						 u16 guc_id,
> > -						 u32 preemption_timeout)
> > -{
> > -	u32 action[] = {
> > -		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> > -		guc_id,
> > -		preemption_timeout
> > -	};
> > -
> > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > -}
> > -
> >   static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-01-26 20:18 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-24 15:01 [PATCH 0/4] Fix up request cancel Matthew Brost
2022-01-24 15:01 ` [Intel-gfx] " Matthew Brost
2022-01-24 15:01 ` [PATCH 1/4] drm/i915: Add request cancel low level trace point Matthew Brost
2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
2022-01-25 12:27   ` Tvrtko Ursulin
2022-01-25 16:39     ` Matthew Brost
2022-01-26 10:29       ` Tvrtko Ursulin
2022-01-24 15:01 ` [PATCH 2/4] drm/i915/guc: Cancel requests immediately Matthew Brost
2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
2022-01-26 18:58   ` John Harrison
2022-01-26 18:58     ` [Intel-gfx] " John Harrison
2022-01-26 20:12     ` Matthew Brost
2022-01-26 20:12       ` [Intel-gfx] " Matthew Brost
2022-01-24 15:01 ` [PATCH 3/4] drm/i915/execlists: Fix execlists request cancellation corner case Matthew Brost
2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
2022-01-25 15:27   ` Tvrtko Ursulin
2022-01-25 16:32     ` Matthew Brost
2022-01-26 10:38       ` Tvrtko Ursulin
2022-01-26 19:03   ` John Harrison
2022-01-26 19:03     ` [Intel-gfx] " John Harrison
2022-01-26 20:10     ` Matthew Brost
2022-01-26 20:10       ` [Intel-gfx] " Matthew Brost
2022-01-24 15:01 ` [PATCH 4/4] drm/i915/selftests: Set preemption timeout to zero in cancel reset test Matthew Brost
2022-01-24 15:01   ` [Intel-gfx] " Matthew Brost
2022-01-24 22:52 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix up request cancel (rev2) Patchwork
2022-01-24 22:54 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-24 23:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-01-25  4:54 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-01-25  6:45 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix up request cancel (rev3) Patchwork
2022-01-25  6:46 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-01-25  7:17 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.