All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Clean up some GuC related failure paths
@ 2023-01-12  1:54 ` John.C.Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

Improve failure code handling during GuC intialisation.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


John Harrison (2):
  drm/i915/guc: Improve clean up of busyness stats worker
  drm/i915/guc: Fix missing return code checks in submission init

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 113 +++++++++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |   7 +-
 3 files changed, 89 insertions(+), 33 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Intel-gfx] [PATCH 0/2] Clean up some GuC related failure paths
@ 2023-01-12  1:54 ` John.C.Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

Improve failure code handling during GuC intialisation.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


John Harrison (2):
  drm/i915/guc: Improve clean up of busyness stats worker
  drm/i915/guc: Fix missing return code checks in submission init

 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 113 +++++++++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |   7 +-
 3 files changed, 89 insertions(+), 33 deletions(-)

-- 
2.39.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker
  2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
@ 2023-01-12  1:54   ` John.C.Harrison
  -1 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

The stats worker thread management was mis-matched between
enable/disable call sites. Fix those up. Also, abstract the cancel
code into a helper function rather than replicating in multiple places.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b436dd7f12e42..982364777d0c6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1435,19 +1435,25 @@ static void guc_init_engine_stats(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	intel_wakeref_t wakeref;
+	int ret;
 
 	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
 			 guc->timestamp.ping_delay);
 
-	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) {
-		int ret = guc_action_enable_usage_stats(guc);
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_enable_usage_stats(guc);
 
-		if (ret)
-			drm_err(&gt->i915->drm,
-				"Failed to enable usage stats: %d!\n", ret);
+	if (ret) {
+		cancel_delayed_work_sync(&guc->timestamp.work);
+		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
 	}
 }
 
+static void guc_park_engine_stats(struct intel_guc *guc)
+{
+	cancel_delayed_work_sync(&guc->timestamp.work);
+}
+
 void intel_guc_busyness_park(struct intel_gt *gt)
 {
 	struct intel_guc *guc = &gt->uc.guc;
@@ -1460,7 +1466,7 @@ void intel_guc_busyness_park(struct intel_gt *gt)
 	 * and causes an unclaimed register access warning. Cancel the worker
 	 * synchronously here.
 	 */
-	cancel_delayed_work_sync(&guc->timestamp.work);
+	guc_park_engine_stats(guc);
 
 	/*
 	 * Before parking, we should sample engine busyness stats if we need to.
@@ -4409,11 +4415,11 @@ void intel_guc_submission_enable(struct intel_guc *guc)
 	guc_init_global_schedule_policy(guc);
 }
 
+/* Note: By the time we're here, GuC may have already been reset */
 void intel_guc_submission_disable(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-
-	/* Note: By the time we're here, GuC may have already been reset */
+	guc_park_engine_stats(guc);
 
 	/* Disable and route to host */
 	if (GRAPHICS_VER(gt->i915) >= 12)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Intel-gfx] [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker
@ 2023-01-12  1:54   ` John.C.Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

The stats worker thread management was mis-matched between
enable/disable call sites. Fix those up. Also, abstract the cancel
code into a helper function rather than replicating in multiple places.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index b436dd7f12e42..982364777d0c6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1435,19 +1435,25 @@ static void guc_init_engine_stats(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	intel_wakeref_t wakeref;
+	int ret;
 
 	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
 			 guc->timestamp.ping_delay);
 
-	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) {
-		int ret = guc_action_enable_usage_stats(guc);
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_enable_usage_stats(guc);
 
-		if (ret)
-			drm_err(&gt->i915->drm,
-				"Failed to enable usage stats: %d!\n", ret);
+	if (ret) {
+		cancel_delayed_work_sync(&guc->timestamp.work);
+		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
 	}
 }
 
+static void guc_park_engine_stats(struct intel_guc *guc)
+{
+	cancel_delayed_work_sync(&guc->timestamp.work);
+}
+
 void intel_guc_busyness_park(struct intel_gt *gt)
 {
 	struct intel_guc *guc = &gt->uc.guc;
@@ -1460,7 +1466,7 @@ void intel_guc_busyness_park(struct intel_gt *gt)
 	 * and causes an unclaimed register access warning. Cancel the worker
 	 * synchronously here.
 	 */
-	cancel_delayed_work_sync(&guc->timestamp.work);
+	guc_park_engine_stats(guc);
 
 	/*
 	 * Before parking, we should sample engine busyness stats if we need to.
@@ -4409,11 +4415,11 @@ void intel_guc_submission_enable(struct intel_guc *guc)
 	guc_init_global_schedule_policy(guc);
 }
 
+/* Note: By the time we're here, GuC may have already been reset */
 void intel_guc_submission_disable(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-
-	/* Note: By the time we're here, GuC may have already been reset */
+	guc_park_engine_stats(guc);
 
 	/* Disable and route to host */
 	if (GRAPHICS_VER(gt->i915) >= 12)
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
  2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
@ 2023-01-12  1:54   ` John.C.Harrison
  -1 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: John Harrison, DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

The CI results for the 'fast request' patch set (enables error return
codes for fire-and-forget H2G messages) hit an issue with the KMD
sending context submission requests on an invalid context. That was
caused by a fault injection probe failing the context creation of a
kernel context. However, there was no return code checking on any of
the kernel context registration paths. So the driver kept going and
tried to use the kernel context for the record defaults process.

This would not cause any actual problems. The invalid requests would
be rejected by GuC and ultimately the start up sequence would
correctly wedge due to the context creation failure. But fixing the
issue correctly rather ignoring it means we won't get CI complaining
when the fast request patch lands and enables the extra error checking.

So fix it by checking for errors and aborting as appropriate when
creating kernel contexts. While at it, clean up some other submission
init related failure cleanup paths. Also, rename guc_init_lrc_mapping
to guc_init_submission as the former name hasn't been valid in a long
time.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
 3 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 982364777d0c6..dd856fd92945b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct intel_guc *guc)
 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
 }
 
-static void guc_init_engine_stats(struct intel_guc *guc)
+static int guc_init_engine_stats(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	intel_wakeref_t wakeref;
@@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct intel_guc *guc)
 		cancel_delayed_work_sync(&guc->timestamp.work);
 		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
 	}
+
+	return ret;
 }
 
 static void guc_park_engine_stats(struct intel_guc *guc)
@@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
-static inline void guc_kernel_context_pin(struct intel_guc *guc,
-					  struct intel_context *ce)
+static inline int guc_kernel_context_pin(struct intel_guc *guc,
+					 struct intel_context *ce)
 {
+	int ret;
+
 	/*
 	 * Note: we purposefully do not check the returns below because
 	 * the registration can only fail if a reset is just starting.
@@ -4118,16 +4122,24 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 	 * isn't happening and even it did this code would be run again.
 	 */
 
-	if (context_guc_id_invalid(ce))
-		pin_guc_id(guc, ce);
+	if (context_guc_id_invalid(ce)) {
+		int ret = pin_guc_id(guc, ce);
+
+		if (ret < 0)
+			return ret;
+	}
 
 	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
 		guc_context_init(ce);
 
-	try_context_registration(ce, true);
+	ret = try_context_registration(ce, true);
+	if (ret)
+		unpin_guc_id(guc, ce);
+
+	return ret;
 }
 
-static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+static inline int guc_init_submission(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct intel_engine_cs *engine;
@@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct intel_guc *guc)
 		struct intel_context *ce;
 
 		list_for_each_entry(ce, &engine->pinned_contexts_list,
-				    pinned_contexts_link)
-			guc_kernel_context_pin(guc, ce);
+				    pinned_contexts_link) {
+			int ret = guc_kernel_context_pin(guc, ce);
+
+			if (ret) {
+				/* No point in trying to clean up as i915 will wedge on failure */
+				return ret;
+			}
+		}
 	}
+
+	return 0;
 }
 
 static void guc_release(struct intel_engine_cs *engine)
@@ -4400,30 +4420,57 @@ static int guc_init_global_schedule_policy(struct intel_guc *guc)
 	return ret;
 }
 
-void intel_guc_submission_enable(struct intel_guc *guc)
+static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
+	u32 val;
 
-	/* Enable and route to GuC */
-	if (GRAPHICS_VER(gt->i915) >= 12)
-		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
-				   GUC_SEM_INTR_ROUTE_TO_GUC |
-				   GUC_SEM_INTR_ENABLE_ALL);
+	if (GRAPHICS_VER(gt->i915) < 12)
+		return;
 
-	guc_init_lrc_mapping(guc);
-	guc_init_engine_stats(guc);
-	guc_init_global_schedule_policy(guc);
+	if (to_guc)
+		val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
+	else
+		val = 0;
+
+	intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
+}
+
+int intel_guc_submission_enable(struct intel_guc *guc)
+{
+	int ret;
+
+	/* Semaphore interrupt enable and route to GuC */
+	guc_route_semaphores(guc, true);
+
+	ret = guc_init_submission(guc);
+	if (ret)
+		goto fail_sem;
+
+	ret = guc_init_engine_stats(guc);
+	if (ret)
+		goto fail_sem;
+
+	ret = guc_init_global_schedule_policy(guc);
+	if (ret)
+		goto fail_stats;
+
+	return 0;
+
+fail_stats:
+	guc_park_engine_stats(guc);
+fail_sem:
+	guc_route_semaphores(guc, false);
+	return ret;
 }
 
 /* Note: By the time we're here, GuC may have already been reset */
 void intel_guc_submission_disable(struct intel_guc *guc)
 {
-	struct intel_gt *gt = guc_to_gt(guc);
 	guc_park_engine_stats(guc);
 
-	/* Disable and route to host */
-	if (GRAPHICS_VER(gt->i915) >= 12)
-		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 0x0);
+	/* Semaphore interrupt disable and route to host */
+	guc_route_semaphores(guc, false);
 }
 
 static bool __guc_submission_supported(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 5a95a9f0a8e31..c57b29cdb1a64 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -15,7 +15,7 @@ struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
 int intel_guc_submission_init(struct intel_guc *guc);
-void intel_guc_submission_enable(struct intel_guc *guc);
+int intel_guc_submission_enable(struct intel_guc *guc);
 void intel_guc_submission_disable(struct intel_guc *guc);
 void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 9a8a1abf71d7f..8e338b3321a93 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
 	else
 		intel_huc_auth(huc);
 
-	if (intel_uc_uses_guc_submission(uc))
-		intel_guc_submission_enable(guc);
+	if (intel_uc_uses_guc_submission(uc)) {
+		ret = intel_guc_submission_enable(guc);
+		if (ret)
+			goto err_log_capture;
+	}
 
 	if (intel_uc_uses_guc_slpc(uc)) {
 		ret = intel_guc_slpc_enable(&guc->slpc);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Intel-gfx] [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
@ 2023-01-12  1:54   ` John.C.Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John.C.Harrison @ 2023-01-12  1:54 UTC (permalink / raw)
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

The CI results for the 'fast request' patch set (enables error return
codes for fire-and-forget H2G messages) hit an issue with the KMD
sending context submission requests on an invalid context. That was
caused by a fault injection probe failing the context creation of a
kernel context. However, there was no return code checking on any of
the kernel context registration paths. So the driver kept going and
tried to use the kernel context for the record defaults process.

This would not cause any actual problems. The invalid requests would
be rejected by GuC and ultimately the start up sequence would
correctly wedge due to the context creation failure. But fixing the
issue correctly rather ignoring it means we won't get CI complaining
when the fast request patch lands and enables the extra error checking.

So fix it by checking for errors and aborting as appropriate when
creating kernel contexts. While at it, clean up some other submission
init related failure cleanup paths. Also, rename guc_init_lrc_mapping
to guc_init_submission as the former name hasn't been valid in a long
time.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
 3 files changed, 75 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 982364777d0c6..dd856fd92945b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct intel_guc *guc)
 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
 }
 
-static void guc_init_engine_stats(struct intel_guc *guc)
+static int guc_init_engine_stats(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	intel_wakeref_t wakeref;
@@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct intel_guc *guc)
 		cancel_delayed_work_sync(&guc->timestamp.work);
 		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
 	}
+
+	return ret;
 }
 
 static void guc_park_engine_stats(struct intel_guc *guc)
@@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
-static inline void guc_kernel_context_pin(struct intel_guc *guc,
-					  struct intel_context *ce)
+static inline int guc_kernel_context_pin(struct intel_guc *guc,
+					 struct intel_context *ce)
 {
+	int ret;
+
 	/*
 	 * Note: we purposefully do not check the returns below because
 	 * the registration can only fail if a reset is just starting.
@@ -4118,16 +4122,24 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 	 * isn't happening and even it did this code would be run again.
 	 */
 
-	if (context_guc_id_invalid(ce))
-		pin_guc_id(guc, ce);
+	if (context_guc_id_invalid(ce)) {
+		int ret = pin_guc_id(guc, ce);
+
+		if (ret < 0)
+			return ret;
+	}
 
 	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
 		guc_context_init(ce);
 
-	try_context_registration(ce, true);
+	ret = try_context_registration(ce, true);
+	if (ret)
+		unpin_guc_id(guc, ce);
+
+	return ret;
 }
 
-static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+static inline int guc_init_submission(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct intel_engine_cs *engine;
@@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct intel_guc *guc)
 		struct intel_context *ce;
 
 		list_for_each_entry(ce, &engine->pinned_contexts_list,
-				    pinned_contexts_link)
-			guc_kernel_context_pin(guc, ce);
+				    pinned_contexts_link) {
+			int ret = guc_kernel_context_pin(guc, ce);
+
+			if (ret) {
+				/* No point in trying to clean up as i915 will wedge on failure */
+				return ret;
+			}
+		}
 	}
+
+	return 0;
 }
 
 static void guc_release(struct intel_engine_cs *engine)
@@ -4400,30 +4420,57 @@ static int guc_init_global_schedule_policy(struct intel_guc *guc)
 	return ret;
 }
 
-void intel_guc_submission_enable(struct intel_guc *guc)
+static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
+	u32 val;
 
-	/* Enable and route to GuC */
-	if (GRAPHICS_VER(gt->i915) >= 12)
-		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
-				   GUC_SEM_INTR_ROUTE_TO_GUC |
-				   GUC_SEM_INTR_ENABLE_ALL);
+	if (GRAPHICS_VER(gt->i915) < 12)
+		return;
 
-	guc_init_lrc_mapping(guc);
-	guc_init_engine_stats(guc);
-	guc_init_global_schedule_policy(guc);
+	if (to_guc)
+		val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
+	else
+		val = 0;
+
+	intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
+}
+
+int intel_guc_submission_enable(struct intel_guc *guc)
+{
+	int ret;
+
+	/* Semaphore interrupt enable and route to GuC */
+	guc_route_semaphores(guc, true);
+
+	ret = guc_init_submission(guc);
+	if (ret)
+		goto fail_sem;
+
+	ret = guc_init_engine_stats(guc);
+	if (ret)
+		goto fail_sem;
+
+	ret = guc_init_global_schedule_policy(guc);
+	if (ret)
+		goto fail_stats;
+
+	return 0;
+
+fail_stats:
+	guc_park_engine_stats(guc);
+fail_sem:
+	guc_route_semaphores(guc, false);
+	return ret;
 }
 
 /* Note: By the time we're here, GuC may have already been reset */
 void intel_guc_submission_disable(struct intel_guc *guc)
 {
-	struct intel_gt *gt = guc_to_gt(guc);
 	guc_park_engine_stats(guc);
 
-	/* Disable and route to host */
-	if (GRAPHICS_VER(gt->i915) >= 12)
-		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 0x0);
+	/* Semaphore interrupt disable and route to host */
+	guc_route_semaphores(guc, false);
 }
 
 static bool __guc_submission_supported(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 5a95a9f0a8e31..c57b29cdb1a64 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -15,7 +15,7 @@ struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
 int intel_guc_submission_init(struct intel_guc *guc);
-void intel_guc_submission_enable(struct intel_guc *guc);
+int intel_guc_submission_enable(struct intel_guc *guc);
 void intel_guc_submission_disable(struct intel_guc *guc);
 void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 9a8a1abf71d7f..8e338b3321a93 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
 	else
 		intel_huc_auth(huc);
 
-	if (intel_uc_uses_guc_submission(uc))
-		intel_guc_submission_enable(guc);
+	if (intel_uc_uses_guc_submission(uc)) {
+		ret = intel_guc_submission_enable(guc);
+		if (ret)
+			goto err_log_capture;
+	}
 
 	if (intel_uc_uses_guc_slpc(uc)) {
 		ret = intel_guc_slpc_enable(&guc->slpc);
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Clean up some GuC related failure paths
  2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
                   ` (2 preceding siblings ...)
  (?)
@ 2023-01-12  2:16 ` Patchwork
  -1 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2023-01-12  2:16 UTC (permalink / raw)
  To: john.c.harrison; +Cc: intel-gfx

== Series Details ==

Series: Clean up some GuC related failure paths
URL   : https://patchwork.freedesktop.org/series/112708/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Clean up some GuC related failure paths
  2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
                   ` (3 preceding siblings ...)
  (?)
@ 2023-01-12  2:33 ` Patchwork
  -1 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2023-01-12  2:33 UTC (permalink / raw)
  To: john.c.harrison; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

== Series Details ==

Series: Clean up some GuC related failure paths
URL   : https://patchwork.freedesktop.org/series/112708/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12574 -> Patchwork_112708v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/index.html

Participating hosts (35 -> 34)
------------------------------

  Missing    (1): fi-snb-2520m 

Known issues
------------

  Here are the changes found in Patchwork_112708v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_gttfill@basic:
    - fi-pnv-d510:        [PASS][1] -> [FAIL][2] ([i915#7229])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/fi-pnv-d510/igt@gem_exec_gttfill@basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/fi-pnv-d510/igt@gem_exec_gttfill@basic.html

  
#### Possible fixes ####

  * igt@i915_pm_rpm@module-reload:
    - bat-dg1-5:          [SKIP][3] -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/bat-dg1-5/igt@i915_pm_rpm@module-reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/bat-dg1-5/igt@i915_pm_rpm@module-reload.html

  
  [i915#7229]: https://gitlab.freedesktop.org/drm/intel/issues/7229


Build changes
-------------

  * Linux: CI_DRM_12574 -> Patchwork_112708v1

  CI-20190529: 20190529
  CI_DRM_12574: bf7f7c53ac622a3f6d6738d062e59dd21ce28bd7 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7116: 79eb8984acd309108be713a8831e60667db67e21 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_112708v1: bf7f7c53ac622a3f6d6738d062e59dd21ce28bd7 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

6aa0e5127b05 drm/i915/guc: Fix missing return code checks in submission init
882f4daee3b3 drm/i915/guc: Improve clean up of busyness stats worker

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/index.html

[-- Attachment #2: Type: text/html, Size: 2708 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for Clean up some GuC related failure paths
  2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
                   ` (4 preceding siblings ...)
  (?)
@ 2023-01-12  4:02 ` Patchwork
  -1 siblings, 0 replies; 15+ messages in thread
From: Patchwork @ 2023-01-12  4:02 UTC (permalink / raw)
  To: john.c.harrison; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 20095 bytes --]

== Series Details ==

Series: Clean up some GuC related failure paths
URL   : https://patchwork.freedesktop.org/series/112708/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12574_full -> Patchwork_112708v1_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/index.html

Participating hosts (13 -> 9)
------------------------------

  Missing    (4): shard-rkl0 pig-kbl-iris pig-glk-j5005 pig-skl-6260u 

Known issues
------------

  Here are the changes found in Patchwork_112708v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-glk:          [PASS][1] -> [FAIL][2] ([i915#2842])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-glk7/igt@gem_exec_fair@basic-none@vcs0.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk1/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-glk:          NOTRUN -> [FAIL][3] ([i915#2842])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk6/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_lmem_swapping@heavy-verify-random-ccs:
    - shard-glk:          NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#4613]) +2 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk2/igt@gem_lmem_swapping@heavy-verify-random-ccs.html

  * igt@gem_pread@exhaustion:
    - shard-glk:          NOTRUN -> [WARN][5] ([i915#2658])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk6/igt@gem_pread@exhaustion.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-glk:          [PASS][6] -> [DMESG-WARN][7] ([i915#5566] / [i915#716])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-glk9/igt@gen9_exec_parse@allowed-single.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk1/igt@gen9_exec_parse@allowed-single.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-glk:          NOTRUN -> [SKIP][8] ([fdo#109271] / [i915#658])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk2/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_ccs@pipe-c-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc:
    - shard-glk:          NOTRUN -> [SKIP][9] ([fdo#109271] / [i915#3886]) +4 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk2/igt@kms_ccs@pipe-c-crc-sprite-planes-basic-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_chamelium_color@ctm-blue-to-red:
    - shard-glk:          NOTRUN -> [SKIP][10] ([fdo#109271]) +55 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk6/igt@kms_chamelium_color@ctm-blue-to-red.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-hdmi-a2:
    - shard-glk:          [PASS][11] -> [FAIL][12] ([i915#79])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-glk5/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-hdmi-a2.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk9/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-hdmi-a2.html

  * igt@runner@aborted:
    - shard-glk:          NOTRUN -> [FAIL][13] ([i915#4312])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk1/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@drm_fdinfo@idle@rcs0:
    - {shard-rkl}:        [FAIL][14] ([i915#7742]) -> [PASS][15] +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-4/igt@drm_fdinfo@idle@rcs0.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-3/igt@drm_fdinfo@idle@rcs0.html

  * igt@drm_read@empty-block:
    - {shard-rkl}:        [SKIP][16] ([i915#4098]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-3/igt@drm_read@empty-block.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@drm_read@empty-block.html

  * igt@gem_ctx_persistence@hang:
    - {shard-rkl}:        [SKIP][18] ([i915#6252]) -> [PASS][19] +1 similar issue
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-5/igt@gem_ctx_persistence@hang.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-3/igt@gem_ctx_persistence@hang.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-glk:          [FAIL][20] ([i915#2842]) -> [PASS][21]
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-glk3/igt@gem_exec_fair@basic-pace@vcs0.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-glk2/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_reloc@basic-gtt-cpu:
    - {shard-rkl}:        [SKIP][22] ([i915#3281]) -> [PASS][23] +7 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-3/igt@gem_exec_reloc@basic-gtt-cpu.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-5/igt@gem_exec_reloc@basic-gtt-cpu.html

  * igt@gem_pread@bench:
    - {shard-rkl}:        [SKIP][24] ([i915#3282]) -> [PASS][25] +1 similar issue
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-2/igt@gem_pread@bench.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-5/igt@gem_pread@bench.html

  * igt@gen9_exec_parse@batch-invalid-length:
    - {shard-rkl}:        [SKIP][26] ([i915#2527]) -> [PASS][27] +3 similar issues
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-3/igt@gen9_exec_parse@batch-invalid-length.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-5/igt@gen9_exec_parse@batch-invalid-length.html

  * igt@i915_pm_dc@dc6-dpms:
    - {shard-rkl}:        [SKIP][28] ([i915#3361]) -> [PASS][29]
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-5/igt@i915_pm_dc@dc6-dpms.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-4/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_rc6_residency@rc6-idle@vcs0:
    - {shard-dg1}:        [FAIL][30] ([i915#3591]) -> [PASS][31]
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-dg1-19/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-dg1-17/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html

  * igt@i915_pm_rpm@dpms-non-lpsp:
    - {shard-dg1}:        [SKIP][32] ([i915#1397]) -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-dg1-14/igt@i915_pm_rpm@dpms-non-lpsp.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-dg1-15/igt@i915_pm_rpm@dpms-non-lpsp.html

  * igt@i915_pm_rpm@i2c:
    - {shard-rkl}:        [SKIP][34] ([fdo#109308]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-1/igt@i915_pm_rpm@i2c.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@i915_pm_rpm@i2c.html

  * igt@kms_big_fb@linear-32bpp-rotate-0:
    - {shard-rkl}:        [SKIP][36] ([i915#1845] / [i915#4098]) -> [PASS][37] +17 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-3/igt@kms_big_fb@linear-32bpp-rotate-0.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@kms_big_fb@linear-32bpp-rotate-0.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-onoff:
    - {shard-rkl}:        [SKIP][38] ([i915#1849] / [i915#4098]) -> [PASS][39] +13 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-3/igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-onoff.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-onoff.html

  * igt@kms_plane@plane-position-hole-dpms@pipe-b-planes:
    - {shard-rkl}:        [SKIP][40] ([i915#1849]) -> [PASS][41] +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-5/igt@kms_plane@plane-position-hole-dpms@pipe-b-planes.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@kms_plane@plane-position-hole-dpms@pipe-b-planes.html

  * igt@kms_psr@cursor_render:
    - {shard-rkl}:        [SKIP][42] ([i915#1072]) -> [PASS][43] +2 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-1/igt@kms_psr@cursor_render.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-6/igt@kms_psr@cursor_render.html

  * igt@perf_pmu@idle@rcs0:
    - {shard-rkl}:        [FAIL][44] ([i915#4349]) -> [PASS][45]
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-2/igt@perf_pmu@idle@rcs0.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-5/igt@perf_pmu@idle@rcs0.html

  * igt@prime_vgem@basic-read:
    - {shard-rkl}:        [SKIP][46] ([fdo#109295] / [i915#3291] / [i915#3708]) -> [PASS][47]
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-rkl-2/igt@prime_vgem@basic-read.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-rkl-5/igt@prime_vgem@basic-read.html

  * igt@syncobj_timeline@reset-during-wait-for-submit:
    - {shard-dg1}:        [DMESG-WARN][48] ([i915#1982]) -> [PASS][49]
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-dg1-13/igt@syncobj_timeline@reset-during-wait-for-submit.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-dg1-13/igt@syncobj_timeline@reset-during-wait-for-submit.html

  * igt@sysfs_heartbeat_interval@precise@rcs0:
    - {shard-dg1}:        [FAIL][50] ([i915#1755]) -> [PASS][51]
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12574/shard-dg1-18/igt@sysfs_heartbeat_interval@precise@rcs0.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/shard-dg1-14/igt@sysfs_heartbeat_interval@precise@rcs0.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109302]: https://bugs.freedesktop.org/show_bug.cgi?id=109302
  [fdo#109303]: https://bugs.freedesktop.org/show_bug.cgi?id=109303
  [fdo#109307]: https://bugs.freedesktop.org/show_bug.cgi?id=109307
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109314]: https://bugs.freedesktop.org/show_bug.cgi?id=109314
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111644]: https://bugs.freedesktop.org/show_bug.cgi?id=111644
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112054]: https://bugs.freedesktop.org/show_bug.cgi?id=112054
  [fdo#112283]: https://bugs.freedesktop.org/show_bug.cgi?id=112283
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1755]: https://gitlab.freedesktop.org/drm/intel/issues/1755
  [i915#1769]: https://gitlab.freedesktop.org/drm/intel/issues/1769
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1902]: https://gitlab.freedesktop.org/drm/intel/issues/1902
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2434]: https://gitlab.freedesktop.org/drm/intel/issues/2434
  [i915#2436]: https://gitlab.freedesktop.org/drm/intel/issues/2436
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2532]: https://gitlab.freedesktop.org/drm/intel/issues/2532
  [i915#2575]: https://gitlab.freedesktop.org/drm/intel/issues/2575
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2681]: https://gitlab.freedesktop.org/drm/intel/issues/2681
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3291]: https://gitlab.freedesktop.org/drm/intel/issues/3291
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3299]: https://gitlab.freedesktop.org/drm/intel/issues/3299
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3469]: https://gitlab.freedesktop.org/drm/intel/issues/3469
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3591]: https://gitlab.freedesktop.org/drm/intel/issues/3591
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3742]: https://gitlab.freedesktop.org/drm/intel/issues/3742
  [i915#3826]: https://gitlab.freedesktop.org/drm/intel/issues/3826
  [i915#3840]: https://gitlab.freedesktop.org/drm/intel/issues/3840
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#404]: https://gitlab.freedesktop.org/drm/intel/issues/404
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#426]: https://gitlab.freedesktop.org/drm/intel/issues/426
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4349]: https://gitlab.freedesktop.org/drm/intel/issues/4349
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767
  [i915#4877]: https://gitlab.freedesktop.org/drm/intel/issues/4877
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#5327]: https://gitlab.freedesktop.org/drm/intel/issues/5327
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5439]: https://gitlab.freedesktop.org/drm/intel/issues/5439
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#5723]: https://gitlab.freedesktop.org/drm/intel/issues/5723
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6230]: https://gitlab.freedesktop.org/drm/intel/issues/6230
  [i915#6245]: https://gitlab.freedesktop.org/drm/intel/issues/6245
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6252]: https://gitlab.freedesktop.org/drm/intel/issues/6252
  [i915#6259]: https://gitlab.freedesktop.org/drm/intel/issues/6259
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6301]: https://gitlab.freedesktop.org/drm/intel/issues/6301
  [i915#6334]: https://gitlab.freedesktop.org/drm/intel/issues/6334
  [i915#6335]: https://gitlab.freedesktop.org/drm/intel/issues/6335
  [i915#6344]: https://gitlab.freedesktop.org/drm/intel/issues/6344
  [i915#6412]: https://gitlab.freedesktop.org/drm/intel/issues/6412
  [i915#6433]: https://gitlab.freedesktop.org/drm/intel/issues/6433
  [i915#6497]: https://gitlab.freedesktop.org/drm/intel/issues/6497
  [i915#6524]: https://gitlab.freedesktop.org/drm/intel/issues/6524
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6768]: https://gitlab.freedesktop.org/drm/intel/issues/6768
  [i915#6944]: https://gitlab.freedesktop.org/drm/intel/issues/6944
  [i915#6946]: https://gitlab.freedesktop.org/drm/intel/issues/6946
  [i915#7037]: https://gitlab.freedesktop.org/drm/intel/issues/7037
  [i915#7116]: https://gitlab.freedesktop.org/drm/intel/issues/7116
  [i915#7118]: https://gitlab.freedesktop.org/drm/intel/issues/7118
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#7276]: https://gitlab.freedesktop.org/drm/intel/issues/7276
  [i915#7561]: https://gitlab.freedesktop.org/drm/intel/issues/7561
  [i915#7582]: https://gitlab.freedesktop.org/drm/intel/issues/7582
  [i915#7697]: https://gitlab.freedesktop.org/drm/intel/issues/7697
  [i915#7701]: https://gitlab.freedesktop.org/drm/intel/issues/7701
  [i915#7707]: https://gitlab.freedesktop.org/drm/intel/issues/7707
  [i915#7711]: https://gitlab.freedesktop.org/drm/intel/issues/7711
  [i915#7742]: https://gitlab.freedesktop.org/drm/intel/issues/7742
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79


Build changes
-------------

  * Linux: CI_DRM_12574 -> Patchwork_112708v1
  * Piglit: piglit_4509 -> None

  CI-20190529: 20190529
  CI_DRM_12574: bf7f7c53ac622a3f6d6738d062e59dd21ce28bd7 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7116: 79eb8984acd309108be713a8831e60667db67e21 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_112708v1: bf7f7c53ac622a3f6d6738d062e59dd21ce28bd7 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_112708v1/index.html

[-- Attachment #2: Type: text/html, Size: 14831 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker
  2023-01-12  1:54   ` [Intel-gfx] " John.C.Harrison
  (?)
@ 2023-01-25  0:55   ` Ceraolo Spurio, Daniele
  2023-02-17 20:13     ` John Harrison
  -1 siblings, 1 reply; 15+ messages in thread
From: Ceraolo Spurio, Daniele @ 2023-01-25  0:55 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: DRI-Devel



On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The stats worker thread management was mis-matched between
> enable/disable call sites. Fix those up. Also, abstract the cancel
> code into a helper function rather than replicating in multiple places.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
>   1 file changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index b436dd7f12e42..982364777d0c6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1435,19 +1435,25 @@ static void guc_init_engine_stats(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	intel_wakeref_t wakeref;
> +	int ret;
>   
>   	mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
>   			 guc->timestamp.ping_delay);
>   
> -	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) {
> -		int ret = guc_action_enable_usage_stats(guc);
> +	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
> +		ret = guc_action_enable_usage_stats(guc);
>   
> -		if (ret)
> -			drm_err(&gt->i915->drm,
> -				"Failed to enable usage stats: %d!\n", ret);
> +	if (ret) {
> +		cancel_delayed_work_sync(&guc->timestamp.work);

Wouldn't it be easier to just call mod_delayed_work after the H2G if 
ret==0, instead of having it before and cancelling if we get a failure?

> +		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
>   	}
>   }
>   
> +static void guc_park_engine_stats(struct intel_guc *guc)
> +{
> +	cancel_delayed_work_sync(&guc->timestamp.work);
> +}
> +

Now you're asymmetric with the park/unpark, because on the park side you 
have this wrapper, while on the unpark side you directly call 
mod_delayed_work.

Daniele

>   void intel_guc_busyness_park(struct intel_gt *gt)
>   {
>   	struct intel_guc *guc = &gt->uc.guc;
> @@ -1460,7 +1466,7 @@ void intel_guc_busyness_park(struct intel_gt *gt)
>   	 * and causes an unclaimed register access warning. Cancel the worker
>   	 * synchronously here.
>   	 */
> -	cancel_delayed_work_sync(&guc->timestamp.work);
> +	guc_park_engine_stats(guc);
>   
>   	/*
>   	 * Before parking, we should sample engine busyness stats if we need to.
> @@ -4409,11 +4415,11 @@ void intel_guc_submission_enable(struct intel_guc *guc)
>   	guc_init_global_schedule_policy(guc);
>   }
>   
> +/* Note: By the time we're here, GuC may have already been reset */
>   void intel_guc_submission_disable(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
> -
> -	/* Note: By the time we're here, GuC may have already been reset */
> +	guc_park_engine_stats(guc);
>   
>   	/* Disable and route to host */
>   	if (GRAPHICS_VER(gt->i915) >= 12)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
  2023-01-12  1:54   ` [Intel-gfx] " John.C.Harrison
@ 2023-01-25  1:01     ` Ceraolo Spurio, Daniele
  -1 siblings, 0 replies; 15+ messages in thread
From: Ceraolo Spurio, Daniele @ 2023-01-25  1:01 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: DRI-Devel



On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The CI results for the 'fast request' patch set (enables error return
> codes for fire-and-forget H2G messages) hit an issue with the KMD
> sending context submission requests on an invalid context. That was
> caused by a fault injection probe failing the context creation of a
> kernel context. However, there was no return code checking on any of
> the kernel context registration paths. So the driver kept going and
> tried to use the kernel context for the record defaults process.
>
> This would not cause any actual problems. The invalid requests would
> be rejected by GuC and ultimately the start up sequence would
> correctly wedge due to the context creation failure. But fixing the
> issue correctly rather ignoring it means we won't get CI complaining
> when the fast request patch lands and enables the extra error checking.
>
> So fix it by checking for errors and aborting as appropriate when
> creating kernel contexts. While at it, clean up some other submission
> init related failure cleanup paths. Also, rename guc_init_lrc_mapping
> to guc_init_submission as the former name hasn't been valid in a long
> time.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
>   3 files changed, 75 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 982364777d0c6..dd856fd92945b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct intel_guc *guc)
>   	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>   }
>   
> -static void guc_init_engine_stats(struct intel_guc *guc)
> +static int guc_init_engine_stats(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	intel_wakeref_t wakeref;
> @@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct intel_guc *guc)
>   		cancel_delayed_work_sync(&guc->timestamp.work);
>   		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
>   	}
> +
> +	return ret;
>   }
>   
>   static void guc_park_engine_stats(struct intel_guc *guc)
> @@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> -static inline void guc_kernel_context_pin(struct intel_guc *guc,
> -					  struct intel_context *ce)
> +static inline int guc_kernel_context_pin(struct intel_guc *guc,
> +					 struct intel_context *ce)
>   {
> +	int ret;
> +
>   	/*
>   	 * Note: we purposefully do not check the returns below because
>   	 * the registration can only fail if a reset is just starting.
> @@ -4118,16 +4122,24 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
>   	 * isn't happening and even it did this code would be run again.
>   	 */
>   
> -	if (context_guc_id_invalid(ce))
> -		pin_guc_id(guc, ce);
> +	if (context_guc_id_invalid(ce)) {
> +		int ret = pin_guc_id(guc, ce);

Why do you need a local ret variable inside this if statement, when you 
already have a function-level one? or is it just a cut & paste error?

> +
> +		if (ret < 0)
> +			return ret;
> +	}
>   
>   	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
>   		guc_context_init(ce);
>   
> -	try_context_registration(ce, true);
> +	ret = try_context_registration(ce, true);
> +	if (ret)
> +		unpin_guc_id(guc, ce);
> +
> +	return ret;
>   }
>   
> -static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +static inline int guc_init_submission(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	struct intel_engine_cs *engine;
> @@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>   		struct intel_context *ce;
>   
>   		list_for_each_entry(ce, &engine->pinned_contexts_list,
> -				    pinned_contexts_link)
> -			guc_kernel_context_pin(guc, ce);
> +				    pinned_contexts_link) {
> +			int ret = guc_kernel_context_pin(guc, ce);
> +
> +			if (ret) {
> +				/* No point in trying to clean up as i915 will wedge on failure */
> +				return ret;
> +			}
> +		}
>   	}
> +
> +	return 0;
>   }
>   
>   static void guc_release(struct intel_engine_cs *engine)
> @@ -4400,30 +4420,57 @@ static int guc_init_global_schedule_policy(struct intel_guc *guc)
>   	return ret;
>   }
>   
> -void intel_guc_submission_enable(struct intel_guc *guc)
> +static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
> +	u32 val;
>   
> -	/* Enable and route to GuC */
> -	if (GRAPHICS_VER(gt->i915) >= 12)
> -		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
> -				   GUC_SEM_INTR_ROUTE_TO_GUC |
> -				   GUC_SEM_INTR_ENABLE_ALL);
> +	if (GRAPHICS_VER(gt->i915) < 12)
> +		return;
>   
> -	guc_init_lrc_mapping(guc);
> -	guc_init_engine_stats(guc);
> -	guc_init_global_schedule_policy(guc);
> +	if (to_guc)
> +		val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
> +	else
> +		val = 0;
> +
> +	intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
> +}
> +
> +int intel_guc_submission_enable(struct intel_guc *guc)
> +{
> +	int ret;
> +
> +	/* Semaphore interrupt enable and route to GuC */
> +	guc_route_semaphores(guc, true);
> +
> +	ret = guc_init_submission(guc);
> +	if (ret)
> +		goto fail_sem;
> +
> +	ret = guc_init_engine_stats(guc);
> +	if (ret)
> +		goto fail_sem;
> +
> +	ret = guc_init_global_schedule_policy(guc);
> +	if (ret)
> +		goto fail_stats;
> +
> +	return 0;
> +
> +fail_stats:
> +	guc_park_engine_stats(guc);

personal preference: I'd prefer an extra guc_fini_engine_stats wrapper 
just so that we're balanced with the naming, but not a blocker.

Daniele

> +fail_sem:
> +	guc_route_semaphores(guc, false);
> +	return ret;
>   }
>   
>   /* Note: By the time we're here, GuC may have already been reset */
>   void intel_guc_submission_disable(struct intel_guc *guc)
>   {
> -	struct intel_gt *gt = guc_to_gt(guc);
>   	guc_park_engine_stats(guc);
>   
> -	/* Disable and route to host */
> -	if (GRAPHICS_VER(gt->i915) >= 12)
> -		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 0x0);
> +	/* Semaphore interrupt disable and route to host */
> +	guc_route_semaphores(guc, false);
>   }
>   
>   static bool __guc_submission_supported(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 5a95a9f0a8e31..c57b29cdb1a64 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -15,7 +15,7 @@ struct intel_engine_cs;
>   
>   void intel_guc_submission_init_early(struct intel_guc *guc);
>   int intel_guc_submission_init(struct intel_guc *guc);
> -void intel_guc_submission_enable(struct intel_guc *guc);
> +int intel_guc_submission_enable(struct intel_guc *guc);
>   void intel_guc_submission_disable(struct intel_guc *guc);
>   void intel_guc_submission_fini(struct intel_guc *guc);
>   int intel_guc_preempt_work_create(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 9a8a1abf71d7f..8e338b3321a93 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
>   	else
>   		intel_huc_auth(huc);
>   
> -	if (intel_uc_uses_guc_submission(uc))
> -		intel_guc_submission_enable(guc);
> +	if (intel_uc_uses_guc_submission(uc)) {
> +		ret = intel_guc_submission_enable(guc);
> +		if (ret)
> +			goto err_log_capture;
> +	}
>   
>   	if (intel_uc_uses_guc_slpc(uc)) {
>   		ret = intel_guc_slpc_enable(&guc->slpc);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
@ 2023-01-25  1:01     ` Ceraolo Spurio, Daniele
  0 siblings, 0 replies; 15+ messages in thread
From: Ceraolo Spurio, Daniele @ 2023-01-25  1:01 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX; +Cc: DRI-Devel



On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The CI results for the 'fast request' patch set (enables error return
> codes for fire-and-forget H2G messages) hit an issue with the KMD
> sending context submission requests on an invalid context. That was
> caused by a fault injection probe failing the context creation of a
> kernel context. However, there was no return code checking on any of
> the kernel context registration paths. So the driver kept going and
> tried to use the kernel context for the record defaults process.
>
> This would not cause any actual problems. The invalid requests would
> be rejected by GuC and ultimately the start up sequence would
> correctly wedge due to the context creation failure. But fixing the
> issue correctly rather ignoring it means we won't get CI complaining
> when the fast request patch lands and enables the extra error checking.
>
> So fix it by checking for errors and aborting as appropriate when
> creating kernel contexts. While at it, clean up some other submission
> init related failure cleanup paths. Also, rename guc_init_lrc_mapping
> to guc_init_submission as the former name hasn't been valid in a long
> time.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
>   3 files changed, 75 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 982364777d0c6..dd856fd92945b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct intel_guc *guc)
>   	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>   }
>   
> -static void guc_init_engine_stats(struct intel_guc *guc)
> +static int guc_init_engine_stats(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	intel_wakeref_t wakeref;
> @@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct intel_guc *guc)
>   		cancel_delayed_work_sync(&guc->timestamp.work);
>   		drm_err(&gt->i915->drm, "Failed to enable usage stats: %d!\n", ret);
>   	}
> +
> +	return ret;
>   }
>   
>   static void guc_park_engine_stats(struct intel_guc *guc)
> @@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> -static inline void guc_kernel_context_pin(struct intel_guc *guc,
> -					  struct intel_context *ce)
> +static inline int guc_kernel_context_pin(struct intel_guc *guc,
> +					 struct intel_context *ce)
>   {
> +	int ret;
> +
>   	/*
>   	 * Note: we purposefully do not check the returns below because
>   	 * the registration can only fail if a reset is just starting.
> @@ -4118,16 +4122,24 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
>   	 * isn't happening and even it did this code would be run again.
>   	 */
>   
> -	if (context_guc_id_invalid(ce))
> -		pin_guc_id(guc, ce);
> +	if (context_guc_id_invalid(ce)) {
> +		int ret = pin_guc_id(guc, ce);

Why do you need a local ret variable inside this if statement, when you 
already have a function-level one? or is it just a cut & paste error?

> +
> +		if (ret < 0)
> +			return ret;
> +	}
>   
>   	if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
>   		guc_context_init(ce);
>   
> -	try_context_registration(ce, true);
> +	ret = try_context_registration(ce, true);
> +	if (ret)
> +		unpin_guc_id(guc, ce);
> +
> +	return ret;
>   }
>   
> -static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +static inline int guc_init_submission(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	struct intel_engine_cs *engine;
> @@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>   		struct intel_context *ce;
>   
>   		list_for_each_entry(ce, &engine->pinned_contexts_list,
> -				    pinned_contexts_link)
> -			guc_kernel_context_pin(guc, ce);
> +				    pinned_contexts_link) {
> +			int ret = guc_kernel_context_pin(guc, ce);
> +
> +			if (ret) {
> +				/* No point in trying to clean up as i915 will wedge on failure */
> +				return ret;
> +			}
> +		}
>   	}
> +
> +	return 0;
>   }
>   
>   static void guc_release(struct intel_engine_cs *engine)
> @@ -4400,30 +4420,57 @@ static int guc_init_global_schedule_policy(struct intel_guc *guc)
>   	return ret;
>   }
>   
> -void intel_guc_submission_enable(struct intel_guc *guc)
> +static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
> +	u32 val;
>   
> -	/* Enable and route to GuC */
> -	if (GRAPHICS_VER(gt->i915) >= 12)
> -		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
> -				   GUC_SEM_INTR_ROUTE_TO_GUC |
> -				   GUC_SEM_INTR_ENABLE_ALL);
> +	if (GRAPHICS_VER(gt->i915) < 12)
> +		return;
>   
> -	guc_init_lrc_mapping(guc);
> -	guc_init_engine_stats(guc);
> -	guc_init_global_schedule_policy(guc);
> +	if (to_guc)
> +		val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
> +	else
> +		val = 0;
> +
> +	intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
> +}
> +
> +int intel_guc_submission_enable(struct intel_guc *guc)
> +{
> +	int ret;
> +
> +	/* Semaphore interrupt enable and route to GuC */
> +	guc_route_semaphores(guc, true);
> +
> +	ret = guc_init_submission(guc);
> +	if (ret)
> +		goto fail_sem;
> +
> +	ret = guc_init_engine_stats(guc);
> +	if (ret)
> +		goto fail_sem;
> +
> +	ret = guc_init_global_schedule_policy(guc);
> +	if (ret)
> +		goto fail_stats;
> +
> +	return 0;
> +
> +fail_stats:
> +	guc_park_engine_stats(guc);

personal preference: I'd prefer an extra guc_fini_engine_stats wrapper 
just so that we're balanced with the naming, but not a blocker.

Daniele

> +fail_sem:
> +	guc_route_semaphores(guc, false);
> +	return ret;
>   }
>   
>   /* Note: By the time we're here, GuC may have already been reset */
>   void intel_guc_submission_disable(struct intel_guc *guc)
>   {
> -	struct intel_gt *gt = guc_to_gt(guc);
>   	guc_park_engine_stats(guc);
>   
> -	/* Disable and route to host */
> -	if (GRAPHICS_VER(gt->i915) >= 12)
> -		intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 0x0);
> +	/* Semaphore interrupt disable and route to host */
> +	guc_route_semaphores(guc, false);
>   }
>   
>   static bool __guc_submission_supported(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 5a95a9f0a8e31..c57b29cdb1a64 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -15,7 +15,7 @@ struct intel_engine_cs;
>   
>   void intel_guc_submission_init_early(struct intel_guc *guc);
>   int intel_guc_submission_init(struct intel_guc *guc);
> -void intel_guc_submission_enable(struct intel_guc *guc);
> +int intel_guc_submission_enable(struct intel_guc *guc);
>   void intel_guc_submission_disable(struct intel_guc *guc);
>   void intel_guc_submission_fini(struct intel_guc *guc);
>   int intel_guc_preempt_work_create(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 9a8a1abf71d7f..8e338b3321a93 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
>   	else
>   		intel_huc_auth(huc);
>   
> -	if (intel_uc_uses_guc_submission(uc))
> -		intel_guc_submission_enable(guc);
> +	if (intel_uc_uses_guc_submission(uc)) {
> +		ret = intel_guc_submission_enable(guc);
> +		if (ret)
> +			goto err_log_capture;
> +	}
>   
>   	if (intel_uc_uses_guc_slpc(uc)) {
>   		ret = intel_guc_slpc_enable(&guc->slpc);


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker
  2023-01-25  0:55   ` Ceraolo Spurio, Daniele
@ 2023-02-17 20:13     ` John Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John Harrison @ 2023-02-17 20:13 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele, Intel-GFX; +Cc: DRI-Devel

On 1/24/2023 16:55, Ceraolo Spurio, Daniele wrote:
> On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The stats worker thread management was mis-matched between
>> enable/disable call sites. Fix those up. Also, abstract the cancel
>> code into a helper function rather than replicating in multiple places.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 22 ++++++++++++-------
>>   1 file changed, 14 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index b436dd7f12e42..982364777d0c6 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1435,19 +1435,25 @@ static void guc_init_engine_stats(struct 
>> intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       intel_wakeref_t wakeref;
>> +    int ret;
>>         mod_delayed_work(system_highpri_wq, &guc->timestamp.work,
>>                guc->timestamp.ping_delay);
>>   -    with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref) {
>> -        int ret = guc_action_enable_usage_stats(guc);
>> +    with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
>> +        ret = guc_action_enable_usage_stats(guc);
>>   -        if (ret)
>> -            drm_err(&gt->i915->drm,
>> -                "Failed to enable usage stats: %d!\n", ret);
>> +    if (ret) {
>> +        cancel_delayed_work_sync(&guc->timestamp.work);
>
> Wouldn't it be easier to just call mod_delayed_work after the H2G if 
> ret==0, instead of having it before and cancelling if we get a failure?
>
>> +        drm_err(&gt->i915->drm, "Failed to enable usage stats: 
>> %d!\n", ret);
>>       }
>>   }
>>   +static void guc_park_engine_stats(struct intel_guc *guc)
>> +{
>> +    cancel_delayed_work_sync(&guc->timestamp.work);
>> +}
>> +
>
> Now you're asymmetric with the park/unpark, because on the park side 
> you have this wrapper, while on the unpark side you directly call 
> mod_delayed_work.
The point is that submission disable needs to also cancel the worker. 
But calling the actual busyness park function seems excessive - no need 
to do all the updating if we are about to reset the GuC or unload the 
driver.

Thinking about it more, calling this park_engine_stats is actually wrong 
given that engine stats and busyness are the same thing, so basically we 
would have two functions with the same name where one is a subset of the 
other. Is it simpler (and safe?) to just call the full busyness unpark 
from submission_disable? Or is it better to have a 
cancel/enable_busyness_worker() pair for all instances of turning the 
worker on or off?

John.


>
> Daniele
>
>>   void intel_guc_busyness_park(struct intel_gt *gt)
>>   {
>>       struct intel_guc *guc = &gt->uc.guc;
>> @@ -1460,7 +1466,7 @@ void intel_guc_busyness_park(struct intel_gt *gt)
>>        * and causes an unclaimed register access warning. Cancel the 
>> worker
>>        * synchronously here.
>>        */
>> -    cancel_delayed_work_sync(&guc->timestamp.work);
>> +    guc_park_engine_stats(guc);
>>         /*
>>        * Before parking, we should sample engine busyness stats if we 
>> need to.
>> @@ -4409,11 +4415,11 @@ void intel_guc_submission_enable(struct 
>> intel_guc *guc)
>>       guc_init_global_schedule_policy(guc);
>>   }
>>   +/* Note: By the time we're here, GuC may have already been reset */
>>   void intel_guc_submission_disable(struct intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>> -
>> -    /* Note: By the time we're here, GuC may have already been reset */
>> +    guc_park_engine_stats(guc);
>>         /* Disable and route to host */
>>       if (GRAPHICS_VER(gt->i915) >= 12)
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
  2023-01-25  1:01     ` [Intel-gfx] " Ceraolo Spurio, Daniele
@ 2023-02-17 20:40       ` John Harrison
  -1 siblings, 0 replies; 15+ messages in thread
From: John Harrison @ 2023-02-17 20:40 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele, Intel-GFX; +Cc: DRI-Devel

On 1/24/2023 17:01, Ceraolo Spurio, Daniele wrote:
> On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The CI results for the 'fast request' patch set (enables error return
>> codes for fire-and-forget H2G messages) hit an issue with the KMD
>> sending context submission requests on an invalid context. That was
>> caused by a fault injection probe failing the context creation of a
>> kernel context. However, there was no return code checking on any of
>> the kernel context registration paths. So the driver kept going and
>> tried to use the kernel context for the record defaults process.
>>
>> This would not cause any actual problems. The invalid requests would
>> be rejected by GuC and ultimately the start up sequence would
>> correctly wedge due to the context creation failure. But fixing the
>> issue correctly rather ignoring it means we won't get CI complaining
>> when the fast request patch lands and enables the extra error checking.
>>
>> So fix it by checking for errors and aborting as appropriate when
>> creating kernel contexts. While at it, clean up some other submission
>> init related failure cleanup paths. Also, rename guc_init_lrc_mapping
>> to guc_init_submission as the former name hasn't been valid in a long
>> time.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
>>   3 files changed, 75 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 982364777d0c6..dd856fd92945b 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct 
>> intel_guc *guc)
>>       return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>   }
>>   -static void guc_init_engine_stats(struct intel_guc *guc)
>> +static int guc_init_engine_stats(struct intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       intel_wakeref_t wakeref;
>> @@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct 
>> intel_guc *guc)
>>           cancel_delayed_work_sync(&guc->timestamp.work);
>>           drm_err(&gt->i915->drm, "Failed to enable usage stats: 
>> %d!\n", ret);
>>       }
>> +
>> +    return ret;
>>   }
>>     static void guc_park_engine_stats(struct intel_guc *guc)
>> @@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct 
>> intel_engine_cs *engine)
>>       engine->submit_request = guc_submit_request;
>>   }
>>   -static inline void guc_kernel_context_pin(struct intel_guc *guc,
>> -                      struct intel_context *ce)
>> +static inline int guc_kernel_context_pin(struct intel_guc *guc,
>> +                     struct intel_context *ce)
>>   {
>> +    int ret;
>> +
>>       /*
>>        * Note: we purposefully do not check the returns below because
>>        * the registration can only fail if a reset is just starting.
>> @@ -4118,16 +4122,24 @@ static inline void 
>> guc_kernel_context_pin(struct intel_guc *guc,
>>        * isn't happening and even it did this code would be run again.
>>        */
>>   -    if (context_guc_id_invalid(ce))
>> -        pin_guc_id(guc, ce);
>> +    if (context_guc_id_invalid(ce)) {
>> +        int ret = pin_guc_id(guc, ce);
>
> Why do you need a local ret variable inside this if statement, when 
> you already have a function-level one? or is it just a cut & paste error?
Yeah, copy/paste thing.

>
>> +
>> +        if (ret < 0)
>> +            return ret;
>> +    }
>>         if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
>>           guc_context_init(ce);
>>   -    try_context_registration(ce, true);
>> +    ret = try_context_registration(ce, true);
>> +    if (ret)
>> +        unpin_guc_id(guc, ce);
>> +
>> +    return ret;
>>   }
>>   -static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>> +static inline int guc_init_submission(struct intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       struct intel_engine_cs *engine;
>> @@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct 
>> intel_guc *guc)
>>           struct intel_context *ce;
>>             list_for_each_entry(ce, &engine->pinned_contexts_list,
>> -                    pinned_contexts_link)
>> -            guc_kernel_context_pin(guc, ce);
>> +                    pinned_contexts_link) {
>> +            int ret = guc_kernel_context_pin(guc, ce);
>> +
>> +            if (ret) {
>> +                /* No point in trying to clean up as i915 will wedge 
>> on failure */
>> +                return ret;
>> +            }
>> +        }
>>       }
>> +
>> +    return 0;
>>   }
>>     static void guc_release(struct intel_engine_cs *engine)
>> @@ -4400,30 +4420,57 @@ static int 
>> guc_init_global_schedule_policy(struct intel_guc *guc)
>>       return ret;
>>   }
>>   -void intel_guc_submission_enable(struct intel_guc *guc)
>> +static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>> +    u32 val;
>>   -    /* Enable and route to GuC */
>> -    if (GRAPHICS_VER(gt->i915) >= 12)
>> -        intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
>> -                   GUC_SEM_INTR_ROUTE_TO_GUC |
>> -                   GUC_SEM_INTR_ENABLE_ALL);
>> +    if (GRAPHICS_VER(gt->i915) < 12)
>> +        return;
>>   -    guc_init_lrc_mapping(guc);
>> -    guc_init_engine_stats(guc);
>> -    guc_init_global_schedule_policy(guc);
>> +    if (to_guc)
>> +        val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
>> +    else
>> +        val = 0;
>> +
>> +    intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
>> +}
>> +
>> +int intel_guc_submission_enable(struct intel_guc *guc)
>> +{
>> +    int ret;
>> +
>> +    /* Semaphore interrupt enable and route to GuC */
>> +    guc_route_semaphores(guc, true);
>> +
>> +    ret = guc_init_submission(guc);
>> +    if (ret)
>> +        goto fail_sem;
>> +
>> +    ret = guc_init_engine_stats(guc);
>> +    if (ret)
>> +        goto fail_sem;
>> +
>> +    ret = guc_init_global_schedule_policy(guc);
>> +    if (ret)
>> +        goto fail_stats;
>> +
>> +    return 0;
>> +
>> +fail_stats:
>> +    guc_park_engine_stats(guc);
>
> personal preference: I'd prefer an extra guc_fini_engine_stats wrapper 
> just so that we're balanced with the naming, but not a blocker.
As per comment on previous patch, I'm thinking I'll rename 
guc_park_engine_stats() to guc_cancel_busyness_worker() (and add a 
matching enable). And I agree on the naming here. But adding yet another 
wrapper would mean having this:
     guc_fini_engine_stats() {
         guc_cancel_busyness_worker() {
             cancel_delayed_work_sync();
         }
     }

Which seems excessive. A wrapper around a wrapper around a one line 
piece of code. I guess the compiler will optimise it all out and it 
leaves room for future expansion if other things need to happen in the 
middle layers in the future.

John.

>
> Daniele
>
>> +fail_sem:
>> +    guc_route_semaphores(guc, false);
>> +    return ret;
>>   }
>>     /* Note: By the time we're here, GuC may have already been reset */
>>   void intel_guc_submission_disable(struct intel_guc *guc)
>>   {
>> -    struct intel_gt *gt = guc_to_gt(guc);
>>       guc_park_engine_stats(guc);
>>   -    /* Disable and route to host */
>> -    if (GRAPHICS_VER(gt->i915) >= 12)
>> -        intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 
>> 0x0);
>> +    /* Semaphore interrupt disable and route to host */
>> +    guc_route_semaphores(guc, false);
>>   }
>>     static bool __guc_submission_supported(struct intel_guc *guc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> index 5a95a9f0a8e31..c57b29cdb1a64 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> @@ -15,7 +15,7 @@ struct intel_engine_cs;
>>     void intel_guc_submission_init_early(struct intel_guc *guc);
>>   int intel_guc_submission_init(struct intel_guc *guc);
>> -void intel_guc_submission_enable(struct intel_guc *guc);
>> +int intel_guc_submission_enable(struct intel_guc *guc);
>>   void intel_guc_submission_disable(struct intel_guc *guc);
>>   void intel_guc_submission_fini(struct intel_guc *guc);
>>   int intel_guc_preempt_work_create(struct intel_guc *guc);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index 9a8a1abf71d7f..8e338b3321a93 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
>>       else
>>           intel_huc_auth(huc);
>>   -    if (intel_uc_uses_guc_submission(uc))
>> -        intel_guc_submission_enable(guc);
>> +    if (intel_uc_uses_guc_submission(uc)) {
>> +        ret = intel_guc_submission_enable(guc);
>> +        if (ret)
>> +            goto err_log_capture;
>> +    }
>>         if (intel_uc_uses_guc_slpc(uc)) {
>>           ret = intel_guc_slpc_enable(&guc->slpc);
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init
@ 2023-02-17 20:40       ` John Harrison
  0 siblings, 0 replies; 15+ messages in thread
From: John Harrison @ 2023-02-17 20:40 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele, Intel-GFX; +Cc: DRI-Devel

On 1/24/2023 17:01, Ceraolo Spurio, Daniele wrote:
> On 1/11/2023 5:54 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The CI results for the 'fast request' patch set (enables error return
>> codes for fire-and-forget H2G messages) hit an issue with the KMD
>> sending context submission requests on an invalid context. That was
>> caused by a fault injection probe failing the context creation of a
>> kernel context. However, there was no return code checking on any of
>> the kernel context registration paths. So the driver kept going and
>> tried to use the kernel context for the record defaults process.
>>
>> This would not cause any actual problems. The invalid requests would
>> be rejected by GuC and ultimately the start up sequence would
>> correctly wedge due to the context creation failure. But fixing the
>> issue correctly rather ignoring it means we won't get CI complaining
>> when the fast request patch lands and enables the extra error checking.
>>
>> So fix it by checking for errors and aborting as appropriate when
>> creating kernel contexts. While at it, clean up some other submission
>> init related failure cleanup paths. Also, rename guc_init_lrc_mapping
>> to guc_init_submission as the former name hasn't been valid in a long
>> time.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++-----
>>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  2 +-
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  7 +-
>>   3 files changed, 75 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> index 982364777d0c6..dd856fd92945b 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>> @@ -1431,7 +1431,7 @@ static int guc_action_enable_usage_stats(struct 
>> intel_guc *guc)
>>       return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>   }
>>   -static void guc_init_engine_stats(struct intel_guc *guc)
>> +static int guc_init_engine_stats(struct intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       intel_wakeref_t wakeref;
>> @@ -1447,6 +1447,8 @@ static void guc_init_engine_stats(struct 
>> intel_guc *guc)
>>           cancel_delayed_work_sync(&guc->timestamp.work);
>>           drm_err(&gt->i915->drm, "Failed to enable usage stats: 
>> %d!\n", ret);
>>       }
>> +
>> +    return ret;
>>   }
>>     static void guc_park_engine_stats(struct intel_guc *guc)
>> @@ -4108,9 +4110,11 @@ static void guc_set_default_submission(struct 
>> intel_engine_cs *engine)
>>       engine->submit_request = guc_submit_request;
>>   }
>>   -static inline void guc_kernel_context_pin(struct intel_guc *guc,
>> -                      struct intel_context *ce)
>> +static inline int guc_kernel_context_pin(struct intel_guc *guc,
>> +                     struct intel_context *ce)
>>   {
>> +    int ret;
>> +
>>       /*
>>        * Note: we purposefully do not check the returns below because
>>        * the registration can only fail if a reset is just starting.
>> @@ -4118,16 +4122,24 @@ static inline void 
>> guc_kernel_context_pin(struct intel_guc *guc,
>>        * isn't happening and even it did this code would be run again.
>>        */
>>   -    if (context_guc_id_invalid(ce))
>> -        pin_guc_id(guc, ce);
>> +    if (context_guc_id_invalid(ce)) {
>> +        int ret = pin_guc_id(guc, ce);
>
> Why do you need a local ret variable inside this if statement, when 
> you already have a function-level one? or is it just a cut & paste error?
Yeah, copy/paste thing.

>
>> +
>> +        if (ret < 0)
>> +            return ret;
>> +    }
>>         if (!test_bit(CONTEXT_GUC_INIT, &ce->flags))
>>           guc_context_init(ce);
>>   -    try_context_registration(ce, true);
>> +    ret = try_context_registration(ce, true);
>> +    if (ret)
>> +        unpin_guc_id(guc, ce);
>> +
>> +    return ret;
>>   }
>>   -static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>> +static inline int guc_init_submission(struct intel_guc *guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>>       struct intel_engine_cs *engine;
>> @@ -4154,9 +4166,17 @@ static inline void guc_init_lrc_mapping(struct 
>> intel_guc *guc)
>>           struct intel_context *ce;
>>             list_for_each_entry(ce, &engine->pinned_contexts_list,
>> -                    pinned_contexts_link)
>> -            guc_kernel_context_pin(guc, ce);
>> +                    pinned_contexts_link) {
>> +            int ret = guc_kernel_context_pin(guc, ce);
>> +
>> +            if (ret) {
>> +                /* No point in trying to clean up as i915 will wedge 
>> on failure */
>> +                return ret;
>> +            }
>> +        }
>>       }
>> +
>> +    return 0;
>>   }
>>     static void guc_release(struct intel_engine_cs *engine)
>> @@ -4400,30 +4420,57 @@ static int 
>> guc_init_global_schedule_policy(struct intel_guc *guc)
>>       return ret;
>>   }
>>   -void intel_guc_submission_enable(struct intel_guc *guc)
>> +static void guc_route_semaphores(struct intel_guc *guc, bool to_guc)
>>   {
>>       struct intel_gt *gt = guc_to_gt(guc);
>> +    u32 val;
>>   -    /* Enable and route to GuC */
>> -    if (GRAPHICS_VER(gt->i915) >= 12)
>> -        intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES,
>> -                   GUC_SEM_INTR_ROUTE_TO_GUC |
>> -                   GUC_SEM_INTR_ENABLE_ALL);
>> +    if (GRAPHICS_VER(gt->i915) < 12)
>> +        return;
>>   -    guc_init_lrc_mapping(guc);
>> -    guc_init_engine_stats(guc);
>> -    guc_init_global_schedule_policy(guc);
>> +    if (to_guc)
>> +        val = GUC_SEM_INTR_ROUTE_TO_GUC | GUC_SEM_INTR_ENABLE_ALL;
>> +    else
>> +        val = 0;
>> +
>> +    intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, val);
>> +}
>> +
>> +int intel_guc_submission_enable(struct intel_guc *guc)
>> +{
>> +    int ret;
>> +
>> +    /* Semaphore interrupt enable and route to GuC */
>> +    guc_route_semaphores(guc, true);
>> +
>> +    ret = guc_init_submission(guc);
>> +    if (ret)
>> +        goto fail_sem;
>> +
>> +    ret = guc_init_engine_stats(guc);
>> +    if (ret)
>> +        goto fail_sem;
>> +
>> +    ret = guc_init_global_schedule_policy(guc);
>> +    if (ret)
>> +        goto fail_stats;
>> +
>> +    return 0;
>> +
>> +fail_stats:
>> +    guc_park_engine_stats(guc);
>
> personal preference: I'd prefer an extra guc_fini_engine_stats wrapper 
> just so that we're balanced with the naming, but not a blocker.
As per comment on previous patch, I'm thinking I'll rename 
guc_park_engine_stats() to guc_cancel_busyness_worker() (and add a 
matching enable). And I agree on the naming here. But adding yet another 
wrapper would mean having this:
     guc_fini_engine_stats() {
         guc_cancel_busyness_worker() {
             cancel_delayed_work_sync();
         }
     }

Which seems excessive. A wrapper around a wrapper around a one line 
piece of code. I guess the compiler will optimise it all out and it 
leaves room for future expansion if other things need to happen in the 
middle layers in the future.

John.

>
> Daniele
>
>> +fail_sem:
>> +    guc_route_semaphores(guc, false);
>> +    return ret;
>>   }
>>     /* Note: By the time we're here, GuC may have already been reset */
>>   void intel_guc_submission_disable(struct intel_guc *guc)
>>   {
>> -    struct intel_gt *gt = guc_to_gt(guc);
>>       guc_park_engine_stats(guc);
>>   -    /* Disable and route to host */
>> -    if (GRAPHICS_VER(gt->i915) >= 12)
>> -        intel_uncore_write(gt->uncore, GEN12_GUC_SEM_INTR_ENABLES, 
>> 0x0);
>> +    /* Semaphore interrupt disable and route to host */
>> +    guc_route_semaphores(guc, false);
>>   }
>>     static bool __guc_submission_supported(struct intel_guc *guc)
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h 
>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> index 5a95a9f0a8e31..c57b29cdb1a64 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>> @@ -15,7 +15,7 @@ struct intel_engine_cs;
>>     void intel_guc_submission_init_early(struct intel_guc *guc);
>>   int intel_guc_submission_init(struct intel_guc *guc);
>> -void intel_guc_submission_enable(struct intel_guc *guc);
>> +int intel_guc_submission_enable(struct intel_guc *guc);
>>   void intel_guc_submission_disable(struct intel_guc *guc);
>>   void intel_guc_submission_fini(struct intel_guc *guc);
>>   int intel_guc_preempt_work_create(struct intel_guc *guc);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index 9a8a1abf71d7f..8e338b3321a93 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -537,8 +537,11 @@ static int __uc_init_hw(struct intel_uc *uc)
>>       else
>>           intel_huc_auth(huc);
>>   -    if (intel_uc_uses_guc_submission(uc))
>> -        intel_guc_submission_enable(guc);
>> +    if (intel_uc_uses_guc_submission(uc)) {
>> +        ret = intel_guc_submission_enable(guc);
>> +        if (ret)
>> +            goto err_log_capture;
>> +    }
>>         if (intel_uc_uses_guc_slpc(uc)) {
>>           ret = intel_guc_slpc_enable(&guc->slpc);
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-02-17 20:41 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-12  1:54 [PATCH 0/2] Clean up some GuC related failure paths John.C.Harrison
2023-01-12  1:54 ` [Intel-gfx] " John.C.Harrison
2023-01-12  1:54 ` [PATCH 1/2] drm/i915/guc: Improve clean up of busyness stats worker John.C.Harrison
2023-01-12  1:54   ` [Intel-gfx] " John.C.Harrison
2023-01-25  0:55   ` Ceraolo Spurio, Daniele
2023-02-17 20:13     ` John Harrison
2023-01-12  1:54 ` [PATCH 2/2] drm/i915/guc: Fix missing return code checks in submission init John.C.Harrison
2023-01-12  1:54   ` [Intel-gfx] " John.C.Harrison
2023-01-25  1:01   ` Ceraolo Spurio, Daniele
2023-01-25  1:01     ` [Intel-gfx] " Ceraolo Spurio, Daniele
2023-02-17 20:40     ` John Harrison
2023-02-17 20:40       ` John Harrison
2023-01-12  2:16 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Clean up some GuC related failure paths Patchwork
2023-01-12  2:33 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-01-12  4:02 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.