intel-gfx.lists.freedesktop.org archive mirror
* [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling
@ 2023-03-16 22:06 John.C.Harrison
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting John.C.Harrison
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: John.C.Harrison @ 2023-03-16 22:06 UTC
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

Add more decoding of the GuC load failures. Also include information
about GT frequency to see if timeouts are due to a failure to boost
the clocks. Finally, increase the timeout to accommodate situations
where the clock boost does fail.

v2: Reduce timeout in release builds, add bug references, make usage
of 'success' variable a little clearer (review feedback from Daniele).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>


John Harrison (2):
  drm/i915/guc: Improve GuC load error reporting
  drm/i915/guc: Allow for very slow GuC loading

 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   |  17 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     | 141 +++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h    |   4 +-
 3 files changed, 140 insertions(+), 22 deletions(-)

-- 
2.39.1



* [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting
  2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
@ 2023-03-16 22:06 ` John.C.Harrison
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading John.C.Harrison
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: John.C.Harrison @ 2023-03-16 22:06 UTC
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

There are multiple ways in which the GuC load can fail. The driver was
reporting the status register as is, but not everyone can read the
matrix unfiltered. So add decoding of the common error cases.
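As an illustration of what decoding the status register involves, here
is a minimal user-space sketch (not driver code). It assumes the field
layout from intel_guc_reg.h in this patch (BootROM in bits 7:1, UKernel
in bits 15:8) and the BootROM codes from enum intel_bootrom_load_status.
The helper names field_get(), bootrom_str() and print_guc_status() are
hypothetical; field_get() only stands in for the kernel's REG_FIELD_GET().

```c
#include <stdint.h>
#include <stdio.h>

/* Field layout taken from intel_guc_reg.h as shown in this patch. */
#define GS_BOOTROM_SHIFT	1
#define GS_BOOTROM_MASK		(0x7Fu << GS_BOOTROM_SHIFT)	/* bits 7:1 */
#define GS_UKERNEL_SHIFT	8
#define GS_UKERNEL_MASK		(0xFFu << GS_UKERNEL_SHIFT)	/* bits 15:8 */

/*
 * Hypothetical user-space stand-in for REG_FIELD_GET(): keep the masked
 * bits, then shift them down by the position of the mask's lowest set
 * bit. __builtin_ctz() is a GCC/Clang builtin; mask must be non-zero.
 */
static uint32_t field_get(uint32_t mask, uint32_t val)
{
	return (val & mask) >> __builtin_ctz(mask);
}

/* Hypothetical helper: BootROM codes from enum intel_bootrom_load_status. */
static const char *bootrom_str(uint32_t br)
{
	switch (br) {
	case 0x13: return "no key found";
	case 0x1A: return "AES prod key found";
	case 0x50: return "RSA signature check failed";
	case 0x73: return "PAVPC failed";
	case 0x74: return "WOPCM failed";
	case 0x75: return "load location failed";
	case 0x76: return "jump passed";
	case 0x77: return "jump failed";
	case 0x79: return "RC6 context config failed";
	case 0x7A: return "MPU map incorrect";
	case 0x7E: return "exception";
	default:   return "unrecognised";
	}
}

/* Print the raw value plus the decoded BootROM and UKernel fields. */
static void print_guc_status(uint32_t status)
{
	uint32_t br = field_get(GS_BOOTROM_MASK, status);
	uint32_t uk = field_get(GS_UKERNEL_MASK, status);

	printf("status = 0x%08X: BootROM = 0x%02X (%s), UKernel = 0x%02X\n",
	       (unsigned int)status, (unsigned int)br, bootrom_str(br),
	       (unsigned int)uk);
}
```

For example, a raw value of 0x000000A0 decodes to BootROM = 0x50, the
RSA failure that the driver reports as a firmware signature verification
failure.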

Also, remove the comment saying that interrupt-based load completion
checking is not recommended. The interrupt was removed from the GuC
firmware some time ago, so it is no longer an option anyway. While at
it, also abort the wait if a known error code is reported. There is no
need to keep waiting if the GuC has already given up on the load.
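The early-abort rule can be sketched as a pure function on the status
value. This is a minimal user-space sketch, not the driver's
guc_load_done(): it reads no registers, and the load_result enum and
classify_status() name are hypothetical. It only lists the UKernel error
codes whose numeric values appear in this patch; DPC_ERROR, EXCEPTION,
INIT_DATA_INVALID, MPU_DATA_INVALID and INIT_MMIO_SAVE_RESTORE_INVALID
are left out because their values are not shown here.

```c
#include <stdint.h>

/* Field masks from intel_guc_reg.h as shown in this patch. */
#define GS_BOOTROM_SHIFT	1
#define GS_BOOTROM_MASK		(0x7Fu << GS_BOOTROM_SHIFT)
#define GS_UKERNEL_SHIFT	8
#define GS_UKERNEL_MASK		(0xFFu << GS_UKERNEL_SHIFT)

/* Subset of UKernel codes whose values appear in guc_errors_abi.h above. */
enum {
	UK_ERROR_DEVID_BUILD_MISMATCH	= 0x02,
	UK_PREPROD_BUILD_MISMATCH	= 0x03,
	UK_ERROR_DEVID_INVALID_GUCTYPE	= 0x04,
	UK_HWCONFIG_ERROR		= 0x07,
	UK_READY			= 0xF0,
};

/* BootROM failure codes from enum intel_bootrom_load_status. */
enum {
	BR_NO_KEY_FOUND		= 0x13,
	BR_RSA_FAILED		= 0x50,
	BR_PAVPC_FAILED		= 0x73,
	BR_WOPCM_FAILED		= 0x74,
	BR_LOADLOC_FAILED	= 0x75,
	BR_JUMP_FAILED		= 0x77,
	BR_RC6CTXCONFIG_FAILED	= 0x79,
	BR_MPUMAP_INCORRECT	= 0x7A,
	BR_EXCEPTION		= 0x7E,
};

/* Hypothetical result type: keep polling, loaded, or give up now. */
enum load_result { LOAD_PENDING, LOAD_OK, LOAD_FAILED };

static enum load_result classify_status(uint32_t status)
{
	uint32_t uk = (status & GS_UKERNEL_MASK) >> GS_UKERNEL_SHIFT;
	uint32_t br = (status & GS_BOOTROM_MASK) >> GS_BOOTROM_SHIFT;

	switch (uk) {
	case UK_READY:
		return LOAD_OK;
	case UK_ERROR_DEVID_BUILD_MISMATCH:
	case UK_PREPROD_BUILD_MISMATCH:
	case UK_ERROR_DEVID_INVALID_GUCTYPE:
	case UK_HWCONFIG_ERROR:
		return LOAD_FAILED;
	}

	/* UKernel not in a terminal state: a BootROM failure also ends the wait. */
	switch (br) {
	case BR_NO_KEY_FOUND:
	case BR_RSA_FAILED:
	case BR_PAVPC_FAILED:
	case BR_WOPCM_FAILED:
	case BR_LOADLOC_FAILED:
	case BR_JUMP_FAILED:
	case BR_RC6CTXCONFIG_FAILED:
	case BR_MPUMAP_INCORRECT:
	case BR_EXCEPTION:
		return LOAD_FAILED;
	}

	/* Anything else (e.g. HWCONFIG_START, JUMP_PASSED) is still in progress. */
	return LOAD_PENDING;
}
```

Polling stops on anything other than LOAD_PENDING, so a signature failure
ends the wait as soon as the BootROM reports it rather than after the
full timeout.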

v2: Fix mis-matched case and confusing 'success' variable (Daniele).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h   | 17 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c     | 95 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h    |  4 +-
 3 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
index 8085fb1812748..bcb1129b36102 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_errors_abi.h
@@ -21,6 +21,9 @@ enum intel_guc_load_status {
 	INTEL_GUC_LOAD_STATUS_ERROR_DEVID_BUILD_MISMATCH       = 0x02,
 	INTEL_GUC_LOAD_STATUS_GUC_PREPROD_BUILD_MISMATCH       = 0x03,
 	INTEL_GUC_LOAD_STATUS_ERROR_DEVID_INVALID_GUCTYPE      = 0x04,
+	INTEL_GUC_LOAD_STATUS_HWCONFIG_START                   = 0x05,
+	INTEL_GUC_LOAD_STATUS_HWCONFIG_DONE                    = 0x06,
+	INTEL_GUC_LOAD_STATUS_HWCONFIG_ERROR                   = 0x07,
 	INTEL_GUC_LOAD_STATUS_GDT_DONE                         = 0x10,
 	INTEL_GUC_LOAD_STATUS_IDT_DONE                         = 0x20,
 	INTEL_GUC_LOAD_STATUS_LAPIC_DONE                       = 0x30,
@@ -38,4 +41,18 @@ enum intel_guc_load_status {
 	INTEL_GUC_LOAD_STATUS_READY                            = 0xF0,
 };
 
+enum intel_bootrom_load_status {
+	INTEL_BOOTROM_STATUS_NO_KEY_FOUND                 = 0x13,
+	INTEL_BOOTROM_STATUS_AES_PROD_KEY_FOUND           = 0x1A,
+	INTEL_BOOTROM_STATUS_RSA_FAILED                   = 0x50,
+	INTEL_BOOTROM_STATUS_PAVPC_FAILED                 = 0x73,
+	INTEL_BOOTROM_STATUS_WOPCM_FAILED                 = 0x74,
+	INTEL_BOOTROM_STATUS_LOADLOC_FAILED               = 0x75,
+	INTEL_BOOTROM_STATUS_JUMP_PASSED                  = 0x76,
+	INTEL_BOOTROM_STATUS_JUMP_FAILED                  = 0x77,
+	INTEL_BOOTROM_STATUS_RC6CTXCONFIG_FAILED          = 0x79,
+	INTEL_BOOTROM_STATUS_MPUMAP_INCORRECT             = 0x7A,
+	INTEL_BOOTROM_STATUS_EXCEPTION                    = 0x7E,
+};
+
 #endif /* _ABI_GUC_ERRORS_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
index 69133420c78b2..0b49d84a8a9c2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
@@ -88,31 +88,64 @@ static int guc_xfer_rsa(struct intel_uc_fw *guc_fw,
 /*
  * Read the GuC status register (GUC_STATUS) and store it in the
  * specified location; then return a boolean indicating whether
- * the value matches either of two values representing completion
- * of the GuC boot process.
+ * the value matches either completion or a known failure code.
  *
  * This is used for polling the GuC status in a wait_for()
  * loop below.
  */
-static inline bool guc_ready(struct intel_uncore *uncore, u32 *status)
+static inline bool guc_load_done(struct intel_uncore *uncore, u32 *status, bool *success)
 {
 	u32 val = intel_uncore_read(uncore, GUC_STATUS);
 	u32 uk_val = REG_FIELD_GET(GS_UKERNEL_MASK, val);
+	u32 br_val = REG_FIELD_GET(GS_BOOTROM_MASK, val);
 
 	*status = val;
-	return uk_val == INTEL_GUC_LOAD_STATUS_READY;
+	switch (uk_val) {
+	case INTEL_GUC_LOAD_STATUS_READY:
+		*success = true;
+		return true;
+
+	case INTEL_GUC_LOAD_STATUS_ERROR_DEVID_BUILD_MISMATCH:
+	case INTEL_GUC_LOAD_STATUS_GUC_PREPROD_BUILD_MISMATCH:
+	case INTEL_GUC_LOAD_STATUS_ERROR_DEVID_INVALID_GUCTYPE:
+	case INTEL_GUC_LOAD_STATUS_HWCONFIG_ERROR:
+	case INTEL_GUC_LOAD_STATUS_DPC_ERROR:
+	case INTEL_GUC_LOAD_STATUS_EXCEPTION:
+	case INTEL_GUC_LOAD_STATUS_INIT_DATA_INVALID:
+	case INTEL_GUC_LOAD_STATUS_MPU_DATA_INVALID:
+	case INTEL_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID:
+		*success = false;
+		return true;
+	}
+
+	switch (br_val) {
+	case INTEL_BOOTROM_STATUS_NO_KEY_FOUND:
+	case INTEL_BOOTROM_STATUS_RSA_FAILED:
+	case INTEL_BOOTROM_STATUS_PAVPC_FAILED:
+	case INTEL_BOOTROM_STATUS_WOPCM_FAILED:
+	case INTEL_BOOTROM_STATUS_LOADLOC_FAILED:
+	case INTEL_BOOTROM_STATUS_JUMP_FAILED:
+	case INTEL_BOOTROM_STATUS_RC6CTXCONFIG_FAILED:
+	case INTEL_BOOTROM_STATUS_MPUMAP_INCORRECT:
+	case INTEL_BOOTROM_STATUS_EXCEPTION:
+		*success = false;
+		return true;
+	}
+
+	return false;
 }
 
 static int guc_wait_ucode(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct intel_uncore *uncore = gt->uncore;
+	bool success;
 	u32 status;
 	int ret;
 
 	/*
 	 * Wait for the GuC to start up.
-	 * NB: Docs recommend not using the interrupt for completion.
+	 *
 	 * Measurements indicate this should take no more than 20ms
 	 * (assuming the GT clock is at maximum frequency). So, a
 	 * timeout here indicates that the GuC has failed and is unusable.
@@ -127,28 +160,52 @@ static int guc_wait_ucode(struct intel_guc *guc)
 	 * 200ms. Even at slowest clock, this should be sufficient. And
 	 * in the working case, a larger timeout makes no difference.
 	 */
-	ret = wait_for(guc_ready(uncore, &status), 200);
-	if (ret) {
-		guc_info(guc, "load failed: status = 0x%08X\n", status);
-		guc_info(guc, "load failed: status: Reset = %d, "
-			"BootROM = 0x%02X, UKernel = 0x%02X, "
-			"MIA = 0x%02X, Auth = 0x%02X\n",
-			REG_FIELD_GET(GS_MIA_IN_RESET, status),
-			REG_FIELD_GET(GS_BOOTROM_MASK, status),
-			REG_FIELD_GET(GS_UKERNEL_MASK, status),
-			REG_FIELD_GET(GS_MIA_MASK, status),
-			REG_FIELD_GET(GS_AUTH_STATUS_MASK, status));
-
-		if ((status & GS_BOOTROM_MASK) == GS_BOOTROM_RSA_FAILED) {
+	ret = wait_for(guc_load_done(uncore, &status, &success), 200);
+	if (ret || !success) {
+		u32 ukernel = REG_FIELD_GET(GS_UKERNEL_MASK, status);
+		u32 bootrom = REG_FIELD_GET(GS_BOOTROM_MASK, status);
+
+		guc_info(guc, "load failed: status = 0x%08X, ret = %d\n", status, ret);
+		guc_info(guc, "load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n",
+			 REG_FIELD_GET(GS_MIA_IN_RESET, status),
+			 bootrom, ukernel,
+			 REG_FIELD_GET(GS_MIA_MASK, status),
+			 REG_FIELD_GET(GS_AUTH_STATUS_MASK, status));
+
+		switch (bootrom) {
+		case INTEL_BOOTROM_STATUS_NO_KEY_FOUND:
+			guc_info(guc, "invalid key requested, header = 0x%08X\n",
+				 intel_uncore_read(uncore, GUC_HEADER_INFO));
+			ret = -ENOEXEC;
+			break;
+
+		case INTEL_BOOTROM_STATUS_RSA_FAILED:
 			guc_info(guc, "firmware signature verification failed\n");
 			ret = -ENOEXEC;
+			break;
 		}
 
-		if (REG_FIELD_GET(GS_UKERNEL_MASK, status) == INTEL_GUC_LOAD_STATUS_EXCEPTION) {
+		switch (ukernel) {
+		case INTEL_GUC_LOAD_STATUS_EXCEPTION:
 			guc_info(guc, "firmware exception. EIP: %#x\n",
 				 intel_uncore_read(uncore, SOFT_SCRATCH(13)));
 			ret = -ENXIO;
+			break;
+
+		case INTEL_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID:
+			guc_info(guc, "illegal register in save/restore workaround list\n");
+			ret = -EPERM;
+			break;
+
+		case INTEL_GUC_LOAD_STATUS_HWCONFIG_START:
+			guc_info(guc, "still extracting hwconfig table.\n");
+			ret = -ETIMEDOUT;
+			break;
 		}
+
+		/* Uncommon/unexpected error, see earlier status code print for details */
+		if (ret == 0)
+			ret = -ENXIO;
 	}
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
index 9915de32e894e..3fd7988375020 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
@@ -18,8 +18,6 @@
 #define   GS_MIA_IN_RESET		  (0x01 << GS_RESET_SHIFT)
 #define   GS_BOOTROM_SHIFT		1
 #define   GS_BOOTROM_MASK		  (0x7F << GS_BOOTROM_SHIFT)
-#define   GS_BOOTROM_RSA_FAILED		  (0x50 << GS_BOOTROM_SHIFT)
-#define   GS_BOOTROM_JUMP_PASSED	  (0x76 << GS_BOOTROM_SHIFT)
 #define   GS_UKERNEL_SHIFT		8
 #define   GS_UKERNEL_MASK		  (0xFF << GS_UKERNEL_SHIFT)
 #define   GS_MIA_SHIFT			16
@@ -32,6 +30,8 @@
 #define   GS_AUTH_STATUS_BAD		  (0x01 << GS_AUTH_STATUS_SHIFT)
 #define   GS_AUTH_STATUS_GOOD		  (0x02 << GS_AUTH_STATUS_SHIFT)
 
+#define GUC_HEADER_INFO			_MMIO(0xc014)
+
 #define SOFT_SCRATCH(n)			_MMIO(0xc180 + (n) * 4)
 #define SOFT_SCRATCH_COUNT		16
 
-- 
2.39.1



* [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading
  2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting John.C.Harrison
@ 2023-03-16 22:06 ` John.C.Harrison
  2023-03-17 22:33   ` Dixit, Ashutosh
  2023-03-22 21:14   ` Ceraolo Spurio, Daniele
  2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3) Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 10+ messages in thread
From: John.C.Harrison @ 2023-03-16 22:06 UTC
  To: Intel-GFX; +Cc: DRI-Devel

From: John Harrison <John.C.Harrison@Intel.com>

A failure to load the GuC is occasionally observed even though the GuC
log shows that the GuC had actually loaded just fine. The implication
is that the load took ever so slightly longer than the 200ms timeout.
Given that the actual time should be tens of milliseconds at the
slowest, this should never happen. So far the issue has generally
been caused by a bad IFWI resulting in low frequencies during boot
(despite the KMD requesting max frequency). However, the issue seems
to happen more often than one would like.

So a) increase the timeout so that the user still gets a working
system even in the case of a slow load, and b) report the frequency
during the load to see whether that is the cause of the slowdown.
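The retry/timing logic can be sketched with simulated time so that it
runs instantly. Everything here is a hypothetical stand-in: struct
fake_guc, load_done(), wait_for_sim() (in place of the kernel's
wait_for(), which returns -ETIMEDOUT on timeout), wait_ucode_sim() and
load_was_slow(). Only the constants come from the patch: 1000ms per wait,
a retry limit of 3 (20 with CONFIG_DRM_I915_DEBUG_GEM) and the 200ms
"excessive init time" threshold. With those values the worst-case budget
is 3 x 1000ms = 3s for release builds and 20 x 1000ms = 20s for debug
builds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define GUC_LOAD_RETRY_LIMIT	3	/* release value; 20 for debug builds */
#define WAIT_MS			1000	/* timeout of each wait_for() call */
#define SLOW_LOAD_MS		200	/* successful loads above this get a warning */

/* Hypothetical simulated firmware: load finishes once now_ms reaches load_ms. */
struct fake_guc {
	uint64_t now_ms;
	uint64_t load_ms;
	bool fails;		/* report a failure code instead of READY */
};

/* Same contract as guc_load_done(): *success is only written when true. */
static bool load_done(const struct fake_guc *g, bool *success)
{
	if (g->now_ms < g->load_ms)
		return false;
	*success = !g->fails;
	return true;
}

/* Stand-in for wait_for(): poll once per simulated millisecond. */
static int wait_for_sim(struct fake_guc *g, bool *success, unsigned int timeout_ms)
{
	uint64_t end = g->now_ms + timeout_ms;

	for (;;) {
		if (load_done(g, success))
			return 0;
		if (g->now_ms >= end)
			return -ETIMEDOUT;
		g->now_ms++;
	}
}

/* Returns 0 on success, -ETIMEDOUT if never done, -ENXIO on a failure code. */
static int wait_ucode_sim(struct fake_guc *g, uint64_t *elapsed_ms, int *count)
{
	uint64_t start = g->now_ms;
	bool success = false;
	int ret = -ETIMEDOUT;

	for (*count = 0; *count < GUC_LOAD_RETRY_LIMIT; (*count)++) {
		ret = wait_for_sim(g, &success, WAIT_MS);
		/* Stop on any terminal state; success is only valid when ret == 0. */
		if (!ret)
			break;
	}
	*elapsed_ms = g->now_ms - start;

	if (ret)
		return ret;
	return success ? 0 : -ENXIO;
}

/* Mirrors the "delta_ms > 200" check used for the warning. */
static bool load_was_slow(uint64_t elapsed_ms)
{
	return elapsed_ms > SLOW_LOAD_MS;
}
```

The loop breaks on ret == 0 alone: when a wait times out, load_done() has
not written success, so only a completed wait makes it meaningful.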

v2: Reduce timeout in non-debug builds, add references (Daniele)

References: https://gitlab.freedesktop.org/drm/intel/-/issues/7931
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8083
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8136
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8137
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 50 +++++++++++++++++++++--
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
index 0b49d84a8a9c2..6fda3aec5c66a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
@@ -12,6 +12,7 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_mcr.h"
 #include "gt/intel_gt_regs.h"
+#include "gt/intel_rps.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_print.h"
 #include "i915_drv.h"
@@ -135,13 +136,29 @@ static inline bool guc_load_done(struct intel_uncore *uncore, u32 *status, bool
 	return false;
 }
 
+/*
+ * Use a longer timeout for debug builds so that problems can be detected
+ * and analysed. But a shorter timeout for releases so that users don't
+ * wait forever to find out there is a problem. Note that the only reason
+ * an end user should hit the timeout is in case of extreme thermal throttling.
+ * And a system that is that hot during boot is probably dead anyway!
+ */
+#if defined(CONFIG_DRM_I915_DEBUG_GEM)
+#define GUC_LOAD_RETRY_LIMIT	20
+#else
+#define GUC_LOAD_RETRY_LIMIT	3
+#endif
+
 static int guc_wait_ucode(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct intel_uncore *uncore = gt->uncore;
+	ktime_t before, after, delta;
 	bool success;
 	u32 status;
-	int ret;
+	int ret, count;
+	u64 delta_ms;
+	u32 before_freq;
 
 	/*
 	 * Wait for the GuC to start up.
@@ -159,13 +176,32 @@ static int guc_wait_ucode(struct intel_guc *guc)
 	 * issues to be resolved. In the meantime bump the timeout to
 	 * 200ms. Even at slowest clock, this should be sufficient. And
 	 * in the working case, a larger timeout makes no difference.
+	 *
+	 * IFWI updates have also been seen to cause sporadic failures due to
+	 * the requested frequency not being granted and thus the firmware
+	 * load is attempted at minimum frequency. That can lead to load times
+	 * in the seconds range. However, there is a limit on how long an
+	 * individual wait_for() can wait. So wrap it in a loop.
 	 */
-	ret = wait_for(guc_load_done(uncore, &status, &success), 200);
+	before_freq = intel_rps_read_actual_frequency(&uncore->gt->rps);
+	before = ktime_get();
+	for (count = 0; count < GUC_LOAD_RETRY_LIMIT; count++) {
+		ret = wait_for(guc_load_done(uncore, &status, &success), 1000);
+		if (!ret)
+			break;
+
+		guc_dbg(guc, "load still in progress, count = %d, freq = %dMHz\n",
+			count, intel_rps_read_actual_frequency(&uncore->gt->rps));
+	}
+	after = ktime_get();
+	delta = ktime_sub(after, before);
+	delta_ms = ktime_to_ms(delta);
 	if (ret || !success) {
 		u32 ukernel = REG_FIELD_GET(GS_UKERNEL_MASK, status);
 		u32 bootrom = REG_FIELD_GET(GS_BOOTROM_MASK, status);
 
-		guc_info(guc, "load failed: status = 0x%08X, ret = %d\n", status, ret);
+		guc_info(guc, "load failed: status = 0x%08X, time = %lldms, freq = %dMHz, ret = %d\n",
+			 status, delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps), ret);
 		guc_info(guc, "load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n",
 			 REG_FIELD_GET(GS_MIA_IN_RESET, status),
 			 bootrom, ukernel,
@@ -206,6 +242,14 @@ static int guc_wait_ucode(struct intel_guc *guc)
 		/* Uncommon/unexpected error, see earlier status code print for details */
 		if (ret == 0)
 			ret = -ENXIO;
+	} else if (delta_ms > 200) {
+		guc_warn(guc, "excessive init time: %lldms! [freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d]\n",
+			 delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps),
+			 before_freq, status, count, ret);
+	} else {
+		guc_dbg(guc, "init took %lldms, freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d\n",
+			delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps),
+			before_freq, status, count, ret);
 	}
 
 	return ret;
-- 
2.39.1



* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading John.C.Harrison
@ 2023-03-17 22:33   ` Dixit, Ashutosh
  2023-03-22 21:14   ` Ceraolo Spurio, Daniele
  1 sibling, 0 replies; 10+ messages in thread
From: Dixit, Ashutosh @ 2023-03-17 22:33 UTC
  To: John.C.Harrison; +Cc: Intel-GFX, DRI-Devel

On Thu, 16 Mar 2023 15:06:32 -0700, John.C.Harrison@Intel.com wrote:
>
> From: John Harrison <John.C.Harrison@Intel.com>
>
> A failure to load the GuC is occasionally observed where the GuC log
> actually showed that the GuC had loaded just fine. The implication
> being that the load just took ever so slightly longer than the 200ms
> timeout. Given that the actual time should be tens of milliseconds at
> the slowest, this should never happen. So far the issue has generally
> been caused by a bad IFWI resulting in low frequencies during boot
> (despite the KMD requesting max frequency). However, the issue seems
> to happen more often than one would like.
>
> So a) increase the timeout so that the user still gets a working
> system even in the case of slow load. And b) report the frequency
> during the load to see if that is the case of the slow down.
>
> v2: Reduce timeout in non-debug builds, add references (Daniele)
>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7931
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8083
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8136
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8137
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Tested this on ATSM and saw the intermittent GuC FW load timeouts
disappear:

Tested-by: Ashutosh Dixit <ashutosh.dixit@intel.com>


* Re: [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading John.C.Harrison
  2023-03-17 22:33   ` Dixit, Ashutosh
@ 2023-03-22 21:14   ` Ceraolo Spurio, Daniele
  1 sibling, 0 replies; 10+ messages in thread
From: Ceraolo Spurio, Daniele @ 2023-03-22 21:14 UTC
  To: John.C.Harrison, Intel-GFX; +Cc: DRI-Devel



On 3/16/2023 3:06 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> A failure to load the GuC is occasionally observed where the GuC log
> actually showed that the GuC had loaded just fine. The implication
> being that the load just took ever so slightly longer than the 200ms
> timeout. Given that the actual time should be tens of milliseconds at
> the slowest, this should never happen. So far the issue has generally
> been caused by a bad IFWI resulting in low frequencies during boot
> (despite the KMD requesting max frequency). However, the issue seems
> to happen more often than one would like.
>
> So a) increase the timeout so that the user still gets a working
> system even in the case of slow load. And b) report the frequency
> during the load to see if that is the case of the slow down.
>
> v2: Reduce timeout in non-debug builds, add references (Daniele)
>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7931
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8083
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8136
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8137
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Daniele

> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 50 +++++++++++++++++++++--
>   1 file changed, 47 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> index 0b49d84a8a9c2..6fda3aec5c66a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c
> @@ -12,6 +12,7 @@
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_mcr.h"
>   #include "gt/intel_gt_regs.h"
> +#include "gt/intel_rps.h"
>   #include "intel_guc_fw.h"
>   #include "intel_guc_print.h"
>   #include "i915_drv.h"
> @@ -135,13 +136,29 @@ static inline bool guc_load_done(struct intel_uncore *uncore, u32 *status, bool
>   	return false;
>   }
>   
> +/*
> + * Use a longer timeout for debug builds so that problems can be detected
> + * and analysed. But a shorter timeout for releases so that users don't
> + * wait forever to find out there is a problem. Note that the only reason
> + * an end user should hit the timeout is in case of extreme thermal throttling.
> + * And a system that is that hot during boot is probably dead anyway!
> + */
> +#if defined(CONFIG_DRM_I915_DEBUG_GEM)
> +#define GUC_LOAD_RETRY_LIMIT	20
> +#else
> +#define GUC_LOAD_RETRY_LIMIT	3
> +#endif
> +
>   static int guc_wait_ucode(struct intel_guc *guc)
>   {
>   	struct intel_gt *gt = guc_to_gt(guc);
>   	struct intel_uncore *uncore = gt->uncore;
> +	ktime_t before, after, delta;
>   	bool success;
>   	u32 status;
> -	int ret;
> +	int ret, count;
> +	u64 delta_ms;
> +	u32 before_freq;
>   
>   	/*
>   	 * Wait for the GuC to start up.
> @@ -159,13 +176,32 @@ static int guc_wait_ucode(struct intel_guc *guc)
>   	 * issues to be resolved. In the meantime bump the timeout to
>   	 * 200ms. Even at slowest clock, this should be sufficient. And
>   	 * in the working case, a larger timeout makes no difference.
> +	 *
> +	 * IFWI updates have also been seen to cause sporadic failures due to
> +	 * the requested frequency not being granted and thus the firmware
> +	 * load is attempted at minimum frequency. That can lead to load times
> +	 * in the seconds range. However, there is a limit on how long an
> +	 * individual wait_for() can wait. So wrap it in a loop.
>   	 */
> -	ret = wait_for(guc_load_done(uncore, &status, &success), 200);
> +	before_freq = intel_rps_read_actual_frequency(&uncore->gt->rps);
> +	before = ktime_get();
> +	for (count = 0; count < GUC_LOAD_RETRY_LIMIT; count++) {
> +		ret = wait_for(guc_load_done(uncore, &status, &success), 1000);
> +		if (!ret || !success)
> +			break;
> +
> +		guc_dbg(guc, "load still in progress, count = %d, freq = %dMHz\n",
> +			count, intel_rps_read_actual_frequency(&uncore->gt->rps));
> +	}
> +	after = ktime_get();
> +	delta = ktime_sub(after, before);
> +	delta_ms = ktime_to_ms(delta);
>   	if (ret || !success) {
>   		u32 ukernel = REG_FIELD_GET(GS_UKERNEL_MASK, status);
>   		u32 bootrom = REG_FIELD_GET(GS_BOOTROM_MASK, status);
>   
> -		guc_info(guc, "load failed: status = 0x%08X, ret = %d\n", status, ret);
> +		guc_info(guc, "load failed: status = 0x%08X, time = %lldms, freq = %dMHz, ret = %d\n",
> +			 status, delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps), ret);
>   		guc_info(guc, "load failed: status: Reset = %d, BootROM = 0x%02X, UKernel = 0x%02X, MIA = 0x%02X, Auth = 0x%02X\n",
>   			 REG_FIELD_GET(GS_MIA_IN_RESET, status),
>   			 bootrom, ukernel,
> @@ -206,6 +242,14 @@ static int guc_wait_ucode(struct intel_guc *guc)
>   		/* Uncommon/unexpected error, see earlier status code print for details */
>   		if (ret == 0)
>   			ret = -ENXIO;
> +	} else if (delta_ms > 200) {
> +		guc_warn(guc, "excessive init time: %lldms! [freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d]\n",
> +			 delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps),
> +			 before_freq, status, count, ret);
> +	} else {
> +		guc_dbg(guc, "init took %lldms, freq = %dMHz, before = %dMHz, status = 0x%08X, count = %d, ret = %d\n",
> +			delta_ms, intel_rps_read_actual_frequency(&uncore->gt->rps),
> +			before_freq, status, count, ret);
>   	}
>   
>   	return ret;



* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3)
  2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting John.C.Harrison
  2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading John.C.Harrison
@ 2023-03-23  2:40 ` Patchwork
  2023-03-23 22:30   ` John Harrison
  2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  2023-03-23  2:52 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  4 siblings, 1 reply; 10+ messages in thread
From: Patchwork @ 2023-03-23  2:40 UTC
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: Improvements to GuC load failure handling (rev3)
URL   : https://patchwork.freedesktop.org/series/114168/
State : warning

== Summary ==

Error: dim checkpatch failed
b4df7f16c846 drm/i915/guc: Improve GuC load error reporting
2be0fcf3087c drm/i915/guc: Allow for very slow GuC loading
-:21: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
#21: 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7931

-:22: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
#22: 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8060

-:23: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
#23: 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8083

-:24: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
#24: 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8136

-:25: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
#25: 
References: https://gitlab.freedesktop.org/drm/intel/-/issues/8137

total: 0 errors, 5 warnings, 0 checks, 85 lines checked




* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Improvements to GuC load failure handling (rev3)
  2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
                   ` (2 preceding siblings ...)
  2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3) Patchwork
@ 2023-03-23  2:40 ` Patchwork
  2023-03-23  2:52 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  4 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2023-03-23  2:40 UTC
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: Improvements to GuC load failure handling (rev3)
URL   : https://patchwork.freedesktop.org/series/114168/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.




* [Intel-gfx] ✗ Fi.CI.BAT: failure for Improvements to GuC load failure handling (rev3)
  2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
                   ` (3 preceding siblings ...)
  2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2023-03-23  2:52 ` Patchwork
  2023-03-23 22:29   ` John Harrison
  4 siblings, 1 reply; 10+ messages in thread
From: Patchwork @ 2023-03-23  2:52 UTC
  To: John Harrison; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 7995 bytes --]

== Series Details ==

Series: Improvements to GuC load failure handling (rev3)
URL   : https://patchwork.freedesktop.org/series/114168/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_12902 -> Patchwork_114168v3
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_114168v3 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_114168v3, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/index.html

Participating hosts (37 -> 35)
------------------------------

  Missing    (2): fi-tgl-1115g4 fi-snb-2520m 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_114168v3:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1:
    - fi-hsw-4770:        [PASS][1] -> [ABORT][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html

  
Known issues
------------

  Here are the changes found in Patchwork_114168v3 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-cfl-8109u:       [PASS][3] -> [DMESG-FAIL][4] ([i915#5334])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-cfl-8109u/igt@i915_selftest@live@gt_heartbeat.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-cfl-8109u/igt@i915_selftest@live@gt_heartbeat.html

  * igt@i915_selftest@live@hangcheck:
    - fi-skl-guc:         [PASS][5] -> [DMESG-WARN][6] ([i915#8073])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-skl-guc/igt@i915_selftest@live@hangcheck.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-skl-guc/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@migrate:
    - bat-atsm-1:         [PASS][7] -> [DMESG-FAIL][8] ([i915#7699])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-atsm-1/igt@i915_selftest@live@migrate.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-atsm-1/igt@i915_selftest@live@migrate.html

  * igt@i915_selftest@live@slpc:
    - bat-adln-1:         NOTRUN -> [DMESG-FAIL][9] ([i915#6997])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@i915_selftest@live@slpc.html

  * igt@kms_chamelium_hpd@common-hpd-after-suspend:
    - bat-rpls-2:         NOTRUN -> [SKIP][10] ([i915#7828])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
    - bat-adln-1:         NOTRUN -> [SKIP][11] ([i915#7828])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html
    - bat-rpls-1:         NOTRUN -> [SKIP][12] ([i915#7828])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html

  * igt@kms_pipe_crc_basic@suspend-read-crc:
    - bat-rpls-1:         NOTRUN -> [SKIP][13] ([i915#1845])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@kms_pipe_crc_basic@suspend-read-crc.html
    - bat-rpls-2:         NOTRUN -> [SKIP][14] ([i915#1845])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@kms_pipe_crc_basic@suspend-read-crc.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0@smem:
    - bat-rpls-2:         [ABORT][15] -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gem_exec_suspend@basic-s3@lmem0:
    - bat-dg2-9:          [FAIL][17] ([fdo#103375]) -> [PASS][18] +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-9/igt@gem_exec_suspend@basic-s3@lmem0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-9/igt@gem_exec_suspend@basic-s3@lmem0.html

  * igt@i915_pm_rps@basic-api:
    - bat-dg2-11:         [FAIL][19] -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-11/igt@i915_pm_rps@basic-api.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-11/igt@i915_pm_rps@basic-api.html

  * igt@i915_selftest@live@reset:
    - bat-rpls-1:         [ABORT][21] ([i915#4983] / [i915#7981]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-1/igt@i915_selftest@live@reset.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@i915_selftest@live@reset.html

  * igt@i915_selftest@live@workarounds:
    - bat-adln-1:         [INCOMPLETE][23] ([i915#4983] / [i915#7467] / [i915#7981]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-adln-1/igt@i915_selftest@live@workarounds.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@i915_selftest@live@workarounds.html

  * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3:
    - bat-dg2-9:          [FAIL][25] ([fdo#103375] / [i915#7932]) -> [PASS][26]
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3.html

  
#### Warnings ####

  * igt@i915_selftest@live@slpc:
    - bat-rpls-2:         [DMESG-FAIL][27] ([i915#6997] / [i915#7913]) -> [DMESG-FAIL][28] ([i915#6367] / [i915#7913] / [i915#7996])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-2/igt@i915_selftest@live@slpc.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@i915_selftest@live@slpc.html

  
  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#4983]: https://gitlab.freedesktop.org/drm/intel/issues/4983
  [i915#5334]: https://gitlab.freedesktop.org/drm/intel/issues/5334
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6997]: https://gitlab.freedesktop.org/drm/intel/issues/6997
  [i915#7467]: https://gitlab.freedesktop.org/drm/intel/issues/7467
  [i915#7699]: https://gitlab.freedesktop.org/drm/intel/issues/7699
  [i915#7828]: https://gitlab.freedesktop.org/drm/intel/issues/7828
  [i915#7913]: https://gitlab.freedesktop.org/drm/intel/issues/7913
  [i915#7932]: https://gitlab.freedesktop.org/drm/intel/issues/7932
  [i915#7981]: https://gitlab.freedesktop.org/drm/intel/issues/7981
  [i915#7996]: https://gitlab.freedesktop.org/drm/intel/issues/7996
  [i915#8073]: https://gitlab.freedesktop.org/drm/intel/issues/8073


Build changes
-------------

  * Linux: CI_DRM_12902 -> Patchwork_114168v3

  CI-20190529: 20190529
  CI_DRM_12902: c8333f1c10ebbdaad7a642cc66041b4f90bc81be @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7211: c0cc1de7b2f4041ca68960362aa55f881d416bac @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_114168v3: c8333f1c10ebbdaad7a642cc66041b4f90bc81be @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

247aa4568644 drm/i915/guc: Allow for very slow GuC loading
0c9bcdd1e7e7 drm/i915/guc: Improve GuC load error reporting

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/index.html



* Re: [Intel-gfx]  ✗ Fi.CI.BAT: failure for Improvements to GuC load failure handling (rev3)
  2023-03-23  2:52 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
@ 2023-03-23 22:29   ` John Harrison
  0 siblings, 0 replies; 10+ messages in thread
From: John Harrison @ 2023-03-23 22:29 UTC (permalink / raw)
  To: intel-gfx


On 3/22/2023 19:52, Patchwork wrote:
> Project List - Patchwork *Patch Details*
> *Series:* 	Improvements to GuC load failure handling (rev3)
> *URL:* 	https://patchwork.freedesktop.org/series/114168/
> *State:* 	failure
> *Details:* 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/index.html
>
>
>   CI Bug Log - changes from CI_DRM_12902 -> Patchwork_114168v3
>
>
>     Summary
>
> *FAILURE*
>
> Serious unknown changes coming with Patchwork_114168v3 absolutely need
> to be verified manually.
>
> If you think the reported changes have nothing to do with the changes
> introduced in Patchwork_114168v3, please notify your bug team to allow
> them to document this new failure mode, which will reduce false positives
> in CI.
>
> External URL: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/index.html
>
>
>     Participating hosts (37 -> 35)
>
> Missing (2): fi-tgl-1115g4 fi-snb-2520m
>
>
>     Possible new issues
>
> Here are the unknown changes that may have been introduced in 
> Patchwork_114168v3:
>
>
>       IGT changes
>
>
>         Possible regressions
>
>   * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1:
>       o fi-hsw-4770: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html>
>         -> ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-hsw-4770/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-vga-1.html>
>
>
>     Known issues
>
> Here are the changes found in Patchwork_114168v3 that come from known 
> issues:
>
>
>       IGT changes
>
>
>         Issues hit
>
>   * igt@i915_selftest@live@gt_heartbeat:
>       o fi-cfl-8109u: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-cfl-8109u/igt@i915_selftest@live@gt_heartbeat.html>
>         -> DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-cfl-8109u/igt@i915_selftest@live@gt_heartbeat.html>
>         (i915#5334 <https://gitlab.freedesktop.org/drm/intel/issues/5334>)
>   * igt@i915_selftest@live@hangcheck:
>       o fi-skl-guc: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/fi-skl-guc/igt@i915_selftest@live@hangcheck.html>
>         -> DMESG-WARN
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/fi-skl-guc/igt@i915_selftest@live@hangcheck.html>
>         (i915#8073 <https://gitlab.freedesktop.org/drm/intel/issues/8073>)
>   * igt@i915_selftest@live@migrate:
>       o bat-atsm-1: PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-atsm-1/igt@i915_selftest@live@migrate.html>
>         -> DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-atsm-1/igt@i915_selftest@live@migrate.html>
>         (i915#7699 <https://gitlab.freedesktop.org/drm/intel/issues/7699>)
>   * igt@i915_selftest@live@slpc:
>       o bat-adln-1: NOTRUN -> DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@i915_selftest@live@slpc.html>
>         (i915#6997 <https://gitlab.freedesktop.org/drm/intel/issues/6997>)
>   * igt@kms_chamelium_hpd@common-hpd-after-suspend:
>       o bat-rpls-2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@kms_chamelium_hpd@common-hpd-after-suspend.html>
>         (i915#7828 <https://gitlab.freedesktop.org/drm/intel/issues/7828>)
>       o bat-adln-1: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html>
>         (i915#7828 <https://gitlab.freedesktop.org/drm/intel/issues/7828>)
>       o bat-rpls-1: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@kms_chamelium_hpd@common-hpd-after-suspend.html>
>         (i915#7828 <https://gitlab.freedesktop.org/drm/intel/issues/7828>)
>   * igt@kms_pipe_crc_basic@suspend-read-crc:
>       o bat-rpls-1: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@kms_pipe_crc_basic@suspend-read-crc.html>
>         (i915#1845 <https://gitlab.freedesktop.org/drm/intel/issues/1845>)
>       o bat-rpls-2: NOTRUN -> SKIP
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@kms_pipe_crc_basic@suspend-read-crc.html>
>         (i915#1845 <https://gitlab.freedesktop.org/drm/intel/issues/1845>)
>
>
>         Possible fixes
>
>   * igt@gem_exec_suspend@basic-s0@smem:
>       o bat-rpls-2: ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html>
>         -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@gem_exec_suspend@basic-s0@smem.html>
>   * igt@gem_exec_suspend@basic-s3@lmem0:
>       o bat-dg2-9: FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-9/igt@gem_exec_suspend@basic-s3@lmem0.html>
>         (fdo#103375
>         <https://bugs.freedesktop.org/show_bug.cgi?id=103375>) -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-9/igt@gem_exec_suspend@basic-s3@lmem0.html>
>         +3 similar issues
>   * igt@i915_pm_rps@basic-api:
>       o bat-dg2-11: FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-11/igt@i915_pm_rps@basic-api.html>
>         -> PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-11/igt@i915_pm_rps@basic-api.html>
>   * igt@i915_selftest@live@reset:
>       o bat-rpls-1: ABORT
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-1/igt@i915_selftest@live@reset.html>
>         (i915#4983
>         <https://gitlab.freedesktop.org/drm/intel/issues/4983> /
>         i915#7981
>         <https://gitlab.freedesktop.org/drm/intel/issues/7981>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-1/igt@i915_selftest@live@reset.html>
>   * igt@i915_selftest@live@workarounds:
>       o bat-adln-1: INCOMPLETE
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-adln-1/igt@i915_selftest@live@workarounds.html>
>         (i915#4983
>         <https://gitlab.freedesktop.org/drm/intel/issues/4983> /
>         i915#7467
>         <https://gitlab.freedesktop.org/drm/intel/issues/7467> /
>         i915#7981
>         <https://gitlab.freedesktop.org/drm/intel/issues/7981>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-adln-1/igt@i915_selftest@live@workarounds.html>
>   * igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3:
>       o bat-dg2-9: FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3.html>
>         (fdo#103375
>         <https://bugs.freedesktop.org/show_bug.cgi?id=103375> /
>         i915#7932
>         <https://gitlab.freedesktop.org/drm/intel/issues/7932>) ->
>         PASS
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-dg2-9/igt@kms_pipe_crc_basic@suspend-read-crc@pipe-c-dp-3.html>
>
>
>         Warnings
>
>   * igt@i915_selftest@live@slpc:
>       o bat-rpls-2: DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12902/bat-rpls-2/igt@i915_selftest@live@slpc.html>
>         (i915#6997
>         <https://gitlab.freedesktop.org/drm/intel/issues/6997> /
>         i915#7913
>         <https://gitlab.freedesktop.org/drm/intel/issues/7913>) ->
>         DMESG-FAIL
>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114168v3/bat-rpls-2/igt@i915_selftest@live@slpc.html>
>         (i915#6367
>         <https://gitlab.freedesktop.org/drm/intel/issues/6367> /
>         i915#7913
>         <https://gitlab.freedesktop.org/drm/intel/issues/7913> /
>         i915#7996 <https://gitlab.freedesktop.org/drm/intel/issues/7996>)
>
These patches only change GuC firmware loading (reporting of errors and 
longer timeouts). None of the above issues are related to GuC firmware 
loading. Therefore, they are not caused by these changes.

John.

>
>
>     Build changes
>
>   * Linux: CI_DRM_12902 -> Patchwork_114168v3
>
> CI-20190529: 20190529
> CI_DRM_12902: c8333f1c10ebbdaad7a642cc66041b4f90bc81be @ 
> git://anongit.freedesktop.org/gfx-ci/linux
> IGT_7211: c0cc1de7b2f4041ca68960362aa55f881d416bac @ 
> https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
> Patchwork_114168v3: c8333f1c10ebbdaad7a642cc66041b4f90bc81be @ 
> git://anongit.freedesktop.org/gfx-ci/linux
>
>
>       Linux commits
>
> 247aa4568644 drm/i915/guc: Allow for very slow GuC loading
> 0c9bcdd1e7e7 drm/i915/guc: Improve GuC load error reporting
>



* Re: [Intel-gfx]  ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3)
  2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3) Patchwork
@ 2023-03-23 22:30   ` John Harrison
  0 siblings, 0 replies; 10+ messages in thread
From: John Harrison @ 2023-03-23 22:30 UTC (permalink / raw)
  To: intel-gfx

On 3/22/2023 19:40, Patchwork wrote:
> == Series Details ==
>
> Series: Improvements to GuC load failure handling (rev3)
> URL   : https://patchwork.freedesktop.org/series/114168/
> State : warning
>
> == Summary ==
>
> Error: dim checkpatch failed
> b4df7f16c846 drm/i915/guc: Improve GuC load error reporting
> 2be0fcf3087c drm/i915/guc: Allow for very slow GuC loading
> -:21: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
> #21:
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7931
>
> -:22: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
> #22:
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8060
>
> -:23: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
> #23:
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8083
>
> -:24: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
> #24:
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8136
>
> -:25: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' instead
These issues appear to be the tool getting confused about bug references 
versus patchwork links. Other patches in the tree use the references tag 
for bug links.

John.

> #25:
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/8137
>
> total: 0 errors, 5 warnings, 0 checks, 85 lines checked
>
>



end of thread, other threads:[~2023-03-23 22:30 UTC | newest]

Thread overview: 10+ messages
2023-03-16 22:06 [Intel-gfx] [PATCH v2 0/2] Improvements to GuC load failure handling John.C.Harrison
2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 1/2] drm/i915/guc: Improve GuC load error reporting John.C.Harrison
2023-03-16 22:06 ` [Intel-gfx] [PATCH v2 2/2] drm/i915/guc: Allow for very slow GuC loading John.C.Harrison
2023-03-17 22:33   ` Dixit, Ashutosh
2023-03-22 21:14   ` Ceraolo Spurio, Daniele
2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Improvements to GuC load failure handling (rev3) Patchwork
2023-03-23 22:30   ` John Harrison
2023-03-23  2:40 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-03-23  2:52 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-03-23 22:29   ` John Harrison
