All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume
@ 2018-10-15 22:10 Daniele Ceraolo Spurio
  2018-10-15 22:10 ` [PATCH 2/2] HAX enable GuC for CI Daniele Ceraolo Spurio
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-15 22:10 UTC (permalink / raw)
  To: intel-gfx

The ENTER/EXIT_S_STATE actions queue the save/restore operation in GuC
FW and then return, so waiting on the H2H is not enough to guarantee
GuC is done.
When all the processing is done, GuC writes 0 to scratch register 14,
so we can poll on that. Note that GuC does not ensure that the value
in the register is different from 0 while the action is in progress
so we need to take care of that ourselves as well.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc.c      | 28 +++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_guc_fwif.h |  6 ++++++
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index 230aea69385d..f238cd7a9dcf 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -521,6 +521,30 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
 	return intel_guc_send(guc, action, ARRAY_SIZE(action));
 }
 
+/*
+ * The ENTER/EXIT_S_STATE actions queue the save/restore operation in GuC FW and
+ * then return, so waiting on the H2H is not enough to guarantee GuC is done.
+ * When all the processing is done, GuC writes 0 to scratch register 14, so we
+ * can poll on that. Note that GuC does not ensure that the value in the
+ * register is different from 0 while the action is in progress so we need to
+ * take care of that ourselves as well.
+ */
+static int guc_sleep_state_action(struct intel_guc *guc,
+				  const u32 *action, u32 len)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	int ret;
+
+	I915_WRITE(SOFT_SCRATCH(14), ~0x0);
+
+	ret = intel_guc_send(guc, action, len);
+	if (ret)
+		return ret;
+
+	return intel_wait_for_register(dev_priv, SOFT_SCRATCH(14), ~0x0,
+				       INTEL_GUC_SLEEP_STATE_SUCCESS, 10);
+}
+
 /**
  * intel_guc_suspend() - notify GuC entering suspend state
  * @guc:	the guc
@@ -533,7 +557,7 @@ int intel_guc_suspend(struct intel_guc *guc)
 		intel_guc_ggtt_offset(guc, guc->shared_data)
 	};
 
-	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+	return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
 }
 
 /**
@@ -571,7 +595,7 @@ int intel_guc_resume(struct intel_guc *guc)
 		intel_guc_ggtt_offset(guc, guc->shared_data)
 	};
 
-	return intel_guc_send(guc, data, ARRAY_SIZE(data));
+	return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index 8382d591c784..b0eb5aabe0a7 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -687,6 +687,12 @@ enum intel_guc_report_status {
 	INTEL_GUC_REPORT_STATUS_COMPLETE = 0x4,
 };
 
+enum intel_guc_sleep_state_status {
+	INTEL_GUC_SLEEP_STATE_SUCCESS = 0x0,
+	INTEL_GUC_SLEEP_STATE_PREEMPT_TO_IDLE_FAILED = 0x1,
+	INTEL_GUC_SLEEP_STATE_ENGINE_RESET_FAILED = 0x2
+};
+
 #define GUC_LOG_CONTROL_LOGGING_ENABLED	(1 << 0)
 #define GUC_LOG_CONTROL_VERBOSITY_SHIFT	4
 #define GUC_LOG_CONTROL_VERBOSITY_MASK	(0xF << GUC_LOG_CONTROL_VERBOSITY_SHIFT)
-- 
2.19.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] HAX enable GuC for CI
  2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
@ 2018-10-15 22:10 ` Daniele Ceraolo Spurio
  2018-10-15 22:21 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-15 22:10 UTC (permalink / raw)
  To: intel-gfx

From: Michal Wajdeczko <michal.wajdeczko@intel.com>

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
---
 drivers/gpu/drm/i915/i915_params.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 7e56c516c815..c681537bcb92 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -45,7 +45,7 @@ struct drm_printer;
 	param(int, disable_power_well, -1) \
 	param(int, enable_ips, 1) \
 	param(int, invert_brightness, 0) \
-	param(int, enable_guc, 0) \
+	param(int, enable_guc, -1) \
 	param(int, guc_log_level, -1) \
 	param(char *, guc_firmware_path, NULL) \
 	param(char *, huc_firmware_path, NULL) \
-- 
2.19.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
  2018-10-15 22:10 ` [PATCH 2/2] HAX enable GuC for CI Daniele Ceraolo Spurio
@ 2018-10-15 22:21 ` Patchwork
  2018-10-15 22:47 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2018-10-15 22:21 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
URL   : https://patchwork.freedesktop.org/series/51033/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume
c88fb110ee82 HAX enable GuC for CI
-:7: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

total: 0 errors, 1 warnings, 0 checks, 8 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
  2018-10-15 22:10 ` [PATCH 2/2] HAX enable GuC for CI Daniele Ceraolo Spurio
  2018-10-15 22:21 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume Patchwork
@ 2018-10-15 22:47 ` Patchwork
  2018-10-15 23:43   ` Daniele Ceraolo Spurio
  2018-10-16  8:53 ` [PATCH 1/2] " Chris Wilson
  2018-10-16 10:26 ` Michal Wajdeczko
  4 siblings, 1 reply; 13+ messages in thread
From: Patchwork @ 2018-10-15 22:47 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
URL   : https://patchwork.freedesktop.org/series/51033/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_10464 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10464, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10464:

  === IGT changes ===

    ==== Possible regressions ====

    igt@drv_selftest@live_execlists:
      fi-skl-6700hq:      PASS -> INCOMPLETE

    igt@drv_selftest@live_guc:
      fi-kbl-7567u:       PASS -> DMESG-WARN
      fi-skl-6600u:       PASS -> DMESG-WARN
      fi-skl-gvtdvm:      PASS -> DMESG-WARN
      fi-skl-iommu:       PASS -> DMESG-WARN
      fi-skl-6260u:       PASS -> DMESG-WARN
      fi-bxt-dsi:         PASS -> DMESG-WARN
      fi-skl-6700k2:      PASS -> DMESG-WARN
      fi-whl-u:           PASS -> DMESG-WARN
      fi-skl-6770hq:      PASS -> DMESG-WARN
      fi-kbl-7560u:       PASS -> DMESG-WARN
      fi-kbl-8809g:       PASS -> DMESG-WARN
      fi-kbl-r:           PASS -> DMESG-WARN
      fi-kbl-x1275:       PASS -> DMESG-WARN
      fi-bxt-j4205:       PASS -> DMESG-WARN
      fi-cfl-s3:          PASS -> DMESG-WARN
      fi-cfl-8109u:       PASS -> DMESG-WARN
      fi-kbl-7500u:       PASS -> DMESG-WARN
      fi-cfl-8700k:       PASS -> DMESG-WARN

    igt@drv_selftest@live_hangcheck:
      fi-skl-gvtdvm:      PASS -> DMESG-FAIL

    
== Known issues ==

  Here are the changes found in Patchwork_10464 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@drv_selftest@live_guc:
      {fi-apl-guc}:       NOTRUN -> DMESG-WARN (fdo#107258)

    igt@gem_exec_suspend@basic-s4-devices:
      fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      fi-snb-2520m:       PASS -> DMESG-FAIL (fdo#103713)

    igt@kms_setmode@basic-clone-single-crtc:
      fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713)

    igt@pm_backlight@basic-brightness:
      fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)

    
    ==== Possible fixes ====

    igt@drv_selftest@live_gem:
      {fi-apl-guc}:       INCOMPLETE (fdo#106693) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-byt-clapper:     FAIL (fdo#103167) -> PASS

    
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#106693 https://bugs.freedesktop.org/show_bug.cgi?id=106693
  fdo#107258 https://bugs.freedesktop.org/show_bug.cgi?id=107258
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718


== Participating hosts (53 -> 47) ==

  Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 


== Build changes ==

    * Linux: CI_DRM_4984 -> Patchwork_10464

  CI_DRM_4984: 90b59df999a13a6405f8d7ece08a69120a9b361a @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4678: 9310a1265ceabeec736bdf0a76e1e0357c76c0b1 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10464: c88fb110ee8261c636d63f4f6d9fa9440891b3a6 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c88fb110ee82 HAX enable GuC for CI
4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10464/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 22:47 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2018-10-15 23:43   ` Daniele Ceraolo Spurio
  2018-10-16  8:49     ` Chris Wilson
  2018-10-16  9:21     ` Daniel Vetter
  0 siblings, 2 replies; 13+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-15 23:43 UTC (permalink / raw)
  To: intel-gfx, Michal Wajdeczko



On 15/10/18 15:47, Patchwork wrote:
> == Series Details ==
> 
> Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
> URL   : https://patchwork.freedesktop.org/series/51033/
> State : failure
> 
> == Summary ==
> 
> = CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =
> 
> == Summary - FAILURE ==
> 
>    Serious unknown changes coming with Patchwork_10464 absolutely need to be
>    verified manually.
>    
>    If you think the reported changes have nothing to do with the changes
>    introduced in Patchwork_10464, please notify your bug team to allow them
>    to document this new failure mode, which will reduce false positives in CI.
> 
>    External URL: https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/
> 
> == Possible new issues ==
> 
>    Here are the unknown changes that may have been introduced in Patchwork_10464:
> 
>    === IGT changes ===
> 
>      ==== Possible regressions ====
> 
>      igt@drv_selftest@live_execlists:
>        fi-skl-6700hq:      PASS -> INCOMPLETE
> 

Log seem to be cut for this one. Since it is stopping inside 
live_preempt_smoke it is probably a known issue that Chris mentioned.
Can't reproduce on my skylake even with the test in a loop.

>      igt@drv_selftest@live_guc:
>        fi-kbl-7567u:       PASS -> DMESG-WARN
>        fi-skl-6600u:       PASS -> DMESG-WARN
>        fi-skl-gvtdvm:      PASS -> DMESG-WARN
>        fi-skl-iommu:       PASS -> DMESG-WARN
>        fi-skl-6260u:       PASS -> DMESG-WARN
>        fi-bxt-dsi:         PASS -> DMESG-WARN
>        fi-skl-6700k2:      PASS -> DMESG-WARN
>        fi-whl-u:           PASS -> DMESG-WARN
>        fi-skl-6770hq:      PASS -> DMESG-WARN
>        fi-kbl-7560u:       PASS -> DMESG-WARN
>        fi-kbl-8809g:       PASS -> DMESG-WARN
>        fi-kbl-r:           PASS -> DMESG-WARN
>        fi-kbl-x1275:       PASS -> DMESG-WARN
>        fi-bxt-j4205:       PASS -> DMESG-WARN
>        fi-cfl-s3:          PASS -> DMESG-WARN
>        fi-cfl-8109u:       PASS -> DMESG-WARN
>        fi-kbl-7500u:       PASS -> DMESG-WARN
>        fi-cfl-8700k:       PASS -> DMESG-WARN

These are all:

[drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed 
with error -5 0xf000f000

Which is not a real failure since the test is triggering it on purpose

> 
>      igt@drv_selftest@live_hangcheck:
>        fi-skl-gvtdvm:      PASS -> DMESG-FAIL
> 

<7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
<3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer 
error -110

This looks like GuC is stuck very early in the boot flow (even before 
the RSA check). On SKL there are known issues that could cause this and 
we should reset GuC and retry, but we aren't. Looks like we indirectly 
stopped applying  WaEnableuKernelHeaderValidFix and 
WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from 
intel_guc_fw_upload in any case. Michal?

Thanks,
Daniele

>      
> == Known issues ==
> 
>    Here are the changes found in Patchwork_10464 that come from known issues:
> 
>    === IGT changes ===
> 
>      ==== Issues hit ====
> 
>      igt@drv_selftest@live_guc:
>        {fi-apl-guc}:       NOTRUN -> DMESG-WARN (fdo#107258)
> 
>      igt@gem_exec_suspend@basic-s4-devices:
>        fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)
> 
>      igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
>        fi-snb-2520m:       PASS -> DMESG-FAIL (fdo#103713)
> 
>      igt@kms_setmode@basic-clone-single-crtc:
>        fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713)
> 
>      igt@pm_backlight@basic-brightness:
>        fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)
> 
>      
>      ==== Possible fixes ====
> 
>      igt@drv_selftest@live_gem:
>        {fi-apl-guc}:       INCOMPLETE (fdo#106693) -> PASS
> 
>      igt@kms_frontbuffer_tracking@basic:
>        fi-byt-clapper:     FAIL (fdo#103167) -> PASS
> 
>      
>    {name}: This element is suppressed. This means it is ignored when computing
>            the status of the difference (SUCCESS, WARNING, or FAILURE).
> 
>    fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
>    fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
>    fdo#106693 https://bugs.freedesktop.org/show_bug.cgi?id=106693
>    fdo#107258 https://bugs.freedesktop.org/show_bug.cgi?id=107258
>    fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
> 
> 
> == Participating hosts (53 -> 47) ==
> 
>    Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600
> 
> 
> == Build changes ==
> 
>      * Linux: CI_DRM_4984 -> Patchwork_10464
> 
>    CI_DRM_4984: 90b59df999a13a6405f8d7ece08a69120a9b361a @ git://anongit.freedesktop.org/gfx-ci/linux
>    IGT_4678: 9310a1265ceabeec736bdf0a76e1e0357c76c0b1 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
>    Patchwork_10464: c88fb110ee8261c636d63f4f6d9fa9440891b3a6 @ git://anongit.freedesktop.org/gfx-ci/linux
> 
> 
> == Linux commits ==
> 
> c88fb110ee82 HAX enable GuC for CI
> 4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume
> 
> == Logs ==
> 
> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10464/issues.html
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 23:43   ` Daniele Ceraolo Spurio
@ 2018-10-16  8:49     ` Chris Wilson
  2018-10-16  9:12       ` Michal Wajdeczko
  2018-10-16  9:21     ` Daniel Vetter
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2018-10-16  8:49 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Michal Wajdeczko, intel-gfx

Quoting Daniele Ceraolo Spurio (2018-10-16 00:43:59)
> 
> 
> On 15/10/18 15:47, Patchwork wrote:
> > == Series Details ==
> > 
> > Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
> > URL   : https://patchwork.freedesktop.org/series/51033/
> > State : failure
> > 
> > == Summary ==
> > 
> > = CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =
> > 
> > == Summary - FAILURE ==
> > 
> >    Serious unknown changes coming with Patchwork_10464 absolutely need to be
> >    verified manually.
> >    
> >    If you think the reported changes have nothing to do with the changes
> >    introduced in Patchwork_10464, please notify your bug team to allow them
> >    to document this new failure mode, which will reduce false positives in CI.
> > 
> >    External URL: https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/
> > 
> > == Possible new issues ==
> > 
> >    Here are the unknown changes that may have been introduced in Patchwork_10464:
> > 
> >    === IGT changes ===
> > 
> >      ==== Possible regressions ====
> > 
> >      igt@drv_selftest@live_execlists:
> >        fi-skl-6700hq:      PASS -> INCOMPLETE
> > 
> 
> Log seem to be cut for this one. Since it is stopping inside 
> live_preempt_smoke it is probably a known issue that Chris mentioned.
> Can't reproduce on my skylake even with the test in a loop.

Yeah, if we have the oops report for that one it might have been
clearer.

> > 
> >      igt@drv_selftest@live_hangcheck:
> >        fi-skl-gvtdvm:      PASS -> DMESG-FAIL
> > 
> 
> <7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
> <3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer 
> error -110
> 
> This looks like GuC is stuck very early in the boot flow (even before 
> the RSA check). On SKL there are known issues that could cause this and 
> we should reset GuC and retry, but we aren't. Looks like we indirectly 
> stopped applying  WaEnableuKernelHeaderValidFix and 
> WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from 
> intel_guc_fw_upload in any case. Michal?

That would be useful to fix, as it does fail regularly enough to be a
nuisance.

More importantly, no live_gem failures across the board so the patch does
what it says on the tin. Nice work.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
                   ` (2 preceding siblings ...)
  2018-10-15 22:47 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2018-10-16  8:53 ` Chris Wilson
  2018-10-16 10:26 ` Michal Wajdeczko
  4 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2018-10-16  8:53 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx

Quoting Daniele Ceraolo Spurio (2018-10-15 23:10:08)
> The ENTER/EXIT_S_STATE actions queue the save/restore operation in GuC
> FW and then return, so waiting on the H2H is not enough to guarantee
> GuC is done.
> When all the processing is done, GuC writes 0 to scratch register 14,
> so we can poll on that. Note that GuC does not ensure that the value
> in the register is different from 0 while the action is in progress
> so we need to take care of that ourselves as well.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Explains itself very well,
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-16  8:49     ` Chris Wilson
@ 2018-10-16  9:12       ` Michal Wajdeczko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Wajdeczko @ 2018-10-16  9:12 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx, Chris Wilson

On Tue, 16 Oct 2018 10:49:54 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> Quoting Daniele Ceraolo Spurio (2018-10-16 00:43:59)
...
>> >
>> >      igt@drv_selftest@live_hangcheck:
>> >        fi-skl-gvtdvm:      PASS -> DMESG-FAIL
>> >
>>
>> <7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
>> <3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer
>> error -110
>>
>> This looks like GuC is stuck very early in the boot flow (even before
>> the RSA check). On SKL there are known issues that could cause this and
>> we should reset GuC and retry, but we aren't. Looks like we indirectly
>> stopped applying  WaEnableuKernelHeaderValidFix and
>> WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from
>> intel_guc_fw_upload in any case. Michal?
>
> That would be useful to fix, as it does fail regularly enough to be a
> nuisance.

Proposed fix is here [1]. Note that earlier patch was trying to avoid the
case when we were repeating loading of the GuC firmware that simply didn't
pass signature verification (not fixable that way ;)

Michal

[1] https://patchwork.freedesktop.org/patch/256913/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 23:43   ` Daniele Ceraolo Spurio
  2018-10-16  8:49     ` Chris Wilson
@ 2018-10-16  9:21     ` Daniel Vetter
  2018-10-16 17:15       ` Daniele Ceraolo Spurio
  1 sibling, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2018-10-16  9:21 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele; +Cc: intel-gfx

On Tue, Oct 16, 2018 at 1:44 AM Daniele Ceraolo Spurio
<daniele.ceraolospurio@intel.com> wrote:
>
>
>
> On 15/10/18 15:47, Patchwork wrote:
> > == Series Details ==
> >
> > Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
> > URL   : https://patchwork.freedesktop.org/series/51033/
> > State : failure
> >
> > == Summary ==
> >
> > = CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =
> >
> > == Summary - FAILURE ==
> >
> >    Serious unknown changes coming with Patchwork_10464 absolutely need to be
> >    verified manually.
> >
> >    If you think the reported changes have nothing to do with the changes
> >    introduced in Patchwork_10464, please notify your bug team to allow them
> >    to document this new failure mode, which will reduce false positives in CI.
> >
> >    External URL: https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/
> >
> > == Possible new issues ==
> >
> >    Here are the unknown changes that may have been introduced in Patchwork_10464:
> >
> >    === IGT changes ===
> >
> >      ==== Possible regressions ====
> >
> >      igt@drv_selftest@live_execlists:
> >        fi-skl-6700hq:      PASS -> INCOMPLETE
> >
>
> Log seem to be cut for this one. Since it is stopping inside
> live_preempt_smoke it is probably a known issue that Chris mentioned.
> Can't reproduce on my skylake even with the test in a loop.
>
> >      igt@drv_selftest@live_guc:
> >        fi-kbl-7567u:       PASS -> DMESG-WARN
> >        fi-skl-6600u:       PASS -> DMESG-WARN
> >        fi-skl-gvtdvm:      PASS -> DMESG-WARN
> >        fi-skl-iommu:       PASS -> DMESG-WARN
> >        fi-skl-6260u:       PASS -> DMESG-WARN
> >        fi-bxt-dsi:         PASS -> DMESG-WARN
> >        fi-skl-6700k2:      PASS -> DMESG-WARN
> >        fi-whl-u:           PASS -> DMESG-WARN
> >        fi-skl-6770hq:      PASS -> DMESG-WARN
> >        fi-kbl-7560u:       PASS -> DMESG-WARN
> >        fi-kbl-8809g:       PASS -> DMESG-WARN
> >        fi-kbl-r:           PASS -> DMESG-WARN
> >        fi-kbl-x1275:       PASS -> DMESG-WARN
> >        fi-bxt-j4205:       PASS -> DMESG-WARN
> >        fi-cfl-s3:          PASS -> DMESG-WARN
> >        fi-cfl-8109u:       PASS -> DMESG-WARN
> >        fi-kbl-7500u:       PASS -> DMESG-WARN
> >        fi-cfl-8700k:       PASS -> DMESG-WARN
>
> These are all:
>
> [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed
> with error -5 0xf000f000
>
> Which is not a real failure since the test is triggering it on purpose

You still need to make them shut up. dmesg errors should only be used
for stuff we really don't expect. E.g. gpu hangs provoked by igt also
don't result in dmesg errors/warnings and failed tests.
-Daniel

> >
> >      igt@drv_selftest@live_hangcheck:
> >        fi-skl-gvtdvm:      PASS -> DMESG-FAIL
> >
>
> <7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
> <3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer
> error -110
>
> This looks like GuC is stuck very early in the boot flow (even before
> the RSA check). On SKL there are known issues that could cause this and
> we should reset GuC and retry, but we aren't. Looks like we indirectly
> stopped applying  WaEnableuKernelHeaderValidFix and
> WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from
> intel_guc_fw_upload in any case. Michal?
>
> Thanks,
> Daniele
>
> >
> > == Known issues ==
> >
> >    Here are the changes found in Patchwork_10464 that come from known issues:
> >
> >    === IGT changes ===
> >
> >      ==== Issues hit ====
> >
> >      igt@drv_selftest@live_guc:
> >        {fi-apl-guc}:       NOTRUN -> DMESG-WARN (fdo#107258)
> >
> >      igt@gem_exec_suspend@basic-s4-devices:
> >        fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)
> >
> >      igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
> >        fi-snb-2520m:       PASS -> DMESG-FAIL (fdo#103713)
> >
> >      igt@kms_setmode@basic-clone-single-crtc:
> >        fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713)
> >
> >      igt@pm_backlight@basic-brightness:
> >        fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)
> >
> >
> >      ==== Possible fixes ====
> >
> >      igt@drv_selftest@live_gem:
> >        {fi-apl-guc}:       INCOMPLETE (fdo#106693) -> PASS
> >
> >      igt@kms_frontbuffer_tracking@basic:
> >        fi-byt-clapper:     FAIL (fdo#103167) -> PASS
> >
> >
> >    {name}: This element is suppressed. This means it is ignored when computing
> >            the status of the difference (SUCCESS, WARNING, or FAILURE).
> >
> >    fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
> >    fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
> >    fdo#106693 https://bugs.freedesktop.org/show_bug.cgi?id=106693
> >    fdo#107258 https://bugs.freedesktop.org/show_bug.cgi?id=107258
> >    fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
> >
> >
> > == Participating hosts (53 -> 47) ==
> >
> >    Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600
> >
> >
> > == Build changes ==
> >
> >      * Linux: CI_DRM_4984 -> Patchwork_10464
> >
> >    CI_DRM_4984: 90b59df999a13a6405f8d7ece08a69120a9b361a @ git://anongit.freedesktop.org/gfx-ci/linux
> >    IGT_4678: 9310a1265ceabeec736bdf0a76e1e0357c76c0b1 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
> >    Patchwork_10464: c88fb110ee8261c636d63f4f6d9fa9440891b3a6 @ git://anongit.freedesktop.org/gfx-ci/linux
> >
> >
> > == Linux commits ==
> >
> > c88fb110ee82 HAX enable GuC for CI
> > 4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume
> >
> > == Logs ==
> >
> > For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10464/issues.html
> >
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
                   ` (3 preceding siblings ...)
  2018-10-16  8:53 ` [PATCH 1/2] " Chris Wilson
@ 2018-10-16 10:26 ` Michal Wajdeczko
  2018-10-16 16:35   ` Daniele Ceraolo Spurio
  4 siblings, 1 reply; 13+ messages in thread
From: Michal Wajdeczko @ 2018-10-16 10:26 UTC (permalink / raw)
  To: intel-gfx, Daniele Ceraolo Spurio

On Tue, 16 Oct 2018 00:10:08 +0200, Daniele Ceraolo Spurio  
<daniele.ceraolospurio@intel.com> wrote:

> The ENTER/EXIT_S_STATE actions queue the save/restore operation in GuC
> FW and then return, so waiting on the H2H is not enough to guarantee
> GuC is done.
> When all the processing is done, GuC writes 0 to scratch register 14,
> so we can poll on that. Note that GuC does not ensure that the value
> in the register is different from 0 while the action is in progress
> so we need to take care of that ourselves as well.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_guc.c      | 28 +++++++++++++++++++++++++--
>  drivers/gpu/drm/i915/intel_guc_fwif.h |  6 ++++++
>  2 files changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc.c  
> b/drivers/gpu/drm/i915/intel_guc.c
> index 230aea69385d..f238cd7a9dcf 100644
> --- a/drivers/gpu/drm/i915/intel_guc.c
> +++ b/drivers/gpu/drm/i915/intel_guc.c
> @@ -521,6 +521,30 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32  
> rsa_offset)
>  	return intel_guc_send(guc, action, ARRAY_SIZE(action));
>  }
> +/*
> + * The ENTER/EXIT_S_STATE actions queue the save/restore operation in  
> GuC FW and
> + * then return, so waiting on the H2H is not enough to guarantee GuC is  
> done.
> + * When all the processing is done, GuC writes 0 to scratch register  
> 14, so we

s/writes 0/writes INTEL_GUC_SLEEP_STATE_SUCCESS ?

> + * can poll on that. Note that GuC does not ensure that the value in the
> + * register is different from 0 while the action is in progress so we  
> need to
> + * take care of that ourselves as well.
> + */
> +static int guc_sleep_state_action(struct intel_guc *guc,
> +				  const u32 *action, u32 len)
> +{
> +	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +	int ret;
> +
> +	I915_WRITE(SOFT_SCRATCH(14), ~0x0);

Do we want to add dedicated name for that scratch register?

Also, as GuC is using scratch[14] then we need to remove it from our send
register pool - patch is here [1]

> +
> +	ret = intel_guc_send(guc, action, len);
> +	if (ret)
> +		return ret;
> +
> +	return intel_wait_for_register(dev_priv, SOFT_SCRATCH(14), ~0x0,
> +				       INTEL_GUC_SLEEP_STATE_SUCCESS, 10);

Maybe it is worth to use __intel_wait_for_register() and print sleep
state error code in case of failure ? And do we really need to wait
full 10ms for SUCCESS if GuC already has reported PREEMPT_TO_IDLE_FAILED
or ENGINE_RESET_FAILED ?

	u32 state;

	ret = __intel_wait_for_register(dev_priv, SOFT_SCRATCH(14), 0x80000000,
				        0, 10, &state);
	if (ret)
		return ret;
	if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
		DRM_ERROR("... %u\n", state);
		return -EIO;
	}
	return 0;

> +}
> +
>  /**
>   * intel_guc_suspend() - notify GuC entering suspend state
>   * @guc:	the guc
> @@ -533,7 +557,7 @@ int intel_guc_suspend(struct intel_guc *guc)
>  		intel_guc_ggtt_offset(guc, guc->shared_data)
>  	};
> -	return intel_guc_send(guc, data, ARRAY_SIZE(data));
> +	return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
>  }
> /**
> @@ -571,7 +595,7 @@ int intel_guc_resume(struct intel_guc *guc)
>  		intel_guc_ggtt_offset(guc, guc->shared_data)
>  	};
> -	return intel_guc_send(guc, data, ARRAY_SIZE(data));
> +	return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
>  }
> /**
> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h  
> b/drivers/gpu/drm/i915/intel_guc_fwif.h
> index 8382d591c784..b0eb5aabe0a7 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
> @@ -687,6 +687,12 @@ enum intel_guc_report_status {
>  	INTEL_GUC_REPORT_STATUS_COMPLETE = 0x4,
>  };
> +enum intel_guc_sleep_state_status {
> +	INTEL_GUC_SLEEP_STATE_SUCCESS = 0x0,
> +	INTEL_GUC_SLEEP_STATE_PREEMPT_TO_IDLE_FAILED = 0x1,
> +	INTEL_GUC_SLEEP_STATE_ENGINE_RESET_FAILED = 0x2

as we waiting for state change, maybe we should explicitly define
	INTEL_GUC_SLEEP_STATE_INVALID_MASK = 0x80000000,
and use it in __intel_wait_for_register ?

> +};
> +
>  #define GUC_LOG_CONTROL_LOGGING_ENABLED	(1 << 0)
>  #define GUC_LOG_CONTROL_VERBOSITY_SHIFT	4
>  #define GUC_LOG_CONTROL_VERBOSITY_MASK	(0xF <<  
> GUC_LOG_CONTROL_VERBOSITY_SHIFT)

Thanks,
Michal

[1] https://patchwork.freedesktop.org/patch/256921/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-16 10:26 ` Michal Wajdeczko
@ 2018-10-16 16:35   ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 13+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-16 16:35 UTC (permalink / raw)
  To: Michal Wajdeczko, intel-gfx



On 10/16/2018 3:26 AM, Michal Wajdeczko wrote:
> On Tue, 16 Oct 2018 00:10:08 +0200, Daniele Ceraolo Spurio 
> <daniele.ceraolospurio@intel.com> wrote:
>
>> The ENTER/EXIT_S_STATE actions queue the save/restore operation in GuC
>> FW and then return, so waiting on the H2H is not enough to guarantee
>> GuC is done.
>> When all the processing is done, GuC writes 0 to scratch register 14,
>> so we can poll on that. Note that GuC does not ensure that the value
>> in the register is different from 0 while the action is in progress
>> so we need to take care of that ourselves as well.
>>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_guc.c      | 28 +++++++++++++++++++++++++--
>>  drivers/gpu/drm/i915/intel_guc_fwif.h |  6 ++++++
>>  2 files changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_guc.c 
>> b/drivers/gpu/drm/i915/intel_guc.c
>> index 230aea69385d..f238cd7a9dcf 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/intel_guc.c
>> @@ -521,6 +521,30 @@ int intel_guc_auth_huc(struct intel_guc *guc, 
>> u32 rsa_offset)
>>      return intel_guc_send(guc, action, ARRAY_SIZE(action));
>>  }
>> +/*
>> + * The ENTER/EXIT_S_STATE actions queue the save/restore operation 
>> in GuC FW and
>> + * then return, so waiting on the H2H is not enough to guarantee GuC 
>> is done.
>> + * When all the processing is done, GuC writes 0 to scratch register 
>> 14, so we
>
> s/writes 0/writes INTEL_GUC_SLEEP_STATE_SUCCESS ?
>
>> + * can poll on that. Note that GuC does not ensure that the value in 
>> the
>> + * register is different from 0 while the action is in progress so 
>> we need to
>> + * take care of that ourselves as well.
>> + */
>> +static int guc_sleep_state_action(struct intel_guc *guc,
>> +                  const u32 *action, u32 len)
>> +{
>> +    struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> +    int ret;
>> +
>> +    I915_WRITE(SOFT_SCRATCH(14), ~0x0);
>
> Do we want to add dedicated name for that scratch register?

As I replied on your patch, the register is used for other purposes in 
other actions so I don't think it is a good idea to rename it.

>
> Also, as GuC is using scratch[14] then we need to remove it from our send
> register pool - patch is here [1]
>
>> +
>> +    ret = intel_guc_send(guc, action, len);
>> +    if (ret)
>> +        return ret;
>> +
>> +    return intel_wait_for_register(dev_priv, SOFT_SCRATCH(14), ~0x0,
>> +                       INTEL_GUC_SLEEP_STATE_SUCCESS, 10);
>
> Maybe it is worth to use __intel_wait_for_register() and print sleep
> state error code in case of failure ? And do we really need to wait
> full 10ms for SUCCESS if GuC already has reported PREEMPT_TO_IDLE_FAILED
> or ENGINE_RESET_FAILED ?
>
>     u32 state;
>
>     ret = __intel_wait_for_register(dev_priv, SOFT_SCRATCH(14), 
> 0x80000000,
>                         0, 10, &state);
>     if (ret)
>         return ret;
>     if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
>         DRM_ERROR("... %u\n", state);
>         return -EIO;
>     }
>     return 0;
>

It should be impossible for us to get PREEMPT_TO_IDLE_FAILED or 
ENGINE_RESET_FAILED because those only come in play if guc_suspend() is 
called while there are outstanding request inside GuC. However, no real 
downsides in going with your solution either so I'll do the changes ;)

>> +}
>> +
>>  /**
>>   * intel_guc_suspend() - notify GuC entering suspend state
>>   * @guc:    the guc
>> @@ -533,7 +557,7 @@ int intel_guc_suspend(struct intel_guc *guc)
>>          intel_guc_ggtt_offset(guc, guc->shared_data)
>>      };
>> -    return intel_guc_send(guc, data, ARRAY_SIZE(data));
>> +    return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
>>  }
>> /**
>> @@ -571,7 +595,7 @@ int intel_guc_resume(struct intel_guc *guc)
>>          intel_guc_ggtt_offset(guc, guc->shared_data)
>>      };
>> -    return intel_guc_send(guc, data, ARRAY_SIZE(data));
>> +    return guc_sleep_state_action(guc, data, ARRAY_SIZE(data));
>>  }
>> /**
>> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h 
>> b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> index 8382d591c784..b0eb5aabe0a7 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
>> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
>> @@ -687,6 +687,12 @@ enum intel_guc_report_status {
>>      INTEL_GUC_REPORT_STATUS_COMPLETE = 0x4,
>>  };
>> +enum intel_guc_sleep_state_status {
>> +    INTEL_GUC_SLEEP_STATE_SUCCESS = 0x0,
>> +    INTEL_GUC_SLEEP_STATE_PREEMPT_TO_IDLE_FAILED = 0x1,
>> +    INTEL_GUC_SLEEP_STATE_ENGINE_RESET_FAILED = 0x2
>
> as we waiting for state change, maybe we should explicitly define
>     INTEL_GUC_SLEEP_STATE_INVALID_MASK = 0x80000000,
> and use it in __intel_wait_for_register ?

ack

>
>> +};
>> +
>>  #define GUC_LOG_CONTROL_LOGGING_ENABLED    (1 << 0)
>>  #define GUC_LOG_CONTROL_VERBOSITY_SHIFT    4
>>  #define GUC_LOG_CONTROL_VERBOSITY_MASK    (0xF << 
>> GUC_LOG_CONTROL_VERBOSITY_SHIFT)
>

Note for you while I'm it: I've been told that the gen11 GuC FW has a 
simplified and improved flow for suspend/resume, so some changes will be 
required in your series. Not sure about the details.

Thanks,
Daniele

> Thanks,
> Michal
>
> [1] https://patchwork.freedesktop.org/patch/256921/

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-16  9:21     ` Daniel Vetter
@ 2018-10-16 17:15       ` Daniele Ceraolo Spurio
  2018-10-16 18:58         ` Michal Wajdeczko
  0 siblings, 1 reply; 13+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-16 17:15 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx



On 10/16/2018 2:21 AM, Daniel Vetter wrote:
> On Tue, Oct 16, 2018 at 1:44 AM Daniele Ceraolo Spurio
> <daniele.ceraolospurio@intel.com> wrote:
>>
>>
>> On 15/10/18 15:47, Patchwork wrote:
>>> == Series Details ==
>>>
>>> Series: series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
>>> URL   : https://patchwork.freedesktop.org/series/51033/
>>> State : failure
>>>
>>> == Summary ==
>>>
>>> = CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =
>>>
>>> == Summary - FAILURE ==
>>>
>>>     Serious unknown changes coming with Patchwork_10464 absolutely need to be
>>>     verified manually.
>>>
>>>     If you think the reported changes have nothing to do with the changes
>>>     introduced in Patchwork_10464, please notify your bug team to allow them
>>>     to document this new failure mode, which will reduce false positives in CI.
>>>
>>>     External URL: https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/
>>>
>>> == Possible new issues ==
>>>
>>>     Here are the unknown changes that may have been introduced in Patchwork_10464:
>>>
>>>     === IGT changes ===
>>>
>>>       ==== Possible regressions ====
>>>
>>>       igt@drv_selftest@live_execlists:
>>>         fi-skl-6700hq:      PASS -> INCOMPLETE
>>>
>> Log seem to be cut for this one. Since it is stopping inside
>> live_preempt_smoke it is probably a known issue that Chris mentioned.
>> Can't reproduce on my skylake even with the test in a loop.
>>
>>>       igt@drv_selftest@live_guc:
>>>         fi-kbl-7567u:       PASS -> DMESG-WARN
>>>         fi-skl-6600u:       PASS -> DMESG-WARN
>>>         fi-skl-gvtdvm:      PASS -> DMESG-WARN
>>>         fi-skl-iommu:       PASS -> DMESG-WARN
>>>         fi-skl-6260u:       PASS -> DMESG-WARN
>>>         fi-bxt-dsi:         PASS -> DMESG-WARN
>>>         fi-skl-6700k2:      PASS -> DMESG-WARN
>>>         fi-whl-u:           PASS -> DMESG-WARN
>>>         fi-skl-6770hq:      PASS -> DMESG-WARN
>>>         fi-kbl-7560u:       PASS -> DMESG-WARN
>>>         fi-kbl-8809g:       PASS -> DMESG-WARN
>>>         fi-kbl-r:           PASS -> DMESG-WARN
>>>         fi-kbl-x1275:       PASS -> DMESG-WARN
>>>         fi-bxt-j4205:       PASS -> DMESG-WARN
>>>         fi-cfl-s3:          PASS -> DMESG-WARN
>>>         fi-cfl-8109u:       PASS -> DMESG-WARN
>>>         fi-kbl-7500u:       PASS -> DMESG-WARN
>>>         fi-cfl-8700k:       PASS -> DMESG-WARN
>> These are all:
>>
>> [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed
>> with error -5 0xf000f000
>>
>> Which is not a real failure since the test is triggering it on purpose
> You still need to make them shut up. dmesg errors should only be used
> for stuff we really don't expect. E.g. gpu hangs provoked by igt also
> don't result in dmesg errors/warnings and failed tests.
> -Daniel

I wasn't trying to imply that we don't care that we have a failure or 
that we shouldn't make it shut up, just that it is not a regression 
introduced by this patch, because it doesn't even get near that code. I 
recall that there was a small discussion in the past about how to 
silence this, I'll try to dig it up and see if there was an agreed solution.

Daniele

>>>       igt@drv_selftest@live_hangcheck:
>>>         fi-skl-gvtdvm:      PASS -> DMESG-FAIL
>>>
>> <7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
>> <3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer
>> error -110
>>
>> This looks like GuC is stuck very early in the boot flow (even before
>> the RSA check). On SKL there are known issues that could cause this and
>> we should reset GuC and retry, but we aren't. Looks like we indirectly
>> stopped applying  WaEnableuKernelHeaderValidFix and
>> WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from
>> intel_guc_fw_upload in any case. Michal?
>>
>> Thanks,
>> Daniele
>>
>>> == Known issues ==
>>>
>>>     Here are the changes found in Patchwork_10464 that come from known issues:
>>>
>>>     === IGT changes ===
>>>
>>>       ==== Issues hit ====
>>>
>>>       igt@drv_selftest@live_guc:
>>>         {fi-apl-guc}:       NOTRUN -> DMESG-WARN (fdo#107258)
>>>
>>>       igt@gem_exec_suspend@basic-s4-devices:
>>>         fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)
>>>
>>>       igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
>>>         fi-snb-2520m:       PASS -> DMESG-FAIL (fdo#103713)
>>>
>>>       igt@kms_setmode@basic-clone-single-crtc:
>>>         fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713)
>>>
>>>       igt@pm_backlight@basic-brightness:
>>>         fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)
>>>
>>>
>>>       ==== Possible fixes ====
>>>
>>>       igt@drv_selftest@live_gem:
>>>         {fi-apl-guc}:       INCOMPLETE (fdo#106693) -> PASS
>>>
>>>       igt@kms_frontbuffer_tracking@basic:
>>>         fi-byt-clapper:     FAIL (fdo#103167) -> PASS
>>>
>>>
>>>     {name}: This element is suppressed. This means it is ignored when computing
>>>             the status of the difference (SUCCESS, WARNING, or FAILURE).
>>>
>>>     fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
>>>     fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
>>>     fdo#106693 https://bugs.freedesktop.org/show_bug.cgi?id=106693
>>>     fdo#107258 https://bugs.freedesktop.org/show_bug.cgi?id=107258
>>>     fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
>>>
>>>
>>> == Participating hosts (53 -> 47) ==
>>>
>>>     Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600
>>>
>>>
>>> == Build changes ==
>>>
>>>       * Linux: CI_DRM_4984 -> Patchwork_10464
>>>
>>>     CI_DRM_4984: 90b59df999a13a6405f8d7ece08a69120a9b361a @ git://anongit.freedesktop.org/gfx-ci/linux
>>>     IGT_4678: 9310a1265ceabeec736bdf0a76e1e0357c76c0b1 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
>>>     Patchwork_10464: c88fb110ee8261c636d63f4f6d9fa9440891b3a6 @ git://anongit.freedesktop.org/gfx-ci/linux
>>>
>>>
>>> == Linux commits ==
>>>
>>> c88fb110ee82 HAX enable GuC for CI
>>> 4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume
>>>
>>> == Logs ==
>>>
>>> For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10464/issues.html
>>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ✗ Fi.CI.BAT: failure for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume
  2018-10-16 17:15       ` Daniele Ceraolo Spurio
@ 2018-10-16 18:58         ` Michal Wajdeczko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Wajdeczko @ 2018-10-16 18:58 UTC (permalink / raw)
  To: Daniel Vetter, Daniele Ceraolo Spurio; +Cc: intel-gfx

On Tue, 16 Oct 2018 19:15:19 +0200, Daniele Ceraolo Spurio  
<daniele.ceraolospurio@intel.com> wrote:

>
>
> On 10/16/2018 2:21 AM, Daniel Vetter wrote:
>> On Tue, Oct 16, 2018 at 1:44 AM Daniele Ceraolo Spurio
>> <daniele.ceraolospurio@intel.com> wrote:
>>>
>>>
>>> On 15/10/18 15:47, Patchwork wrote:
>>>> == Series Details ==
>>>>
>>>> Series: series starting with [1/2] drm/i915/guc: fix GuC  
>>>> suspend/resume
>>>> URL   : https://patchwork.freedesktop.org/series/51033/
>>>> State : failure
>>>>
>>>> == Summary ==
>>>>
>>>> = CI Bug Log - changes from CI_DRM_4984 -> Patchwork_10464 =
>>>>
>>>> == Summary - FAILURE ==
>>>>
>>>>     Serious unknown changes coming with Patchwork_10464 absolutely  
>>>> need to be
>>>>     verified manually.
>>>>
>>>>     If you think the reported changes have nothing to do with the  
>>>> changes
>>>>     introduced in Patchwork_10464, please notify your bug team to  
>>>> allow them
>>>>     to document this new failure mode, which will reduce false  
>>>> positives in CI.
>>>>
>>>>     External URL:  
>>>> https://patchwork.freedesktop.org/api/1.0/series/51033/revisions/1/mbox/
>>>>
>>>> == Possible new issues ==
>>>>
>>>>     Here are the unknown changes that may have been introduced in  
>>>> Patchwork_10464:
>>>>
>>>>     === IGT changes ===
>>>>
>>>>       ==== Possible regressions ====
>>>>
>>>>       igt@drv_selftest@live_execlists:
>>>>         fi-skl-6700hq:      PASS -> INCOMPLETE
>>>>
>>> Log seem to be cut for this one. Since it is stopping inside
>>> live_preempt_smoke it is probably a known issue that Chris mentioned.
>>> Can't reproduce on my skylake even with the test in a loop.
>>>
>>>>       igt@drv_selftest@live_guc:
>>>>         fi-kbl-7567u:       PASS -> DMESG-WARN
>>>>         fi-skl-6600u:       PASS -> DMESG-WARN
>>>>         fi-skl-gvtdvm:      PASS -> DMESG-WARN
>>>>         fi-skl-iommu:       PASS -> DMESG-WARN
>>>>         fi-skl-6260u:       PASS -> DMESG-WARN
>>>>         fi-bxt-dsi:         PASS -> DMESG-WARN
>>>>         fi-skl-6700k2:      PASS -> DMESG-WARN
>>>>         fi-whl-u:           PASS -> DMESG-WARN
>>>>         fi-skl-6770hq:      PASS -> DMESG-WARN
>>>>         fi-kbl-7560u:       PASS -> DMESG-WARN
>>>>         fi-kbl-8809g:       PASS -> DMESG-WARN
>>>>         fi-kbl-r:           PASS -> DMESG-WARN
>>>>         fi-kbl-x1275:       PASS -> DMESG-WARN
>>>>         fi-bxt-j4205:       PASS -> DMESG-WARN
>>>>         fi-cfl-s3:          PASS -> DMESG-WARN
>>>>         fi-cfl-8109u:       PASS -> DMESG-WARN
>>>>         fi-kbl-7500u:       PASS -> DMESG-WARN
>>>>         fi-cfl-8700k:       PASS -> DMESG-WARN
>>> These are all:
>>>
>>> [drm:intel_guc_send_mmio [i915]] *ERROR* MMIO: GuC action 0x10 failed
>>> with error -5 0xf000f000
>>>
>>> Which is not a real failure since the test is triggering it on purpose
>> You still need to make them shut up. dmesg errors should only be used
>> for stuff we really don't expect. E.g. gpu hangs provoked by igt also
>> don't result in dmesg errors/warnings and failed tests.
>> -Daniel
>
> I wasn't trying to imply that we don't care that we have a failure or  
> that we shouldn't make it shut up, just that it is not a regression  
> introduced by this patch, because it doesn't even get near that code. I  
> recall that there was a small discussion in the past about how to  
> silence this, I'll try to dig it up and see if there was an agreed  
> solution.

Preferred solution was to remove negative GuC tests from i915 selftests.

Note that this "low level" error message is our guard that we are always
correctly communicating with the GuC, no silent drop of unexpected GuC  
errors.

GuC negative testing shall be done by the fw team.

Michal

>
> Daniele
>
>>>>       igt@drv_selftest@live_hangcheck:
>>>>         fi-skl-gvtdvm:      PASS -> DMESG-FAIL
>>>>
>>> <7> [464.966238] [drm:guc_fw_xfer [i915]] GuC status 0x20
>>> <3> [464.966361] [drm:guc_fw_xfer [i915]] *ERROR* GuC firmware xfer
>>> error -110
>>>
>>> This looks like GuC is stuck very early in the boot flow (even before
>>> the RSA check). On SKL there are known issues that could cause this and
>>> we should reset GuC and retry, but we aren't. Looks like we indirectly
>>> stopped applying  WaEnableuKernelHeaderValidFix and
>>> WaEnableGuCBootHashCheckNotSet by not returning -EAGAIN from
>>> intel_guc_fw_upload in any case. Michal?
>>>
>>> Thanks,
>>> Daniele
>>>
>>>> == Known issues ==
>>>>
>>>>     Here are the changes found in Patchwork_10464 that come from  
>>>> known issues:
>>>>
>>>>     === IGT changes ===
>>>>
>>>>       ==== Issues hit ====
>>>>
>>>>       igt@drv_selftest@live_guc:
>>>>         {fi-apl-guc}:       NOTRUN -> DMESG-WARN (fdo#107258)
>>>>
>>>>       igt@gem_exec_suspend@basic-s4-devices:
>>>>         fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)
>>>>
>>>>       igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
>>>>         fi-snb-2520m:       PASS -> DMESG-FAIL (fdo#103713)
>>>>
>>>>       igt@kms_setmode@basic-clone-single-crtc:
>>>>         fi-snb-2520m:       PASS -> DMESG-WARN (fdo#103713)
>>>>
>>>>       igt@pm_backlight@basic-brightness:
>>>>         fi-snb-2520m:       PASS -> INCOMPLETE (fdo#103713)
>>>>
>>>>
>>>>       ==== Possible fixes ====
>>>>
>>>>       igt@drv_selftest@live_gem:
>>>>         {fi-apl-guc}:       INCOMPLETE (fdo#106693) -> PASS
>>>>
>>>>       igt@kms_frontbuffer_tracking@basic:
>>>>         fi-byt-clapper:     FAIL (fdo#103167) -> PASS
>>>>
>>>>
>>>>     {name}: This element is suppressed. This means it is ignored when  
>>>> computing
>>>>             the status of the difference (SUCCESS, WARNING, or  
>>>> FAILURE).
>>>>
>>>>     fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
>>>>     fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
>>>>     fdo#106693 https://bugs.freedesktop.org/show_bug.cgi?id=106693
>>>>     fdo#107258 https://bugs.freedesktop.org/show_bug.cgi?id=107258
>>>>     fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718
>>>>
>>>>
>>>> == Participating hosts (53 -> 47) ==
>>>>
>>>>     Missing    (6): fi-kbl-soraka fi-ilk-m540 fi-hsw-4200u  
>>>> fi-byt-squawks fi-bsw-cyan fi-ctg-p8600
>>>>
>>>>
>>>> == Build changes ==
>>>>
>>>>       * Linux: CI_DRM_4984 -> Patchwork_10464
>>>>
>>>>     CI_DRM_4984: 90b59df999a13a6405f8d7ece08a69120a9b361a @  
>>>> git://anongit.freedesktop.org/gfx-ci/linux
>>>>     IGT_4678: 9310a1265ceabeec736bdf0a76e1e0357c76c0b1 @  
>>>> git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
>>>>     Patchwork_10464: c88fb110ee8261c636d63f4f6d9fa9440891b3a6 @  
>>>> git://anongit.freedesktop.org/gfx-ci/linux
>>>>
>>>>
>>>> == Linux commits ==
>>>>
>>>> c88fb110ee82 HAX enable GuC for CI
>>>> 4454d4d05ce3 drm/i915/guc: fix GuC suspend/resume
>>>>
>>>> == Logs ==
>>>>
>>>> For more details see:  
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10464/issues.html
>>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>
>>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2018-10-16 18:58 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-15 22:10 [PATCH 1/2] drm/i915/guc: fix GuC suspend/resume Daniele Ceraolo Spurio
2018-10-15 22:10 ` [PATCH 2/2] HAX enable GuC for CI Daniele Ceraolo Spurio
2018-10-15 22:21 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/2] drm/i915/guc: fix GuC suspend/resume Patchwork
2018-10-15 22:47 ` ✗ Fi.CI.BAT: failure " Patchwork
2018-10-15 23:43   ` Daniele Ceraolo Spurio
2018-10-16  8:49     ` Chris Wilson
2018-10-16  9:12       ` Michal Wajdeczko
2018-10-16  9:21     ` Daniel Vetter
2018-10-16 17:15       ` Daniele Ceraolo Spurio
2018-10-16 18:58         ` Michal Wajdeczko
2018-10-16  8:53 ` [PATCH 1/2] " Chris Wilson
2018-10-16 10:26 ` Michal Wajdeczko
2018-10-16 16:35   ` Daniele Ceraolo Spurio

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.