* [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity
@ 2022-10-10 18:48 ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 10+ messages in thread
From: Daniele Ceraolo Spurio @ 2022-10-10 18:48 UTC (permalink / raw)
  To: intel-gfx; +Cc: Tony Ye, Daniele Ceraolo Spurio, John Harrison, dri-devel

We're observing sporadic HuC delayed load timeouts in CI, due to the
mei_pxp binding completing later than we expected. HuC is still loaded
when the bind occurs, but in the meantime i915 has started allowing
submissions to the VCS engines even though HuC is not there yet.
In most of the cases I've observed, the timeout was due to the
init/resume of another driver, ordered between i915 and mei, hitting
errors and thus adding an extra delay; HuC was still loaded before
userspace could submit, because the same delay also stretched the
overall resume time.

Given that there is no upper bound to the delay that can be introduced
by other drivers, I've reached the following compromise with the media
team:

1) i915 is going to bump the timeout to 5s, to reduce the probability
of reaching it. We still expect HuC to be loaded before userspace
starts submitting, so increasing the timeout should have no impact on
normal operations, but in case something weird happens we don't want to
stall video submissions for too long.

2) The media driver will cope with any submissions that slip through
between i915 init/resume completing and HuC loading and fail as a
result, if that ever happens. This could cause a small corruption of
video playback immediately after a resume (we should be safe on boot,
because the media driver polls the HUC_STATUS ioctl before starting
submissions; a sketch of such a check is shown below).
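
For reference, a minimal userspace sketch of such a HUC_STATUS check,
using libdrm's drmIoctl() and the i915 GETPARAM uapi (illustrative only,
not the media driver's actual code; huc_ready() is just a made-up helper
name):

  #include <xf86drm.h>   /* drmIoctl() */
  #include <i915_drm.h>  /* DRM_IOCTL_I915_GETPARAM, I915_PARAM_HUC_STATUS */

  /* Returns 1 once HuC is loaded and authenticated, 0 otherwise. */
  static int huc_ready(int drm_fd)
  {
          struct drm_i915_getparam gp = { .param = I915_PARAM_HUC_STATUS };
          int value = 0;

          gp.value = &value;
          if (drmIoctl(drm_fd, DRM_IOCTL_I915_GETPARAM, &gp))
                  return 0; /* query failed or param not supported */

          return value == 1;
  }

The real driver does more error handling than this; the point is only
that HUC_STATUS is checked before the first submission.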

Since we're accepting the timeout as a valid outcome, I'm also reducing
the print verbosity from error to notice.

v2: use separate prints for MEI GSC and MEI PXP init timeouts (John)

References: https://gitlab.freedesktop.org/drm/intel/-/issues/7033
Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_huc.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
index 4d1cc383b681..41c032ab34b3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
@@ -52,10 +52,12 @@
  * guaranteed for this to happen during boot, so the big timeout is a safety net
  * that we never expect to need.
  * MEI-PXP + HuC load usually takes ~300ms, but if the GSC needs to be resumed
- * and/or reset, this can take longer.
+ * and/or reset, this can take longer. Note that the kernel might schedule
+ * other work between the i915 init/resume and the MEI one, which can add to
+ * the delay.
  */
 #define GSC_INIT_TIMEOUT_MS 10000
-#define PXP_INIT_TIMEOUT_MS 2000
+#define PXP_INIT_TIMEOUT_MS 5000
 
 static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
 				 enum i915_sw_fence_notify state)
@@ -104,8 +106,12 @@ static enum hrtimer_restart huc_delayed_load_timer_callback(struct hrtimer *hrti
 	struct intel_huc *huc = container_of(hrtimer, struct intel_huc, delayed_load.timer);
 
 	if (!intel_huc_is_authenticated(huc)) {
-		drm_err(&huc_to_gt(huc)->i915->drm,
-			"timed out waiting for GSC init to load HuC\n");
+		if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
+			drm_notice(&huc_to_gt(huc)->i915->drm,
+				   "timed out waiting for MEI GSC init to load HuC\n");
+		else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
+			drm_notice(&huc_to_gt(huc)->i915->drm,
+				   "timed out waiting for MEI PXP init to load HuC\n");
 
 		__gsc_init_error(huc);
 	}
-- 
2.37.3



* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2)
  2022-10-10 18:48 ` [Intel-gfx] " Daniele Ceraolo Spurio
  (?)
@ 2022-10-10 20:37 ` Patchwork
  -1 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2022-10-10 20:37 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2)
URL   : https://patchwork.freedesktop.org/series/109455/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12231 -> Patchwork_109455v2
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/index.html

Participating hosts (46 -> 43)
------------------------------

  Missing    (3): fi-ctg-p8600 fi-icl-u2 fi-tgl-dsi 

Known issues
------------

  Here are the changes found in Patchwork_109455v2 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_suspend@basic-s2idle-without-i915:
    - fi-apl-guc:         [PASS][1] -> [DMESG-WARN][2] ([i915#180] / [i915#5904] / [i915#62])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/fi-apl-guc/igt@i915_suspend@basic-s2idle-without-i915.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/fi-apl-guc/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-ivb-3770:        NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#111827])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/fi-ivb-3770/igt@kms_chamelium@common-hpd-after-suspend.html
    - fi-bdw-5557u:       NOTRUN -> [SKIP][4] ([fdo#109271] / [fdo#111827])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/fi-bdw-5557u/igt@kms_chamelium@common-hpd-after-suspend.html

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s0@smem:
    - {bat-rplp-1}:       [DMESG-WARN][5] ([i915#2867]) -> [PASS][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/bat-rplp-1/igt@gem_exec_suspend@basic-s0@smem.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/bat-rplp-1/igt@gem_exec_suspend@basic-s0@smem.html
    - {bat-adlm-1}:       [DMESG-WARN][7] ([i915#2867]) -> [PASS][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/bat-adlm-1/igt@gem_exec_suspend@basic-s0@smem.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/bat-adlm-1/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gem_exec_suspend@basic-s3@lmem0:
    - {bat-dg2-11}:       [DMESG-WARN][9] ([i915#6816]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/bat-dg2-11/igt@gem_exec_suspend@basic-s3@lmem0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/bat-dg2-11/igt@gem_exec_suspend@basic-s3@lmem0.html

  * igt@i915_selftest@live@hangcheck:
    - fi-ivb-3770:        [INCOMPLETE][11] -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/fi-ivb-3770/igt@i915_selftest@live@hangcheck.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/fi-ivb-3770/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-bdw-5557u:       [INCOMPLETE][13] ([i915#146] / [i915#6712]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/fi-bdw-5557u/igt@i915_suspend@basic-s3-without-i915.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/fi-bdw-5557u/igt@i915_suspend@basic-s3-without-i915.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#146]: https://gitlab.freedesktop.org/drm/intel/issues/146
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#5278]: https://gitlab.freedesktop.org/drm/intel/issues/5278
  [i915#5904]: https://gitlab.freedesktop.org/drm/intel/issues/5904
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#6367]: https://gitlab.freedesktop.org/drm/intel/issues/6367
  [i915#6712]: https://gitlab.freedesktop.org/drm/intel/issues/6712
  [i915#6816]: https://gitlab.freedesktop.org/drm/intel/issues/6816
  [i915#6997]: https://gitlab.freedesktop.org/drm/intel/issues/6997


Build changes
-------------

  * Linux: CI_DRM_12231 -> Patchwork_109455v2

  CI-20190529: 20190529
  CI_DRM_12231: bb84c1baa34eed834400e9a3cf9642840be002e1 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7007: 39a979fb4453c557022f0477c609afe10a049e48 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_109455v2: bb84c1baa34eed834400e9a3cf9642840be002e1 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

6913e22cc010 drm/i915/huc: bump timeout for delayed load and reduce print verbosity

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/index.html



* Re: [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity
  2022-10-10 18:48 ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2022-10-10 22:50   ` John Harrison
  -1 siblings, 0 replies; 10+ messages in thread
From: John Harrison @ 2022-10-10 22:50 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx; +Cc: Tony Ye, dri-devel

On 10/10/2022 11:48, Daniele Ceraolo Spurio wrote:
> We're observing sporadic HuC delayed load timeouts in CI, due to mei_pxp
> binding completing later than we expected. HuC is still loaded when the
> bind occurs, but in the meantime i915 has started allowing submission to
> the VCS engines even if HuC is not there.
> In most of the cases I've observed, the timeout was due to the
> init/resume of another driver between i915 and mei hitting errors and
> thus adding an extra delay, but HuC was still loaded before userspace
> could submit, because the whole resume process time was increased by the
> delays.
>
> Given that there is no upper bound to the delay that can be introduced
> by other drivers, I've reached the following compromise with the media
> team:
>
> 1) i915 is going to bump the timeout to 5s, to reduce the probability
> of reaching it. We still expect HuC to be loaded before userspace
> starts submitting, so increasing the timeout should have no impact on
> normal operations, but in case something weird happens we don't want to
> stall video submissions for too long.
>
> 2) The media driver will cope with the failing submissions that manage
> to go through between i915 init/resume complete and HuC loading, if any
> ever happen. This could cause a small corruption of video playback
> immediately after a resume (we should be safe on boot because the media
> driver polls the HUC_STATUS ioctl before starting submissions).
>
> Since we're accepting the timeout as a valid outcome, I'm also reducing
> the print verbosity from error to notice.
>
> v2: use separate prints for MEI GSC and MEI PXP init timeouts (John)
>
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7033
> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tony Ye <tony.ye@intel.com>
> Cc: John Harrison <john.c.harrison@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> index 4d1cc383b681..41c032ab34b3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
> @@ -52,10 +52,12 @@
>    * guaranteed for this to happen during boot, so the big timeout is a safety net
>    * that we never expect to need.
>    * MEI-PXP + HuC load usually takes ~300ms, but if the GSC needs to be resumed
> - * and/or reset, this can take longer.
> + * and/or reset, this can take longer. Note that the kernel might schedule
> + * other work between the i915 init/resume and the MEI one, which can add to
> + * the delay.
>    */
>   #define GSC_INIT_TIMEOUT_MS 10000
> -#define PXP_INIT_TIMEOUT_MS 2000
> +#define PXP_INIT_TIMEOUT_MS 5000
>   
>   static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
>   				 enum i915_sw_fence_notify state)
> @@ -104,8 +106,12 @@ static enum hrtimer_restart huc_delayed_load_timer_callback(struct hrtimer *hrti
>   	struct intel_huc *huc = container_of(hrtimer, struct intel_huc, delayed_load.timer);
>   
>   	if (!intel_huc_is_authenticated(huc)) {
> -		drm_err(&huc_to_gt(huc)->i915->drm,
> -			"timed out waiting for GSC init to load HuC\n");
> +		if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
> +			drm_notice(&huc_to_gt(huc)->i915->drm,
> +				   "timed out waiting for MEI GSC init to load HuC\n");
> +		else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
> +			drm_notice(&huc_to_gt(huc)->i915->drm,
> +				   "timed out waiting for MEI PXP init to load HuC\n");
Hmm. It feels wrong to assume that the status can only be GSC or PXP.
Granted, it certainly should be one of those two values if we get here.
But it just seems like bad practice to have a partial if/else-if ladder
over an enum value instead of a switch with a default path. What I meant
originally was to just have a single print that says 'timed out waiting
for MEI GSC/PXP to load...', maybe with the status value being printed.
I don't think it is critically important to differentiate between the
two; I just meant that it was wrong to explicitly state GSC when it
might not be.
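
I.e. something along these lines would be enough (untested, just to
illustrate the idea, not a finished patch):

        if (!intel_huc_is_authenticated(huc)) {
                drm_notice(&huc_to_gt(huc)->i915->drm,
                           "timed out in state %d waiting for MEI GSC/PXP init to load HuC\n",
                           huc->delayed_load.status);

                __gsc_init_error(huc);
        }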

John.

>   
>   		__gsc_init_error(huc);
>   	}



* Re: [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity
  2022-10-10 22:50   ` [Intel-gfx] " John Harrison
@ 2022-10-10 22:57     ` Ceraolo Spurio, Daniele
  -1 siblings, 0 replies; 10+ messages in thread
From: Ceraolo Spurio, Daniele @ 2022-10-10 22:57 UTC (permalink / raw)
  To: John Harrison, intel-gfx; +Cc: Tony Ye, dri-devel



On 10/10/2022 3:50 PM, John Harrison wrote:
> On 10/10/2022 11:48, Daniele Ceraolo Spurio wrote:
>> We're observing sporadic HuC delayed load timeouts in CI, due to mei_pxp
>> binding completing later than we expected. HuC is still loaded when the
>> bind occurs, but in the meantime i915 has started allowing submission to
>> the VCS engines even if HuC is not there.
>> In most of the cases I've observed, the timeout was due to the
>> init/resume of another driver between i915 and mei hitting errors and
>> thus adding an extra delay, but HuC was still loaded before userspace
>> could submit, because the whole resume process time was increased by the
>> delays.
>>
>> Given that there is no upper bound to the delay that can be introduced
>> by other drivers, I've reached the following compromise with the media
>> team:
>>
>> 1) i915 is going to bump the timeout to 5s, to reduce the probability
>> of reaching it. We still expect HuC to be loaded before userspace
>> starts submitting, so increasing the timeout should have no impact on
>> normal operations, but in case something weird happens we don't want to
>> stall video submissions for too long.
>>
>> 2) The media driver will cope with the failing submissions that manage
>> to go through between i915 init/resume complete and HuC loading, if any
>> ever happen. This could cause a small corruption of video playback
>> immediately after a resume (we should be safe on boot because the media
>> driver polls the HUC_STATUS ioctl before starting submissions).
>>
>> Since we're accepting the timeout as a valid outcome, I'm also reducing
>> the print verbosity from error to notice.
>>
>> v2: use separate prints for MEI GSC and MEI PXP init timeouts (John)
>>
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7033
>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a 
>> fence")
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tony Ye <tony.ye@intel.com>
>> Cc: John Harrison <john.c.harrison@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 14 ++++++++++----
>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c 
>> b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> index 4d1cc383b681..41c032ab34b3 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>> @@ -52,10 +52,12 @@
>>    * guaranteed for this to happen during boot, so the big timeout is 
>> a safety net
>>    * that we never expect to need.
>>    * MEI-PXP + HuC load usually takes ~300ms, but if the GSC needs to 
>> be resumed
>> - * and/or reset, this can take longer.
>> + * and/or reset, this can take longer. Note that the kernel might 
>> schedule
>> + * other work between the i915 init/resume and the MEI one, which 
>> can add to
>> + * the delay.
>>    */
>>   #define GSC_INIT_TIMEOUT_MS 10000
>> -#define PXP_INIT_TIMEOUT_MS 2000
>> +#define PXP_INIT_TIMEOUT_MS 5000
>>     static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
>>                    enum i915_sw_fence_notify state)
>> @@ -104,8 +106,12 @@ static enum hrtimer_restart 
>> huc_delayed_load_timer_callback(struct hrtimer *hrti
>>       struct intel_huc *huc = container_of(hrtimer, struct intel_huc, 
>> delayed_load.timer);
>>         if (!intel_huc_is_authenticated(huc)) {
>> -        drm_err(&huc_to_gt(huc)->i915->drm,
>> -            "timed out waiting for GSC init to load HuC\n");
>> +        if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
>> +            drm_notice(&huc_to_gt(huc)->i915->drm,
>> +                   "timed out waiting for MEI GSC init to load HuC\n");
>> +        else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
>> +            drm_notice(&huc_to_gt(huc)->i915->drm,
>> +                   "timed out waiting for MEI PXP init to load HuC\n");
> Hmm. It feels wrong to assume that the status can only be GSC or PXP. 
> Granted, it certainly should be one of those two values if we get 
> here. But it just seems like bad practice to have a partial if/elsif 
> ladder for an enum value instead of a switch with a default path. What 
> I meant originally was just have a single print that says 'timed out 
> waiting for MEI GSC/PXP to load...', maybe with the status value being 
> printed. I don't think it is critically important to differentiate 
> between the two, I just meant that it was wrong to explicitly state 
> GSC when it might not be.

It is guaranteed that the state is one of those two in this callback.
The only other option is the error state, which is only set from
__gsc_init_error() below. I can make it an unconditional else, or add an
else with MISSING_CASE(status) if you think that works better, but given
the errors we've seen I'd prefer to keep the prints separate for extra
clarity.
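
For reference, the else + MISSING_CASE(status) option would just extend
the ladder from the patch roughly like this (untested sketch, not
something I've posted):

        if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
                drm_notice(&huc_to_gt(huc)->i915->drm,
                           "timed out waiting for MEI GSC init to load HuC\n");
        else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
                drm_notice(&huc_to_gt(huc)->i915->drm,
                           "timed out waiting for MEI PXP init to load HuC\n");
        else
                MISSING_CASE(huc->delayed_load.status);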

Daniele

>
> John.
>
>>             __gsc_init_error(huc);
>>       }
>



* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2)
  2022-10-10 18:48 ` [Intel-gfx] " Daniele Ceraolo Spurio
                   ` (2 preceding siblings ...)
  (?)
@ 2022-10-11  3:36 ` Patchwork
  -1 siblings, 0 replies; 10+ messages in thread
From: Patchwork @ 2022-10-11  3:36 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele; +Cc: intel-gfx


== Series Details ==

Series: drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2)
URL   : https://patchwork.freedesktop.org/series/109455/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_12231_full -> Patchwork_109455v2_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (9 -> 9)
------------------------------

  No changes in participating hosts

Known issues
------------

  Here are the changes found in Patchwork_109455v2_full that come from known issues:

### CI changes ###

#### Issues hit ####

  * boot:
    - shard-snb:          ([PASS][1], [PASS][2], [PASS][3], [PASS][4], [PASS][5], [PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25]) -> ([PASS][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30], [FAIL][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50]) ([i915#4338])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb6/boot.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb6/boot.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb5/boot.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb5/boot.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb5/boot.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb5/boot.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb5/boot.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb6/boot.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb6/boot.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb6/boot.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb7/boot.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb7/boot.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb2/boot.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb7/boot.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb7/boot.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb4/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb7/boot.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb4/boot.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb4/boot.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-snb4/boot.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb7/boot.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb7/boot.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb7/boot.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb7/boot.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb6/boot.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb5/boot.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb5/boot.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb5/boot.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb5/boot.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb5/boot.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb4/boot.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb4/boot.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb4/boot.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb4/boot.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb4/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb2/boot.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb2/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb2/boot.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb2/boot.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-snb2/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_isolation@preservation-s3@rcs0:
    - shard-apl:          [PASS][51] -> [DMESG-WARN][52] ([i915#180])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-apl8/igt@gem_ctx_isolation@preservation-s3@rcs0.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl6/igt@gem_ctx_isolation@preservation-s3@rcs0.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [PASS][53] -> [SKIP][54] ([i915#2190])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-tglb3/igt@gem_huc_copy@huc-copy.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-tglb7/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@random:
    - shard-skl:          NOTRUN -> [SKIP][55] ([fdo#109271] / [i915#4613]) +2 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@gem_lmem_swapping@random.html
    - shard-apl:          NOTRUN -> [SKIP][56] ([fdo#109271] / [i915#4613])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl7/igt@gem_lmem_swapping@random.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl:          NOTRUN -> [DMESG-WARN][57] ([i915#4991])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl7/igt@gem_userptr_blits@input-checking.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-skl:          NOTRUN -> [FAIL][58] ([i915#454])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [PASS][59] -> [FAIL][60] ([i915#3989] / [i915#454])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb2/igt@i915_pm_dc@dc6-psr.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb8/igt@i915_pm_dc@dc6-psr.html

  * igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-skl:          NOTRUN -> [SKIP][61] ([fdo#109271] / [i915#3886]) +7 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs.html
    - shard-apl:          NOTRUN -> [SKIP][62] ([fdo#109271] / [i915#3886])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl7/igt@kms_ccs@pipe-a-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_chamelium@hdmi-edid-read:
    - shard-glk:          NOTRUN -> [SKIP][63] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk9/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_color_chamelium@ctm-limited-range:
    - shard-skl:          NOTRUN -> [SKIP][64] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@kms_color_chamelium@ctm-limited-range.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions:
    - shard-glk:          [PASS][65] -> [FAIL][66] ([i915#2346])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-glk7/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk3/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions.html

  * igt@kms_flip@flip-vs-expired-vblank@b-dp1:
    - shard-apl:          [PASS][67] -> [FAIL][68] ([i915#79])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-apl6/igt@kms_flip@flip-vs-expired-vblank@b-dp1.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl1/igt@kms_flip@flip-vs-expired-vblank@b-dp1.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1:
    - shard-skl:          [PASS][69] -> [FAIL][70] ([i915#2122])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl10/igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl6/igt@kms_flip@plain-flip-fb-recreate-interruptible@c-edp1.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-downscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][71] ([i915#2587] / [i915#2672]) +1 similar issue
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb8/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-16bpp-4tile-downscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][72] ([i915#2672]) +1 similar issue
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-32bpp-yftile-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-16bpp-ytile-downscaling@pipe-a-default-mode:
    - shard-iclb:         [PASS][73] -> [SKIP][74] ([i915#3555])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-16bpp-ytile-downscaling@pipe-a-default-mode.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-16bpp-ytile-downscaling@pipe-a-default-mode.html

  * igt@kms_frontbuffer_tracking@fbcpsr-slowdraw:
    - shard-glk:          NOTRUN -> [SKIP][75] ([fdo#109271]) +12 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk9/igt@kms_frontbuffer_tracking@fbcpsr-slowdraw.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-blt:
    - shard-skl:          [PASS][76] -> [DMESG-WARN][77] ([i915#1982]) +1 similar issue
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl3/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-blt.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl5/igt@kms_frontbuffer_tracking@psr-1p-primscrn-cur-indfb-draw-blt.html

  * igt@kms_plane_alpha_blend@alpha-basic@pipe-a-edp-1:
    - shard-skl:          NOTRUN -> [FAIL][78] ([i915#4573]) +2 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl9/igt@kms_plane_alpha_blend@alpha-basic@pipe-a-edp-1.html

  * igt@kms_psr2_sf@overlay-plane-move-continuous-sf:
    - shard-skl:          NOTRUN -> [SKIP][79] ([fdo#109271] / [i915#658])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@kms_psr2_sf@overlay-plane-move-continuous-sf.html

  * igt@kms_psr2_su@page_flip-p010:
    - shard-iclb:         NOTRUN -> [SKIP][80] ([fdo#109642] / [fdo#111068] / [i915#658])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb8/igt@kms_psr2_su@page_flip-p010.html

  * igt@kms_psr@psr2_sprite_render:
    - shard-iclb:         [PASS][81] -> [SKIP][82] ([fdo#109441]) +2 similar issues
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb2/igt@kms_psr@psr2_sprite_render.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb8/igt@kms_psr@psr2_sprite_render.html

  * igt@kms_rotation_crc@primary-4-tiled-reflect-x-180:
    - shard-skl:          NOTRUN -> [SKIP][83] ([fdo#109271]) +103 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl9/igt@kms_rotation_crc@primary-4-tiled-reflect-x-180.html

  * igt@kms_scaling_modes@scaling-mode-center:
    - shard-apl:          NOTRUN -> [SKIP][84] ([fdo#109271]) +26 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl6/igt@kms_scaling_modes@scaling-mode-center.html

  * igt@kms_writeback@writeback-invalid-parameters:
    - shard-apl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#2437])
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl7/igt@kms_writeback@writeback-invalid-parameters.html
    - shard-skl:          NOTRUN -> [SKIP][86] ([fdo#109271] / [i915#2437])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl1/igt@kms_writeback@writeback-invalid-parameters.html

  * igt@perf@polling-parameterized:
    - shard-apl:          [PASS][87] -> [FAIL][88] ([i915#5639])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-apl7/igt@perf@polling-parameterized.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl2/igt@perf@polling-parameterized.html
    - shard-glk:          [PASS][89] -> [FAIL][90] ([i915#5639])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-glk6/igt@perf@polling-parameterized.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk6/igt@perf@polling-parameterized.html

  * igt@perf@stress-open-close:
    - shard-glk:          [PASS][91] -> [INCOMPLETE][92] ([i915#5213])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-glk9/igt@perf@stress-open-close.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk7/igt@perf@stress-open-close.html

  * igt@sysfs_clients@fair-1:
    - shard-skl:          NOTRUN -> [SKIP][93] ([fdo#109271] / [i915#2994])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl9/igt@sysfs_clients@fair-1.html

  
#### Possible fixes ####

  * igt@gem_exec_balancer@parallel-keep-in-fence:
    - shard-iclb:         [SKIP][94] ([i915#4525]) -> [PASS][95]
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb5/igt@gem_exec_balancer@parallel-keep-in-fence.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb1/igt@gem_exec_balancer@parallel-keep-in-fence.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [FAIL][96] ([i915#2842]) -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-glk8/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk5/igt@gem_exec_fair@basic-pace-share@rcs0.html
    - shard-apl:          [FAIL][98] ([i915#2842]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-apl7/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl3/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-glk:          [DMESG-WARN][100] ([i915#5566] / [i915#716]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-glk2/igt@gen9_exec_parse@allowed-all.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-glk9/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_selftest@live@hangcheck:
    - shard-tglb:         [DMESG-WARN][102] ([i915#5591]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-tglb2/igt@i915_selftest@live@hangcheck.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-tglb1/igt@i915_selftest@live@hangcheck.html

  * igt@kms_addfb_basic@legacy-format:
    - shard-tglb:         [INCOMPLETE][104] ([i915#6987]) -> [PASS][105]
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-tglb3/igt@kms_addfb_basic@legacy-format.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-tglb7/igt@kms_addfb_basic@legacy-format.html

  * igt@kms_async_flips@alternate-sync-async-flip@pipe-c-edp-1:
    - shard-skl:          [FAIL][106] ([i915#2521]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl7/igt@kms_async_flips@alternate-sync-async-flip@pipe-c-edp-1.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl9/igt@kms_async_flips@alternate-sync-async-flip@pipe-c-edp-1.html

  * igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic:
    - shard-skl:          [FAIL][108] ([i915#2346]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl6/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl10/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1:
    - shard-skl:          [FAIL][110] ([i915#79]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl4/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@b-dp1:
    - shard-apl:          [DMESG-WARN][112] ([i915#180]) -> [PASS][113] +2 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-apl2/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-apl7/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html

  * igt@kms_flip@flip-vs-suspend@b-edp1:
    - shard-skl:          [INCOMPLETE][114] ([i915#4839]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl3/igt@kms_flip@flip-vs-suspend@b-edp1.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl4/igt@kms_flip@flip-vs-suspend@b-edp1.html

  * igt@kms_flip@plain-flip-ts-check@c-edp1:
    - shard-skl:          [FAIL][116] ([i915#2122]) -> [PASS][117] +3 similar issues
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl6/igt@kms_flip@plain-flip-ts-check@c-edp1.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl9/igt@kms_flip@plain-flip-ts-check@c-edp1.html

  * igt@kms_psr@psr2_cursor_blt:
    - shard-iclb:         [SKIP][118] ([fdo#109441]) -> [PASS][119] +1 similar issue
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb3/igt@kms_psr@psr2_cursor_blt.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb2/igt@kms_psr@psr2_cursor_blt.html

  
#### Warnings ####

  * igt@gem_exec_balancer@parallel-ordering:
    - shard-iclb:         [FAIL][120] ([i915#6117]) -> [SKIP][121] ([i915#4525])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb1/igt@gem_exec_balancer@parallel-ordering.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb6/igt@gem_exec_balancer@parallel-ordering.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-blt:
    - shard-skl:          [SKIP][122] ([fdo#109271]) -> [SKIP][123] ([fdo#109271] / [i915#1888]) +1 similar issue
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-skl9/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-blt.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-skl3/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-blt.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf:
    - shard-iclb:         [SKIP][124] ([i915#658]) -> [SKIP][125] ([i915#2920])
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb3/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-iclb:         [SKIP][126] ([i915#2920]) -> [SKIP][127] ([fdo#111068] / [i915#658])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb8/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area:
    - shard-iclb:         [SKIP][128] ([fdo#111068] / [i915#658]) -> [SKIP][129] ([i915#2920])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12231/shard-iclb3/igt@kms_psr2_sf@plane-move-sf-dmg-area.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/shard-iclb2/igt@kms_psr2_sf@plane-move-sf-dmg-area.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2521]: https://gitlab.freedesktop.org/drm/intel/issues/2521
  [i915#2587]: https://gitlab.freedesktop.org/drm/intel/issues/2587
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3989]: https://gitlab.freedesktop.org/drm/intel/issues/3989
  [i915#4338]: https://gitlab.freedesktop.org/drm/intel/issues/4338
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4573]: https://gitlab.freedesktop.org/drm/intel/issues/4573
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4839]: https://gitlab.freedesktop.org/drm/intel/issues/4839
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991
  [i915#5213]: https://gitlab.freedesktop.org/drm/intel/issues/5213
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#5591]: https://gitlab.freedesktop.org/drm/intel/issues/5591
  [i915#5639]: https://gitlab.freedesktop.org/drm/intel/issues/5639
  [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#6987]: https://gitlab.freedesktop.org/drm/intel/issues/6987
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79


Build changes
-------------

  * Linux: CI_DRM_12231 -> Patchwork_109455v2

  CI-20190529: 20190529
  CI_DRM_12231: bb84c1baa34eed834400e9a3cf9642840be002e1 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_7007: 39a979fb4453c557022f0477c609afe10a049e48 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_109455v2: bb84c1baa34eed834400e9a3cf9642840be002e1 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_109455v2/index.html

[-- Attachment #2: Type: text/html, Size: 29450 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity
  2022-10-10 22:57     ` [Intel-gfx] " Ceraolo Spurio, Daniele
@ 2022-10-12  0:03       ` John Harrison
  -1 siblings, 0 replies; 10+ messages in thread
From: John Harrison @ 2022-10-12  0:03 UTC (permalink / raw)
  To: Ceraolo Spurio, Daniele, intel-gfx; +Cc: Tony Ye, dri-devel

On 10/10/2022 15:57, Ceraolo Spurio, Daniele wrote:
> On 10/10/2022 3:50 PM, John Harrison wrote:
>> On 10/10/2022 11:48, Daniele Ceraolo Spurio wrote:
>>> We're observing sporadic HuC delayed load timeouts in CI, due to mei_pxp
>>> binding completing later than we expected. HuC is still loaded when the
>>> bind occurs, but in the meantime i915 has started allowing submission to
>>> the VCS engines even if HuC is not there.
>>> In most of the cases I've observed, the timeout was due to the
>>> init/resume of another driver between i915 and mei hitting errors and
>>> thus adding an extra delay, but HuC was still loaded before userspace
>>> could submit, because the whole resume process time was increased by the
>>> delays.
>>>
>>> Given that there is no upper bound to the delay that can be introduced
>>> by other drivers, I've reached the following compromise with the media
>>> team:
>>>
>>> 1) i915 is going to bump the timeout to 5s, to reduce the probability
>>> of reaching it. We still expect HuC to be loaded before userspace
>>> starts submitting, so increasing the timeout should have no impact on
>>> normal operations, but in case something weird happens we don't want to
>>> stall video submissions for too long.
>>>
>>> 2) The media driver will cope with the failing submissions that manage
>>> to go through between i915 init/resume complete and HuC loading, if any
>>> ever happen. This could cause a small corruption of video playback
>>> immediately after a resume (we should be safe on boot because the media
>>> driver polls the HUC_STATUS ioctl before starting submissions).
>>>
>>> Since we're accepting the timeout as a valid outcome, I'm also reducing
>>> the print verbosity from error to notice.
>>>
>>> v2: use separate prints for MEI GSC and MEI PXP init timeouts (John)
>>>
>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/7033
>>> Fixes: 27536e03271d ("drm/i915/huc: track delayed HuC load with a fence")
>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Cc: Tony Ye <tony.ye@intel.com>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/uc/intel_huc.c | 14 ++++++++++----
>>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_huc.c b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>>> index 4d1cc383b681..41c032ab34b3 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_huc.c
>>> @@ -52,10 +52,12 @@
>>>    * guaranteed for this to happen during boot, so the big timeout is a safety net
>>>    * that we never expect to need.
>>>    * MEI-PXP + HuC load usually takes ~300ms, but if the GSC needs to be resumed
>>> - * and/or reset, this can take longer.
>>> + * and/or reset, this can take longer. Note that the kernel might schedule
>>> + * other work between the i915 init/resume and the MEI one, which can add to
>>> + * the delay.
>>>    */
>>>   #define GSC_INIT_TIMEOUT_MS 10000
>>> -#define PXP_INIT_TIMEOUT_MS 2000
>>> +#define PXP_INIT_TIMEOUT_MS 5000
>>>     static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
>>>                    enum i915_sw_fence_notify state)
>>> @@ -104,8 +106,12 @@ static enum hrtimer_restart huc_delayed_load_timer_callback(struct hrtimer *hrti
>>>       struct intel_huc *huc = container_of(hrtimer, struct intel_huc, delayed_load.timer);
>>>         if (!intel_huc_is_authenticated(huc)) {
>>> -        drm_err(&huc_to_gt(huc)->i915->drm,
>>> -            "timed out waiting for GSC init to load HuC\n");
>>> +        if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
>>> +            drm_notice(&huc_to_gt(huc)->i915->drm,
>>> +                   "timed out waiting for MEI GSC init to load HuC\n");
>>> +        else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
>>> +            drm_notice(&huc_to_gt(huc)->i915->drm,
>>> +                   "timed out waiting for MEI PXP init to load HuC\n");
>> Hmm. It feels wrong to assume that the status can only be GSC or PXP.
>> Granted, it certainly should be one of those two values if we get here,
>> but it still seems like bad practice to have a partial if/else-if ladder
>> for an enum value instead of a switch with a default path. What I meant
>> originally was just to have a single print that says 'timed out waiting
>> for MEI GSC/PXP to load...', maybe with the status value being printed.
>> I don't think it is critically important to differentiate between the
>> two; I just meant that it was wrong to explicitly state GSC when it
>> might not be.
>
> It is guaranteed that the state is one of those two in this callback.
> The only other option is an error state, which we set from
> __gsc_init_error() below. I can make it an unconditional else, or add an
> else with MISSING_CASE(status) if you think that works better, but given
> the errors we've seen I'd prefer to keep the prints separate for extra
> clarity.
I agree a third state is theoretically impossible, at least currently. But
yeah, the MISSING_CASE thing works if you think two separate prints are
beneficial.
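
For reference, a minimal sketch of what that branch could look like with the
MISSING_CASE() fallback (assuming the existing INTEL_HUC_WAITING_ON_GSC/PXP
states and the i915 MISSING_CASE() helper; illustrative only, not a final
patch):

    if (!intel_huc_is_authenticated(huc)) {
            if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_GSC)
                    drm_notice(&huc_to_gt(huc)->i915->drm,
                               "timed out waiting for MEI GSC init to load HuC\n");
            else if (huc->delayed_load.status == INTEL_HUC_WAITING_ON_PXP)
                    drm_notice(&huc_to_gt(huc)->i915->drm,
                               "timed out waiting for MEI PXP init to load HuC\n");
            else
                    /* not expected today; flag any new state loudly */
                    MISSING_CASE(huc->delayed_load.status);

            /* __gsc_init_error() handles the error state, as noted above */
            __gsc_init_error(huc);
    }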

John.

>
> Daniele
>
>>
>> John.
>>
>>>             __gsc_init_error(huc);
>>>       }
>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-12  0:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-10 18:48 [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity Daniele Ceraolo Spurio
2022-10-10 18:48 ` [Intel-gfx] " Daniele Ceraolo Spurio
2022-10-10 20:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2) Patchwork
2022-10-10 22:50 ` [PATCH v2] drm/i915/huc: bump timeout for delayed load and reduce print verbosity John Harrison
2022-10-10 22:50   ` [Intel-gfx] " John Harrison
2022-10-10 22:57   ` Ceraolo Spurio, Daniele
2022-10-10 22:57     ` [Intel-gfx] " Ceraolo Spurio, Daniele
2022-10-12  0:03     ` John Harrison
2022-10-12  0:03       ` [Intel-gfx] " John Harrison
2022-10-11  3:36 ` [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915/huc: bump timeout for delayed load and reduce print verbosity (rev2) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.