All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915/guc: Propagate the fw xfer timeout
@ 2018-10-17 20:29 Chris Wilson
  2018-10-17 21:12 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Chris Wilson @ 2018-10-17 20:29 UTC (permalink / raw)
  To: intel-gfx

Propagate the timeout on transferring the fw back to the caller where it
may act upon it, usually by restarting the xfer before failing.

Testcase: igt/drv_selftest/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
index a9e6fcce467c..b68a05892dab 100644
--- a/drivers/gpu/drm/i915/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/intel_guc_fw.c
@@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc *guc)
 }
 
 /* Copy RSA signature from the fw image to HW for verification */
-static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
+static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct intel_uc_fw *guc_fw = &guc->fw;
-	struct sg_table *sg = vma->pages;
 	u32 rsa[UOS_RSA_SCRATCH_COUNT];
 	int i;
 
-	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
-			       guc_fw->rsa_offset) != sizeof(rsa))
-		return -EINVAL;
+	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
+			   rsa, sizeof(rsa), guc->fw.rsa_offset);
 
 	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
 		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
-
-	return 0;
 }
 
 /*
@@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)
 	 * by the DMA engine in one operation, whereas the RSA signature is
 	 * loaded via MMIO.
 	 */
-	ret = guc_xfer_rsa(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
+	guc_xfer_rsa(guc, vma);
 
 	ret = guc_xfer_ucode(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware code xfer error %d\n", ret);
-
-	ret = guc_wait_ucode(guc);
-	if (ret)
-		DRM_ERROR("GuC firmware xfer error %d\n", ret);
+	if (ret == 0)
+		ret = guc_wait_ucode(guc);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
@ 2018-10-17 21:12 ` Patchwork
  2018-10-17 22:14 ` ✓ Fi.CI.IGT: " Patchwork
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2018-10-17 21:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Propagate the fw xfer timeout
URL   : https://patchwork.freedesktop.org/series/51140/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5000 -> Patchwork_10495 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/51140/revisions/1/mbox/

== Known issues ==

  Here are the changes found in Patchwork_10495 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_flip@basic-flip-vs-dpms:
      fi-skl-6700hq:      PASS -> DMESG-WARN (fdo#105998)

    igt@kms_frontbuffer_tracking@basic:
      fi-icl-u2:          PASS -> FAIL (fdo#103167)

    
    ==== Possible fixes ====

    igt@kms_pipe_crc_basic@nonblocking-crc-pipe-b:
      fi-byt-clapper:     FAIL (fdo#107362) -> PASS

    
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998
  fdo#107362 https://bugs.freedesktop.org/show_bug.cgi?id=107362


== Participating hosts (47 -> 38) ==

  Missing    (9): fi-kbl-soraka fi-ilk-m540 fi-icl-u fi-hsw-4200u fi-byt-j1900 fi-byt-squawks fi-bsw-cyan fi-pnv-d510 fi-kbl-7560u 


== Build changes ==

    * Linux: CI_DRM_5000 -> Patchwork_10495

  CI_DRM_5000: b9543c130d4f6edd76ec98090c46044ba6d9493e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4683: 7766b1e2348b32cc8ed58a972c6fd53b20279549 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10495: c4abf98c4adb62eb18f400a2fb498477e6864af3 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

c4abf98c4adb drm/i915/guc: Propagate the fw xfer timeout

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10495/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* ✓ Fi.CI.IGT: success for drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
  2018-10-17 21:12 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-10-17 22:14 ` Patchwork
  2018-10-17 23:09 ` [PATCH] " Daniele Ceraolo Spurio
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2018-10-17 22:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Propagate the fw xfer timeout
URL   : https://patchwork.freedesktop.org/series/51140/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5000_full -> Patchwork_10495_full =

== Summary - SUCCESS ==

  No regressions found.

  

== Known issues ==

  Here are the changes found in Patchwork_10495_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-apl:          PASS -> DMESG-WARN (fdo#105602, fdo#103558) +3

    igt@kms_busy@extended-modeset-hang-newfb-render-a:
      shard-hsw:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
      shard-skl:          PASS -> DMESG-WARN (fdo#107956)

    igt@kms_ccs@pipe-a-crc-sprite-planes-basic:
      shard-skl:          NOTRUN -> FAIL (fdo#107725, fdo#108145)

    igt@kms_cursor_legacy@2x-long-nonblocking-modeset-vs-cursor-atomic:
      shard-glk:          PASS -> FAIL (fdo#105454, fdo#106509)

    igt@kms_fbcon_fbt@fbc-suspend:
      shard-kbl:          PASS -> FAIL (fdo#103375, fdo#105682)

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
      shard-glk:          PASS -> FAIL (fdo#103167)

    igt@kms_frontbuffer_tracking@psr-1p-primscrn-spr-indfb-draw-render:
      shard-skl:          NOTRUN -> FAIL (fdo#103167)

    igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
      shard-kbl:          PASS -> FAIL (fdo#103375)

    igt@kms_plane_alpha_blend@pipe-b-alpha-transparant-fb:
      shard-skl:          NOTRUN -> FAIL (fdo#108145)

    igt@kms_plane_multiple@atomic-pipe-a-tiling-x:
      shard-apl:          PASS -> FAIL (fdo#103166) +2

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-glk:          PASS -> FAIL (fdo#103166) +2

    igt@kms_setmode@basic:
      shard-hsw:          PASS -> FAIL (fdo#99912)

    igt@perf@polling:
      shard-hsw:          PASS -> FAIL (fdo#102252)

    
    ==== Possible fixes ====

    igt@drv_suspend@fence-restore-untiled:
      shard-kbl:          INCOMPLETE (fdo#103665) -> PASS

    igt@gem_userptr_blits@readonly-unsync:
      shard-skl:          INCOMPLETE (fdo#108074) -> PASS

    igt@kms_atomic_transition@1x-modeset-transitions-fencing:
      shard-skl:          FAIL (fdo#108470) -> PASS

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-apl:          FAIL (fdo#106641) -> PASS

    igt@kms_color@pipe-c-ctm-max:
      shard-skl:          FAIL (fdo#108147) -> PASS

    igt@kms_cursor_crc@cursor-128x128-suspend:
      shard-apl:          FAIL (fdo#103232, fdo#103191) -> PASS

    igt@kms_cursor_crc@cursor-64x64-suspend:
      shard-skl:          INCOMPLETE (fdo#104108) -> PASS

    igt@kms_draw_crc@draw-method-xrgb8888-pwrite-untiled:
      shard-skl:          FAIL (fdo#108472) -> PASS

    igt@kms_flip@flip-vs-expired-vblank:
      shard-glk:          FAIL (fdo#102887, fdo#105363) -> PASS

    igt@kms_flip@flip-vs-expired-vblank-interruptible:
      shard-skl:          FAIL (fdo#105363) -> PASS

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-render:
      shard-apl:          FAIL (fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-render:
      shard-glk:          DMESG-FAIL (fdo#106538) -> PASS

    igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-indfb-draw-mmap-gtt:
      shard-skl:          FAIL (fdo#103167) -> PASS

    igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-draw-blt:
      shard-snb:          INCOMPLETE (fdo#105411) -> SKIP

    igt@kms_plane@plane-position-covered-pipe-c-planes:
      shard-glk:          FAIL (fdo#103166) -> PASS +1

    igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
      shard-skl:          FAIL (fdo#107815, fdo#108145) -> PASS

    igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
      shard-apl:          FAIL (fdo#103166) -> PASS

    igt@kms_setmode@basic:
      shard-kbl:          FAIL (fdo#99912) -> PASS

    igt@kms_vblank@pipe-b-ts-continuation-idle-hang:
      shard-glk:          DMESG-WARN (fdo#105763, fdo#106538) -> PASS +2

    
    ==== Warnings ====

    igt@kms_available_modes_crc@available_mode_test_crc:
      shard-glk:          FAIL (fdo#106641) -> DMESG-FAIL (fdo#106538)

    
  fdo#102252 https://bugs.freedesktop.org/show_bug.cgi?id=102252
  fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#104108 https://bugs.freedesktop.org/show_bug.cgi?id=104108
  fdo#105363 https://bugs.freedesktop.org/show_bug.cgi?id=105363
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105454 https://bugs.freedesktop.org/show_bug.cgi?id=105454
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
  fdo#105763 https://bugs.freedesktop.org/show_bug.cgi?id=105763
  fdo#106509 https://bugs.freedesktop.org/show_bug.cgi?id=106509
  fdo#106538 https://bugs.freedesktop.org/show_bug.cgi?id=106538
  fdo#106641 https://bugs.freedesktop.org/show_bug.cgi?id=106641
  fdo#107725 https://bugs.freedesktop.org/show_bug.cgi?id=107725
  fdo#107815 https://bugs.freedesktop.org/show_bug.cgi?id=107815
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108074 https://bugs.freedesktop.org/show_bug.cgi?id=108074
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108147 https://bugs.freedesktop.org/show_bug.cgi?id=108147
  fdo#108470 https://bugs.freedesktop.org/show_bug.cgi?id=108470
  fdo#108472 https://bugs.freedesktop.org/show_bug.cgi?id=108472
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_5000 -> Patchwork_10495

  CI_DRM_5000: b9543c130d4f6edd76ec98090c46044ba6d9493e @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4683: 7766b1e2348b32cc8ed58a972c6fd53b20279549 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10495: c4abf98c4adb62eb18f400a2fb498477e6864af3 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10495/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
  2018-10-17 21:12 ` ✓ Fi.CI.BAT: success for " Patchwork
  2018-10-17 22:14 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-10-17 23:09 ` Daniele Ceraolo Spurio
  2018-10-17 23:22   ` Michal Wajdeczko
  2018-10-18 19:54 ` [PATCH v2] " Chris Wilson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-17 23:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 17/10/18 13:29, Chris Wilson wrote:
> Propagate the timeout on transferring the fw back to the caller where it
> may act upon it, usually by restarting the xfer before failing.
> 

Did you see any case where we failed the xfer and didn't get a timeout 
out of guc_wait_ucode? that'd be quite weird

Anyway, definitely better and cleaner than before:
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

small nitpick below

> Testcase: igt/drv_selftest/live_hangcheck
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
>   1 file changed, 6 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
> index a9e6fcce467c..b68a05892dab 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc *guc)
>   }
>   
>   /* Copy RSA signature from the fw image to HW for verification */
> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>   {
>   	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct intel_uc_fw *guc_fw = &guc->fw;
> -	struct sg_table *sg = vma->pages;
>   	u32 rsa[UOS_RSA_SCRATCH_COUNT];
>   	int i;
>   
> -	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
> -			       guc_fw->rsa_offset) != sizeof(rsa))
> -		return -EINVAL;
> +	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
> +			   rsa, sizeof(rsa), guc->fw.rsa_offset);
>   
>   	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>   		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
> -
> -	return 0;
>   }
>   
>   /*
> @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)
>   	 * by the DMA engine in one operation, whereas the RSA signature is
>   	 * loaded via MMIO.
>   	 */
> -	ret = guc_xfer_rsa(guc, vma);
> -	if (ret)
> -		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
> +	guc_xfer_rsa(guc, vma);
>   
>   	ret = guc_xfer_ucode(guc, vma);
> -	if (ret)
> -		DRM_WARN("GuC firmware code xfer error %d\n", ret);
> -
> -	ret = guc_wait_ucode(guc);
> -	if (ret)
> -		DRM_ERROR("GuC firmware xfer error %d\n", ret);

With the logs gone we don't have any more error message to understand 
where exactly we hit issue (xfer vs wait_ucode). We do have debug logs 
printing the status that would give the info, but might be worth 
retaining at least 1 error-level log.

Daniele

> +	if (ret == 0)
> +		ret = guc_wait_ucode(guc);
>   
>   	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 23:09 ` [PATCH] " Daniele Ceraolo Spurio
@ 2018-10-17 23:22   ` Michal Wajdeczko
  2018-10-18  9:13     ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Wajdeczko @ 2018-10-17 23:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniele Ceraolo Spurio

On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio  
<daniele.ceraolospurio@intel.com> wrote:

>
>
> On 17/10/18 13:29, Chris Wilson wrote:
>> Propagate the timeout on transferring the fw back to the caller where it
>> may act upon it, usually by restarting the xfer before failing.
>>
>
> Did you see any case where we failed the xfer and didn't get a timeout  
> out of guc_wait_ucode? that'd be quite weird
>
> Anyway, definitely better and cleaner than before:
> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>
> small nitpick below
>
>> Testcase: igt/drv_selftest/live_hangcheck
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> ---
>>   drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
>>   1 file changed, 6 insertions(+), 17 deletions(-)
>>  diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c  
>> b/drivers/gpu/drm/i915/intel_guc_fw.c
>> index a9e6fcce467c..b68a05892dab 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
>> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
>> @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc  
>> *guc)
>>   }
>>     /* Copy RSA signature from the fw image to HW for verification */
>> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>   {
>>   	struct drm_i915_private *dev_priv = guc_to_i915(guc);
>> -	struct intel_uc_fw *guc_fw = &guc->fw;
>> -	struct sg_table *sg = vma->pages;
>>   	u32 rsa[UOS_RSA_SCRATCH_COUNT];
>>   	int i;
>>   -	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
>> -			       guc_fw->rsa_offset) != sizeof(rsa))
>> -		return -EINVAL;
>> +	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
>> +			   rsa, sizeof(rsa), guc->fw.rsa_offset);
>>     	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>>   		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
>> -
>> -	return 0;
>>   }
>>     /*
>> @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw  
>> *guc_fw, struct i915_vma *vma)
>>   	 * by the DMA engine in one operation, whereas the RSA signature is
>>   	 * loaded via MMIO.
>>   	 */
>> -	ret = guc_xfer_rsa(guc, vma);
>> -	if (ret)
>> -		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
>> +	guc_xfer_rsa(guc, vma);
>>     	ret = guc_xfer_ucode(guc, vma);
>> -	if (ret)
>> -		DRM_WARN("GuC firmware code xfer error %d\n", ret);
>> -
>> -	ret = guc_wait_ucode(guc);
>> -	if (ret)
>> -		DRM_ERROR("GuC firmware xfer error %d\n", ret);
>
> With the logs gone we don't have any more error message to understand  
> where exactly we hit issue (xfer vs wait_ucode). We do have debug logs  
> printing the status that would give the info, but might be worth  
> retaining at least 1 error-level log.

+1 for leaving error-level logs (here or at helper functions)
also, please let CI re-run this patch with GuC enabled ;)

Michal

>
> Daniele
>
>> +	if (ret == 0)
>> +		ret = guc_wait_ucode(guc);
>>     	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 23:22   ` Michal Wajdeczko
@ 2018-10-18  9:13     ` Chris Wilson
  2018-10-18 18:18       ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2018-10-18  9:13 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Michal Wajdeczko, intel-gfx

Quoting Michal Wajdeczko (2018-10-18 00:22:43)
> On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio  
> <daniele.ceraolospurio@intel.com> wrote:
> 
> >
> >
> > On 17/10/18 13:29, Chris Wilson wrote:
> >> Propagate the timeout on transferring the fw back to the caller where it
> >> may act upon it, usually by restarting the xfer before failing.
> >>
> >
> > Did you see any case where we failed the xfer and didn't get a timeout  
> > out of guc_wait_ucode? that'd be quite weird

Yes the logs show the xfer error but not the wait error. So we ended up
returning 0.

> > Anyway, definitely better and cleaner than before:
> > Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >
> > small nitpick below
> >
> >> Testcase: igt/drv_selftest/live_hangcheck
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> >> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
> >>   1 file changed, 6 insertions(+), 17 deletions(-)
> >>  diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c  
> >> b/drivers/gpu/drm/i915/intel_guc_fw.c
> >> index a9e6fcce467c..b68a05892dab 100644
> >> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
> >> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> >> @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc  
> >> *guc)
> >>   }
> >>     /* Copy RSA signature from the fw image to HW for verification */
> >> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> >> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> >>   {
> >>      struct drm_i915_private *dev_priv = guc_to_i915(guc);
> >> -    struct intel_uc_fw *guc_fw = &guc->fw;
> >> -    struct sg_table *sg = vma->pages;
> >>      u32 rsa[UOS_RSA_SCRATCH_COUNT];
> >>      int i;
> >>   -  if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
> >> -                           guc_fw->rsa_offset) != sizeof(rsa))
> >> -            return -EINVAL;
> >> +    sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
> >> +                       rsa, sizeof(rsa), guc->fw.rsa_offset);
> >>      for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
> >>              I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
> >> -
> >> -    return 0;
> >>   }
> >>     /*
> >> @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw  
> >> *guc_fw, struct i915_vma *vma)
> >>       * by the DMA engine in one operation, whereas the RSA signature is
> >>       * loaded via MMIO.
> >>       */
> >> -    ret = guc_xfer_rsa(guc, vma);
> >> -    if (ret)
> >> -            DRM_WARN("GuC firmware signature xfer error %d\n", ret);
> >> +    guc_xfer_rsa(guc, vma);
> >>      ret = guc_xfer_ucode(guc, vma);
> >> -    if (ret)
> >> -            DRM_WARN("GuC firmware code xfer error %d\n", ret);
> >> -
> >> -    ret = guc_wait_ucode(guc);
> >> -    if (ret)
> >> -            DRM_ERROR("GuC firmware xfer error %d\n", ret);
> >
> > With the logs gone we don't have any more error message to understand  
> > where exactly we hit issue (xfer vs wait_ucode). We do have debug logs  
> > printing the status that would give the info, but might be worth  
> > retaining at least 1 error-level log.
> 
> +1 for leaving error-level logs (here or at helper functions)
> also, please let CI re-run this patch with GuC enabled ;)

I disagree since the errors are noise as the error is propagated and
handled by the caller -- so not strictly errors, but maybe
DRM_DEBUG_DRIVER().
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-18  9:13     ` Chris Wilson
@ 2018-10-18 18:18       ` Daniele Ceraolo Spurio
  2018-10-18 18:27         ` Michal Wajdeczko
  0 siblings, 1 reply; 17+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-18 18:18 UTC (permalink / raw)
  To: Chris Wilson, Michal Wajdeczko, intel-gfx



On 18/10/18 02:13, Chris Wilson wrote:
> Quoting Michal Wajdeczko (2018-10-18 00:22:43)
>> On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio
>> <daniele.ceraolospurio@intel.com> wrote:
>>
>>>
>>>
>>> On 17/10/18 13:29, Chris Wilson wrote:
>>>> Propagate the timeout on transferring the fw back to the caller where it
>>>> may act upon it, usually by restarting the xfer before failing.
>>>>
>>>
>>> Did you see any case where we failed the xfer and didn't get a timeout
>>> out of guc_wait_ucode? that'd be quite weird
> 
> Yes the logs show the xfer error but not the wait error. So we ended up
> returning 0.
> 

The guc status register is correctly cleared by HW on guc reset (just 
checked that), so if the wait_ucode succeeded in matching a "ready" 
value in there it means that the guc FW had loaded correctly. Maybe it 
managed to complete the xfer just after the timeout elapsed or the DMA 
got confused and reported the wrong status. Still weird, but your 
changes should help make the error more visible.

>>> Anyway, definitely better and cleaner than before:
>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>
>>> small nitpick below
>>>
>>>> Testcase: igt/drv_selftest/live_hangcheck
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
>>>>    1 file changed, 6 insertions(+), 17 deletions(-)
>>>>   diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c
>>>> b/drivers/gpu/drm/i915/intel_guc_fw.c
>>>> index a9e6fcce467c..b68a05892dab 100644
>>>> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
>>>> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
>>>> @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc
>>>> *guc)
>>>>    }
>>>>      /* Copy RSA signature from the fw image to HW for verification */
>>>> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>>> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>>>    {
>>>>       struct drm_i915_private *dev_priv = guc_to_i915(guc);
>>>> -    struct intel_uc_fw *guc_fw = &guc->fw;
>>>> -    struct sg_table *sg = vma->pages;
>>>>       u32 rsa[UOS_RSA_SCRATCH_COUNT];
>>>>       int i;
>>>>    -  if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
>>>> -                           guc_fw->rsa_offset) != sizeof(rsa))
>>>> -            return -EINVAL;
>>>> +    sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
>>>> +                       rsa, sizeof(rsa), guc->fw.rsa_offset);
>>>>       for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>>>>               I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
>>>> -
>>>> -    return 0;
>>>>    }
>>>>      /*
>>>> @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw
>>>> *guc_fw, struct i915_vma *vma)
>>>>        * by the DMA engine in one operation, whereas the RSA signature is
>>>>        * loaded via MMIO.
>>>>        */
>>>> -    ret = guc_xfer_rsa(guc, vma);
>>>> -    if (ret)
>>>> -            DRM_WARN("GuC firmware signature xfer error %d\n", ret);
>>>> +    guc_xfer_rsa(guc, vma);
>>>>       ret = guc_xfer_ucode(guc, vma);
>>>> -    if (ret)
>>>> -            DRM_WARN("GuC firmware code xfer error %d\n", ret);
>>>> -
>>>> -    ret = guc_wait_ucode(guc);
>>>> -    if (ret)
>>>> -            DRM_ERROR("GuC firmware xfer error %d\n", ret);
>>>
>>> With the logs gone we don't have any more error message to understand
>>> where exactly we hit issue (xfer vs wait_ucode). We do have debug logs
>>> printing the status that would give the info, but might be worth
>>> retaining at least 1 error-level log.
>>
>> +1 for leaving error-level logs (here or at helper functions)
>> also, please let CI re-run this patch with GuC enabled ;)
> 
> I disagree since the errors are noise as the error is propagated and
> handled by the caller -- so not strictly errors, but maybe
> DRM_DEBUG_DRIVER().
> -Chris
> 

True, we don't want an error message if the caller decides that a 
failure can be expected and retries. we already have DRM_DEBUG_DRIVER 
printing status inside the GuC wait and GuC xfer functions, those should 
be enough.

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-18 18:18       ` Daniele Ceraolo Spurio
@ 2018-10-18 18:27         ` Michal Wajdeczko
  2018-10-18 19:42           ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Michal Wajdeczko @ 2018-10-18 18:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Daniele Ceraolo Spurio

On Thu, 18 Oct 2018 20:18:53 +0200, Daniele Ceraolo Spurio  
<daniele.ceraolospurio@intel.com> wrote:

>
>
> On 18/10/18 02:13, Chris Wilson wrote:
>> Quoting Michal Wajdeczko (2018-10-18 00:22:43)
>>> On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio
>>> <daniele.ceraolospurio@intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 17/10/18 13:29, Chris Wilson wrote:
>>>>> Propagate the timeout on transferring the fw back to the caller  
>>>>> where it
>>>>> may act upon it, usually by restarting the xfer before failing.
>>>>>
>>>>
>>>> Did you see any case where we failed the xfer and didn't get a timeout
>>>> out of guc_wait_ucode? that'd be quite weird
>>  Yes the logs show the xfer error but not the wait error. So we ended up
>> returning 0.
>>
>
> The guc status register is correctly cleared by HW on guc reset (just  
> checked that), so if the wait_ucode succeeded in matching a "ready"  
> value in there it means that the guc FW had loaded correctly. Maybe it  
> managed to complete the xfer just after the timeout elapsed or the DMA  
> got confused and reported the wrong status. Still weird, but your  
> changes should help make the error more visible.

maybe first timeout in DMA transfer was mitigated by additional wait in
guc_wait_ucode() - so maybe this error propagation is not that important ?

>
>>>> Anyway, definitely better and cleaner than before:
>>>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>
>>>> small nitpick below
>>>>
>>>>> Testcase: igt/drv_selftest/live_hangcheck
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/intel_guc_fw.c | 23 ++++++-----------------
>>>>>    1 file changed, 6 insertions(+), 17 deletions(-)
>>>>>   diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c
>>>>> b/drivers/gpu/drm/i915/intel_guc_fw.c
>>>>> index a9e6fcce467c..b68a05892dab 100644
>>>>> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
>>>>> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
>>>>> @@ -125,22 +125,17 @@ static void guc_prepare_xfer(struct intel_guc
>>>>> *guc)
>>>>>    }
>>>>>      /* Copy RSA signature from the fw image to HW for verification  
>>>>> */
>>>>> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>>>> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma  
>>>>> *vma)
>>>>>    {
>>>>>       struct drm_i915_private *dev_priv = guc_to_i915(guc);
>>>>> -    struct intel_uc_fw *guc_fw = &guc->fw;
>>>>> -    struct sg_table *sg = vma->pages;
>>>>>       u32 rsa[UOS_RSA_SCRATCH_COUNT];
>>>>>       int i;
>>>>>    -  if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
>>>>> -                           guc_fw->rsa_offset) != sizeof(rsa))
>>>>> -            return -EINVAL;
>>>>> +    sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
>>>>> +                       rsa, sizeof(rsa), guc->fw.rsa_offset);
>>>>>       for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>>>>>               I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
>>>>> -
>>>>> -    return 0;
>>>>>    }
>>>>>      /*
>>>>> @@ -251,17 +246,11 @@ static int guc_fw_xfer(struct intel_uc_fw
>>>>> *guc_fw, struct i915_vma *vma)
>>>>>        * by the DMA engine in one operation, whereas the RSA  
>>>>> signature is
>>>>>        * loaded via MMIO.
>>>>>        */
>>>>> -    ret = guc_xfer_rsa(guc, vma);
>>>>> -    if (ret)
>>>>> -            DRM_WARN("GuC firmware signature xfer error %d\n", ret);
>>>>> +    guc_xfer_rsa(guc, vma);
>>>>>       ret = guc_xfer_ucode(guc, vma);
>>>>> -    if (ret)
>>>>> -            DRM_WARN("GuC firmware code xfer error %d\n", ret);
>>>>> -
>>>>> -    ret = guc_wait_ucode(guc);
>>>>> -    if (ret)
>>>>> -            DRM_ERROR("GuC firmware xfer error %d\n", ret);
>>>>
>>>> With the logs gone we don't have any more error message to understand
>>>> where exactly we hit issue (xfer vs wait_ucode). We do have debug logs
>>>> printing the status that would give the info, but might be worth
>>>> retaining at least 1 error-level log.
>>>
>>> +1 for leaving error-level logs (here or at helper functions)
>>> also, please let CI re-run this patch with GuC enabled ;)
>>  I disagree since the errors are noise as the error is propagated and
>> handled by the caller -- so not strictly errors, but maybe
>> DRM_DEBUG_DRIVER().
>> -Chris
>>
>
> True, we don't want an error message if the caller decides that a  
> failure can be expected and retries. we already have DRM_DEBUG_DRIVER  
> printing status inside the GuC wait and GuC xfer functions, those should  
> be enough.
>
> Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-18 18:27         ` Michal Wajdeczko
@ 2018-10-18 19:42           ` Chris Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2018-10-18 19:42 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Michal Wajdeczko, intel-gfx

Quoting Michal Wajdeczko (2018-10-18 19:27:20)
> On Thu, 18 Oct 2018 20:18:53 +0200, Daniele Ceraolo Spurio  
> <daniele.ceraolospurio@intel.com> wrote:
> 
> >
> >
> > On 18/10/18 02:13, Chris Wilson wrote:
> >> Quoting Michal Wajdeczko (2018-10-18 00:22:43)
> >>> On Thu, 18 Oct 2018 01:09:19 +0200, Daniele Ceraolo Spurio
> >>> <daniele.ceraolospurio@intel.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 17/10/18 13:29, Chris Wilson wrote:
> >>>>> Propagate the timeout on transferring the fw back to the caller  
> >>>>> where it
> >>>>> may act upon it, usually by restarting the xfer before failing.
> >>>>>
> >>>>
> >>>> Did you see any case where we failed the xfer and didn't get a timeout
> >>>> out of guc_wait_ucode? that'd be quite weird
> >>  Yes the logs show the xfer error but not the wait error. So we ended up
> >> returning 0.
> >>
> >
> > The guc status register is correctly cleared by HW on guc reset (just  
> > checked that), so if the wait_ucode succeeded in matching a "ready"  
> > value in there it means that the guc FW had loaded correctly. Maybe it  
> > managed to complete the xfer just after the timeout elapsed or the DMA  
> > got confused and reported the wrong status. Still weird, but your  
> > changes should help make the error more visible.
> 
> maybe first timeout in DMA transfer was mitigated by additional wait in
> guc_wait_ucode() - so maybe this error propagation is not that important ?

Yes, it looks reasonable that the guc is unlikely to signal it is
ready before the dma xfer is complete. How about only waiting for the
guc signal and asserting that the xfer is complete afterwards? (Just to
sanity check the hw state on completion.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
                   ` (2 preceding siblings ...)
  2018-10-17 23:09 ` [PATCH] " Daniele Ceraolo Spurio
@ 2018-10-18 19:54 ` Chris Wilson
  2018-10-18 19:55 ` [PATCH v3] " Chris Wilson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Chris Wilson @ 2018-10-18 19:54 UTC (permalink / raw)
  To: intel-gfx

Propagate the timeout on transferring the fw back to the caller where it
may act upon it, usually by restarting the xfer before failing.

v2: Simplify the wait to only wait upon the guc signaling completion,
with an assertion that the fw xfer must have completed for it to be
ready!

Testcase: igt/drv_selftest/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_fw.c | 106 +++++++++++++---------------
 1 file changed, 50 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
index a9e6fcce467c..21a5dcf78322 100644
--- a/drivers/gpu/drm/i915/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/intel_guc_fw.c
@@ -125,66 +125,26 @@ static void guc_prepare_xfer(struct intel_guc *guc)
 }
 
 /* Copy RSA signature from the fw image to HW for verification */
-static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
+static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct intel_uc_fw *guc_fw = &guc->fw;
-	struct sg_table *sg = vma->pages;
 	u32 rsa[UOS_RSA_SCRATCH_COUNT];
 	int i;
 
-	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
-			       guc_fw->rsa_offset) != sizeof(rsa))
-		return -EINVAL;
+	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
+			   rsa, sizeof(rsa), guc->fw.rsa_offset);
 
 	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
 		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
-
-	return 0;
 }
 
-/*
- * Transfer the firmware image to RAM for execution by the microcontroller.
- *
- * Architecturally, the DMA engine is bidirectional, and can potentially even
- * transfer between GTT locations. This functionality is left out of the API
- * for now as there is no need for it.
- */
-static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
+static bool guc_xfer_completed(struct intel_guc *guc, u32 *status)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct intel_uc_fw *guc_fw = &guc->fw;
-	unsigned long offset;
-	u32 status;
-	int ret;
-
-	/*
-	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
-	 * other components
-	 */
-	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
-
-	/* Set the source address for the new blob */
-	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
-	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
-	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
 
-	/*
-	 * Set the DMA destination. Current uCode expects the code to be
-	 * loaded at 8k; locations below this are used for the stack.
-	 */
-	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
-	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
-
-	/* Finally start the DMA */
-	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
-
-	/* Wait for DMA to finish */
-	ret = __intel_wait_for_register_fw(dev_priv, DMA_CTRL, START_DMA, 0,
-					   2, 100, &status);
-	DRM_DEBUG_DRIVER("GuC DMA status %#x\n", status);
-
-	return ret;
+	/* Did we complete the xfer? */
+	*status = I915_READ(DMA_CTRL);
+	return !!(*status & START_DMA);
 }
 
 /*
@@ -228,9 +188,51 @@ static int guc_wait_ucode(struct intel_guc *guc)
 		ret = -ENOEXEC;
 	}
 
+	if (ret == 0 && !guc_xfer_completed(guc, &status)) {
+		DRM_ERROR("GuC is ready, but the xfer %08x is incomplete\n",
+			  status);
+		ret = -ENXIO;
+	}
+
 	return ret;
 }
 
+/*
+ * Transfer the firmware image to RAM for execution by the microcontroller.
+ *
+ * Architecturally, the DMA engine is bidirectional, and can potentially even
+ * transfer between GTT locations. This functionality is left out of the API
+ * for now as there is no need for it.
+ */
+static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	struct intel_uc_fw *guc_fw = &guc->fw;
+	unsigned long offset;
+
+	/*
+	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
+	 * other components
+	 */
+	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
+
+	/* Set the source address for the new blob */
+	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
+	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
+	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
+
+	/*
+	 * Set the DMA destination. Current uCode expects the code to be
+	 * loaded at 8k; locations below this are used for the stack.
+	 */
+	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
+	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
+
+	/* Finally start the DMA */
+	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
+
+	return guc_wait_ucode(guc);
+}
 /*
  * Load the GuC firmware blob into the MinuteIA.
  */
@@ -251,17 +253,9 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)
 	 * by the DMA engine in one operation, whereas the RSA signature is
 	 * loaded via MMIO.
 	 */
-	ret = guc_xfer_rsa(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
+	guc_xfer_rsa(guc, vma);
 
 	ret = guc_xfer_ucode(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware code xfer error %d\n", ret);
-
-	ret = guc_wait_ucode(guc);
-	if (ret)
-		DRM_ERROR("GuC firmware xfer error %d\n", ret);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
                   ` (3 preceding siblings ...)
  2018-10-18 19:54 ` [PATCH v2] " Chris Wilson
@ 2018-10-18 19:55 ` Chris Wilson
  2018-10-18 20:48   ` Daniele Ceraolo Spurio
  2018-10-18 20:16 ` ✗ Fi.CI.BAT: failure for drm/i915/guc: Propagate the fw xfer timeout (rev2) Patchwork
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2018-10-18 19:55 UTC (permalink / raw)
  To: intel-gfx

Propagate the timeout on transferring the fw back to the caller where it
may act upon it, usually by restarting the xfer before failing.

v2: Simplify the wait to only wait upon the guc signaling completion,
with an assertion that the fw xfer must have completed for it to be
ready!

Testcase: igt/drv_selftest/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_fw.c | 106 +++++++++++++---------------
 1 file changed, 50 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
index a9e6fcce467c..51b292001edd 100644
--- a/drivers/gpu/drm/i915/intel_guc_fw.c
+++ b/drivers/gpu/drm/i915/intel_guc_fw.c
@@ -125,66 +125,26 @@ static void guc_prepare_xfer(struct intel_guc *guc)
 }
 
 /* Copy RSA signature from the fw image to HW for verification */
-static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
+static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct intel_uc_fw *guc_fw = &guc->fw;
-	struct sg_table *sg = vma->pages;
 	u32 rsa[UOS_RSA_SCRATCH_COUNT];
 	int i;
 
-	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
-			       guc_fw->rsa_offset) != sizeof(rsa))
-		return -EINVAL;
+	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
+			   rsa, sizeof(rsa), guc->fw.rsa_offset);
 
 	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
 		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
-
-	return 0;
 }
 
-/*
- * Transfer the firmware image to RAM for execution by the microcontroller.
- *
- * Architecturally, the DMA engine is bidirectional, and can potentially even
- * transfer between GTT locations. This functionality is left out of the API
- * for now as there is no need for it.
- */
-static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
+static bool guc_xfer_completed(struct intel_guc *guc, u32 *status)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct intel_uc_fw *guc_fw = &guc->fw;
-	unsigned long offset;
-	u32 status;
-	int ret;
-
-	/*
-	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
-	 * other components
-	 */
-	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
-
-	/* Set the source address for the new blob */
-	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
-	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
-	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
 
-	/*
-	 * Set the DMA destination. Current uCode expects the code to be
-	 * loaded at 8k; locations below this are used for the stack.
-	 */
-	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
-	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
-
-	/* Finally start the DMA */
-	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
-
-	/* Wait for DMA to finish */
-	ret = __intel_wait_for_register_fw(dev_priv, DMA_CTRL, START_DMA, 0,
-					   2, 100, &status);
-	DRM_DEBUG_DRIVER("GuC DMA status %#x\n", status);
-
-	return ret;
+	/* Did we complete the xfer? */
+	*status = I915_READ(DMA_CTRL);
+	return !(*status & START_DMA);
 }
 
 /*
@@ -228,9 +188,51 @@ static int guc_wait_ucode(struct intel_guc *guc)
 		ret = -ENOEXEC;
 	}
 
+	if (ret == 0 && !guc_xfer_completed(guc, &status)) {
+		DRM_ERROR("GuC is ready, but the xfer %08x is incomplete\n",
+			  status);
+		ret = -ENXIO;
+	}
+
 	return ret;
 }
 
+/*
+ * Transfer the firmware image to RAM for execution by the microcontroller.
+ *
+ * Architecturally, the DMA engine is bidirectional, and can potentially even
+ * transfer between GTT locations. This functionality is left out of the API
+ * for now as there is no need for it.
+ */
+static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
+{
+	struct drm_i915_private *dev_priv = guc_to_i915(guc);
+	struct intel_uc_fw *guc_fw = &guc->fw;
+	unsigned long offset;
+
+	/*
+	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
+	 * other components
+	 */
+	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
+
+	/* Set the source address for the new blob */
+	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
+	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
+	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
+
+	/*
+	 * Set the DMA destination. Current uCode expects the code to be
+	 * loaded at 8k; locations below this are used for the stack.
+	 */
+	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
+	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
+
+	/* Finally start the DMA */
+	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
+
+	return guc_wait_ucode(guc);
+}
 /*
  * Load the GuC firmware blob into the MinuteIA.
  */
@@ -251,17 +253,9 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)
 	 * by the DMA engine in one operation, whereas the RSA signature is
 	 * loaded via MMIO.
 	 */
-	ret = guc_xfer_rsa(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
+	guc_xfer_rsa(guc, vma);
 
 	ret = guc_xfer_ucode(guc, vma);
-	if (ret)
-		DRM_WARN("GuC firmware code xfer error %d\n", ret);
-
-	ret = guc_wait_ucode(guc);
-	if (ret)
-		DRM_ERROR("GuC firmware xfer error %d\n", ret);
 
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 
-- 
2.19.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* ✗ Fi.CI.BAT: failure for drm/i915/guc: Propagate the fw xfer timeout (rev2)
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
                   ` (4 preceding siblings ...)
  2018-10-18 19:55 ` [PATCH v3] " Chris Wilson
@ 2018-10-18 20:16 ` Patchwork
  2018-10-18 20:37 ` ✓ Fi.CI.BAT: success for drm/i915/guc: Propagate the fw xfer timeout (rev3) Patchwork
  2018-10-18 23:41 ` ✗ Fi.CI.IGT: failure " Patchwork
  7 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2018-10-18 20:16 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Propagate the fw xfer timeout (rev2)
URL   : https://patchwork.freedesktop.org/series/51140/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_5010 -> Patchwork_10506 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_10506 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10506, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/51140/revisions/2/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10506:

  === IGT changes ===

    ==== Possible regressions ====

    igt@kms_frontbuffer_tracking@basic:
      fi-skl-6700hq:      PASS -> INCOMPLETE

    
== Known issues ==

  Here are the changes found in Patchwork_10506 that come from known issues:

  === IGT changes ===

    ==== Possible fixes ====

    igt@kms_flip@basic-plain-flip:
      fi-ilk-650:         DMESG-WARN (fdo#106387) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-icl-u2:          FAIL (fdo#103167) -> PASS

    igt@pm_rpm@module-reload:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106387 https://bugs.freedesktop.org/show_bug.cgi?id=106387


== Participating hosts (43 -> 36) ==

  Missing    (7): fi-ilk-m540 fi-skl-guc fi-byt-squawks fi-bsw-cyan fi-cfl-guc fi-apl-guc fi-kbl-guc 


== Build changes ==

    * Linux: CI_DRM_5010 -> Patchwork_10506

  CI_DRM_5010: 27a4f334d3ec8882d50227c26ae4e393d7d1f4a1 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4683: 7766b1e2348b32cc8ed58a972c6fd53b20279549 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10506: 8650135b072e1fa09bc942a7c93e249cb5d5f78a @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

8650135b072e drm/i915/guc: Propagate the fw xfer timeout

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10506/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* ✓ Fi.CI.BAT: success for drm/i915/guc: Propagate the fw xfer timeout (rev3)
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
                   ` (5 preceding siblings ...)
  2018-10-18 20:16 ` ✗ Fi.CI.BAT: failure for drm/i915/guc: Propagate the fw xfer timeout (rev2) Patchwork
@ 2018-10-18 20:37 ` Patchwork
  2018-10-18 23:41 ` ✗ Fi.CI.IGT: failure " Patchwork
  7 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2018-10-18 20:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Propagate the fw xfer timeout (rev3)
URL   : https://patchwork.freedesktop.org/series/51140/
State : success

== Summary ==

= CI Bug Log - changes from CI_DRM_5010 -> Patchwork_10507 =

== Summary - SUCCESS ==

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/51140/revisions/3/mbox/

== Known issues ==

  Here are the changes found in Patchwork_10507 that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@gem_exec_suspend@basic-s3:
      fi-blb-e6850:       PASS -> INCOMPLETE (fdo#107718)

    igt@kms_frontbuffer_tracking@basic:
      fi-hsw-peppy:       PASS -> DMESG-WARN (fdo#102614)

    
    ==== Possible fixes ====

    igt@kms_flip@basic-plain-flip:
      fi-ilk-650:         DMESG-WARN (fdo#106387) -> PASS

    igt@kms_frontbuffer_tracking@basic:
      fi-icl-u2:          FAIL (fdo#103167) -> PASS
      fi-byt-clapper:     FAIL (fdo#103167) -> PASS

    igt@pm_rpm@module-reload:
      fi-glk-j4005:       DMESG-WARN (fdo#106000) -> PASS

    
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#106000 https://bugs.freedesktop.org/show_bug.cgi?id=106000
  fdo#106387 https://bugs.freedesktop.org/show_bug.cgi?id=106387
  fdo#107718 https://bugs.freedesktop.org/show_bug.cgi?id=107718


== Participating hosts (43 -> 39) ==

  Missing    (4): fi-skl-guc fi-ilk-m540 fi-byt-squawks fi-bsw-cyan 


== Build changes ==

    * Linux: CI_DRM_5010 -> Patchwork_10507

  CI_DRM_5010: 27a4f334d3ec8882d50227c26ae4e393d7d1f4a1 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4683: 7766b1e2348b32cc8ed58a972c6fd53b20279549 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10507: 2d11641bd072d84484e3ad4958fb79b82104cc15 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

2d11641bd072 drm/i915/guc: Propagate the fw xfer timeout

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10507/issues.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-18 19:55 ` [PATCH v3] " Chris Wilson
@ 2018-10-18 20:48   ` Daniele Ceraolo Spurio
  2018-10-23  8:50     ` Chris Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-18 20:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 18/10/18 12:55, Chris Wilson wrote:
> Propagate the timeout on transferring the fw back to the caller where it
> may act upon it, usually by restarting the xfer before failing.
> 
> v2: Simplify the wait to only wait upon the guc signaling completion,
> with an assertion that the fw xfer must have completed for it to be
> ready!
> 
> Testcase: igt/drv_selftest/live_hangcheck
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_guc_fw.c | 106 +++++++++++++---------------
>   1 file changed, 50 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
> index a9e6fcce467c..51b292001edd 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> @@ -125,66 +125,26 @@ static void guc_prepare_xfer(struct intel_guc *guc)
>   }
>   
>   /* Copy RSA signature from the fw image to HW for verification */
> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>   {
>   	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct intel_uc_fw *guc_fw = &guc->fw;
> -	struct sg_table *sg = vma->pages;
>   	u32 rsa[UOS_RSA_SCRATCH_COUNT];
>   	int i;
>   
> -	if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
> -			       guc_fw->rsa_offset) != sizeof(rsa))
> -		return -EINVAL;
> +	sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
> +			   rsa, sizeof(rsa), guc->fw.rsa_offset);
>   
>   	for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>   		I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
> -
> -	return 0;
>   }
>   
> -/*
> - * Transfer the firmware image to RAM for execution by the microcontroller.
> - *
> - * Architecturally, the DMA engine is bidirectional, and can potentially even
> - * transfer between GTT locations. This functionality is left out of the API
> - * for now as there is no need for it.
> - */
> -static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
> +static bool guc_xfer_completed(struct intel_guc *guc, u32 *status)
>   {
>   	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct intel_uc_fw *guc_fw = &guc->fw;
> -	unsigned long offset;
> -	u32 status;
> -	int ret;
> -
> -	/*
> -	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
> -	 * other components
> -	 */
> -	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
> -
> -	/* Set the source address for the new blob */
> -	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
> -	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> -	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
>   
> -	/*
> -	 * Set the DMA destination. Current uCode expects the code to be
> -	 * loaded at 8k; locations below this are used for the stack.
> -	 */
> -	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> -	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> -
> -	/* Finally start the DMA */
> -	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> -
> -	/* Wait for DMA to finish */
> -	ret = __intel_wait_for_register_fw(dev_priv, DMA_CTRL, START_DMA, 0,
> -					   2, 100, &status);
> -	DRM_DEBUG_DRIVER("GuC DMA status %#x\n", status);
> -
> -	return ret;
> +	/* Did we complete the xfer? */
> +	*status = I915_READ(DMA_CTRL);
> +	return !(*status & START_DMA);
>   }
>   
>   /*
> @@ -228,9 +188,51 @@ static int guc_wait_ucode(struct intel_guc *guc)
>   		ret = -ENOEXEC;
>   	}
>   
> +	if (ret == 0 && !guc_xfer_completed(guc, &status)) {
> +		DRM_ERROR("GuC is ready, but the xfer %08x is incomplete\n",
> +			  status);
> +		ret = -ENXIO;

Do we want to allow reset & retry in this scenario? This feels like 
something weird happening in HW, so I'd say it'd be worth to allow a retry.

Thanks,
Daniele

> +	}
> +
>   	return ret;
>   }
>   
> +/*
> + * Transfer the firmware image to RAM for execution by the microcontroller.
> + *
> + * Architecturally, the DMA engine is bidirectional, and can potentially even
> + * transfer between GTT locations. This functionality is left out of the API
> + * for now as there is no need for it.
> + */
> +static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
> +{
> +	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> +	struct intel_uc_fw *guc_fw = &guc->fw;
> +	unsigned long offset;
> +
> +	/*
> +	 * The header plus uCode will be copied to WOPCM via DMA, excluding any
> +	 * other components
> +	 */
> +	I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
> +
> +	/* Set the source address for the new blob */
> +	offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
> +	I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> +	I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
> +
> +	/*
> +	 * Set the DMA destination. Current uCode expects the code to be
> +	 * loaded at 8k; locations below this are used for the stack.
> +	 */
> +	I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> +	I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> +
> +	/* Finally start the DMA */
> +	I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> +
> +	return guc_wait_ucode(guc);
> +}
>   /*
>    * Load the GuC firmware blob into the MinuteIA.
>    */
> @@ -251,17 +253,9 @@ static int guc_fw_xfer(struct intel_uc_fw *guc_fw, struct i915_vma *vma)
>   	 * by the DMA engine in one operation, whereas the RSA signature is
>   	 * loaded via MMIO.
>   	 */
> -	ret = guc_xfer_rsa(guc, vma);
> -	if (ret)
> -		DRM_WARN("GuC firmware signature xfer error %d\n", ret);
> +	guc_xfer_rsa(guc, vma);
>   
>   	ret = guc_xfer_ucode(guc, vma);
> -	if (ret)
> -		DRM_WARN("GuC firmware code xfer error %d\n", ret);
> -
> -	ret = guc_wait_ucode(guc);
> -	if (ret)
> -		DRM_ERROR("GuC firmware xfer error %d\n", ret);
>   
>   	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* ✗ Fi.CI.IGT: failure for drm/i915/guc: Propagate the fw xfer timeout (rev3)
  2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
                   ` (6 preceding siblings ...)
  2018-10-18 20:37 ` ✓ Fi.CI.BAT: success for drm/i915/guc: Propagate the fw xfer timeout (rev3) Patchwork
@ 2018-10-18 23:41 ` Patchwork
  7 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2018-10-18 23:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/guc: Propagate the fw xfer timeout (rev3)
URL   : https://patchwork.freedesktop.org/series/51140/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_5010_full -> Patchwork_10507_full =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_10507_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_10507_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_10507_full:

  === IGT changes ===

    ==== Possible regressions ====

    igt@syncobj_wait@wait-for-submit-complex:
      shard-skl:          NOTRUN -> INCOMPLETE

    
== Known issues ==

  Here are the changes found in Patchwork_10507_full that come from known issues:

  === IGT changes ===

    ==== Issues hit ====

    igt@kms_atomic_transition@1x-modeset-transitions:
      shard-skl:          NOTRUN -> FAIL (fdo#108470)

    igt@kms_atomic_transition@1x-modeset-transitions-nonblocking-fencing:
      shard-apl:          PASS -> DMESG-WARN (fdo#103558, fdo#105602) +3

    igt@kms_atomic_transition@plane-all-modeset-transition:
      shard-hsw:          PASS -> DMESG-WARN (fdo#102614)

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
      shard-snb:          NOTRUN -> DMESG-WARN (fdo#107956) +1

    igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
      shard-skl:          NOTRUN -> DMESG-WARN (fdo#107956)

    igt@kms_color@pipe-a-legacy-gamma:
      shard-apl:          PASS -> FAIL (fdo#108145, fdo#104782)

    igt@kms_cursor_crc@cursor-256x256-suspend:
      shard-skl:          NOTRUN -> FAIL (fdo#103191, fdo#103232)

    igt@kms_cursor_legacy@cursora-vs-flipa-toggle:
      shard-glk:          PASS -> DMESG-WARN (fdo#106538, fdo#105763)

    igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-ytiled:
      shard-skl:          NOTRUN -> FAIL (fdo#103184)

    igt@kms_draw_crc@draw-method-xrgb8888-pwrite-untiled:
      shard-skl:          NOTRUN -> FAIL (fdo#108472)

    igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-pwrite:
      shard-skl:          NOTRUN -> FAIL (fdo#103167) +2

    igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-fullscreen:
      shard-skl:          NOTRUN -> FAIL (fdo#105682)

    igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
      shard-skl:          NOTRUN -> FAIL (fdo#107815, fdo#108145)

    igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
      shard-apl:          NOTRUN -> FAIL (fdo#108145)

    igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
      shard-glk:          PASS -> FAIL (fdo#103166) +1
      shard-apl:          PASS -> FAIL (fdo#103166) +1

    igt@kms_setmode@basic:
      shard-apl:          NOTRUN -> FAIL (fdo#99912)
      shard-snb:          NOTRUN -> FAIL (fdo#99912)

    igt@pm_rpm@debugfs-read:
      shard-skl:          PASS -> INCOMPLETE (fdo#107807) +1

    igt@syncobj_wait@wait-all-for-submit-complex:
      shard-glk:          NOTRUN -> INCOMPLETE (k.org#198133, fdo#103359)
      shard-snb:          NOTRUN -> INCOMPLETE (fdo#105411)

    
    ==== Possible fixes ====

    igt@drv_suspend@fence-restore-untiled:
      shard-kbl:          INCOMPLETE (fdo#103665) -> PASS

    igt@gem_pwrite_pread@uncached-pwrite-blt-gtt_mmap-performance:
      shard-apl:          INCOMPLETE (fdo#103927) -> PASS

    igt@kms_ccs@pipe-b-crc-sprite-planes-basic:
      shard-glk:          FAIL (fdo#108145) -> PASS

    igt@kms_cursor_crc@cursor-128x128-suspend:
      shard-glk:          INCOMPLETE (k.org#198133, fdo#103359) -> PASS

    igt@kms_flip@flip-vs-modeset-interruptible:
      shard-kbl:          DMESG-WARN (fdo#103558, fdo#105602) -> PASS +4

    igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-blt:
      shard-glk:          FAIL (fdo#103167) -> PASS +3

    igt@kms_setmode@basic:
      shard-hsw:          FAIL (fdo#99912) -> PASS

    
  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#103166 https://bugs.freedesktop.org/show_bug.cgi?id=103166
  fdo#103167 https://bugs.freedesktop.org/show_bug.cgi?id=103167
  fdo#103184 https://bugs.freedesktop.org/show_bug.cgi?id=103184
  fdo#103191 https://bugs.freedesktop.org/show_bug.cgi?id=103191
  fdo#103232 https://bugs.freedesktop.org/show_bug.cgi?id=103232
  fdo#103359 https://bugs.freedesktop.org/show_bug.cgi?id=103359
  fdo#103558 https://bugs.freedesktop.org/show_bug.cgi?id=103558
  fdo#103665 https://bugs.freedesktop.org/show_bug.cgi?id=103665
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104782 https://bugs.freedesktop.org/show_bug.cgi?id=104782
  fdo#105411 https://bugs.freedesktop.org/show_bug.cgi?id=105411
  fdo#105602 https://bugs.freedesktop.org/show_bug.cgi?id=105602
  fdo#105682 https://bugs.freedesktop.org/show_bug.cgi?id=105682
  fdo#105763 https://bugs.freedesktop.org/show_bug.cgi?id=105763
  fdo#106538 https://bugs.freedesktop.org/show_bug.cgi?id=106538
  fdo#107807 https://bugs.freedesktop.org/show_bug.cgi?id=107807
  fdo#107815 https://bugs.freedesktop.org/show_bug.cgi?id=107815
  fdo#107956 https://bugs.freedesktop.org/show_bug.cgi?id=107956
  fdo#108145 https://bugs.freedesktop.org/show_bug.cgi?id=108145
  fdo#108470 https://bugs.freedesktop.org/show_bug.cgi?id=108470
  fdo#108472 https://bugs.freedesktop.org/show_bug.cgi?id=108472
  fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
  k.org#198133 https://bugzilla.kernel.org/show_bug.cgi?id=198133


== Participating hosts (6 -> 6) ==

  No changes in participating hosts


== Build changes ==

    * Linux: CI_DRM_5010 -> Patchwork_10507

  CI_DRM_5010: 27a4f334d3ec8882d50227c26ae4e393d7d1f4a1 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4683: 7766b1e2348b32cc8ed58a972c6fd53b20279549 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_10507: 2d11641bd072d84484e3ad4958fb79b82104cc15 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_10507/shards.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-18 20:48   ` Daniele Ceraolo Spurio
@ 2018-10-23  8:50     ` Chris Wilson
  2018-10-23 16:04       ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wilson @ 2018-10-23  8:50 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx

Quoting Daniele Ceraolo Spurio (2018-10-18 21:48:30)
> 
> 
> On 18/10/18 12:55, Chris Wilson wrote:
> > Propagate the timeout on transferring the fw back to the caller where it
> > may act upon it, usually by restarting the xfer before failing.
> > 
> > v2: Simplify the wait to only wait upon the guc signaling completion,
> > with an assertion that the fw xfer must have completed for it to be
> > ready!
> > 
> > Testcase: igt/drv_selftest/live_hangcheck
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_guc_fw.c | 106 +++++++++++++---------------
> >   1 file changed, 50 insertions(+), 56 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
> > index a9e6fcce467c..51b292001edd 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_fw.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
> > @@ -125,66 +125,26 @@ static void guc_prepare_xfer(struct intel_guc *guc)
> >   }
> >   
> >   /* Copy RSA signature from the fw image to HW for verification */
> > -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> > +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
> >   {
> >       struct drm_i915_private *dev_priv = guc_to_i915(guc);
> > -     struct intel_uc_fw *guc_fw = &guc->fw;
> > -     struct sg_table *sg = vma->pages;
> >       u32 rsa[UOS_RSA_SCRATCH_COUNT];
> >       int i;
> >   
> > -     if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
> > -                            guc_fw->rsa_offset) != sizeof(rsa))
> > -             return -EINVAL;
> > +     sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
> > +                        rsa, sizeof(rsa), guc->fw.rsa_offset);
> >   
> >       for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
> >               I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
> > -
> > -     return 0;
> >   }
> >   
> > -/*
> > - * Transfer the firmware image to RAM for execution by the microcontroller.
> > - *
> > - * Architecturally, the DMA engine is bidirectional, and can potentially even
> > - * transfer between GTT locations. This functionality is left out of the API
> > - * for now as there is no need for it.
> > - */
> > -static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
> > +static bool guc_xfer_completed(struct intel_guc *guc, u32 *status)
> >   {
> >       struct drm_i915_private *dev_priv = guc_to_i915(guc);
> > -     struct intel_uc_fw *guc_fw = &guc->fw;
> > -     unsigned long offset;
> > -     u32 status;
> > -     int ret;
> > -
> > -     /*
> > -      * The header plus uCode will be copied to WOPCM via DMA, excluding any
> > -      * other components
> > -      */
> > -     I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
> > -
> > -     /* Set the source address for the new blob */
> > -     offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
> > -     I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
> > -     I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
> >   
> > -     /*
> > -      * Set the DMA destination. Current uCode expects the code to be
> > -      * loaded at 8k; locations below this are used for the stack.
> > -      */
> > -     I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
> > -     I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
> > -
> > -     /* Finally start the DMA */
> > -     I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
> > -
> > -     /* Wait for DMA to finish */
> > -     ret = __intel_wait_for_register_fw(dev_priv, DMA_CTRL, START_DMA, 0,
> > -                                        2, 100, &status);
> > -     DRM_DEBUG_DRIVER("GuC DMA status %#x\n", status);
> > -
> > -     return ret;
> > +     /* Did we complete the xfer? */
> > +     *status = I915_READ(DMA_CTRL);
> > +     return !(*status & START_DMA);
> >   }
> >   
> >   /*
> > @@ -228,9 +188,51 @@ static int guc_wait_ucode(struct intel_guc *guc)
> >               ret = -ENOEXEC;
> >       }
> >   
> > +     if (ret == 0 && !guc_xfer_completed(guc, &status)) {
> > +             DRM_ERROR("GuC is ready, but the xfer %08x is incomplete\n",
> > +                       status);
> > +             ret = -ENXIO;
> 
> Do we want to allow reset & retry in this scenario? This feels like 
> something weird happening in HW, so I'd say it'd be worth to allow a retry.

It strikes me as being very weird. Report and bail to see if it ever
occurs before trying to decide on how exactly to reset and retry.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3] drm/i915/guc: Propagate the fw xfer timeout
  2018-10-23  8:50     ` Chris Wilson
@ 2018-10-23 16:04       ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 17+ messages in thread
From: Daniele Ceraolo Spurio @ 2018-10-23 16:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 23/10/18 01:50, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-10-18 21:48:30)
>>
>>
>> On 18/10/18 12:55, Chris Wilson wrote:
>>> Propagate the timeout on transferring the fw back to the caller where it
>>> may act upon it, usually by restarting the xfer before failing.
>>>
>>> v2: Simplify the wait to only wait upon the guc signaling completion,
>>> with an assertion that the fw xfer must have completed for it to be
>>> ready!
>>>
>>> Testcase: igt/drv_selftest/live_hangcheck
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_guc_fw.c | 106 +++++++++++++---------------
>>>    1 file changed, 50 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_guc_fw.c b/drivers/gpu/drm/i915/intel_guc_fw.c
>>> index a9e6fcce467c..51b292001edd 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc_fw.c
>>> +++ b/drivers/gpu/drm/i915/intel_guc_fw.c
>>> @@ -125,66 +125,26 @@ static void guc_prepare_xfer(struct intel_guc *guc)
>>>    }
>>>    
>>>    /* Copy RSA signature from the fw image to HW for verification */
>>> -static int guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>> +static void guc_xfer_rsa(struct intel_guc *guc, struct i915_vma *vma)
>>>    {
>>>        struct drm_i915_private *dev_priv = guc_to_i915(guc);
>>> -     struct intel_uc_fw *guc_fw = &guc->fw;
>>> -     struct sg_table *sg = vma->pages;
>>>        u32 rsa[UOS_RSA_SCRATCH_COUNT];
>>>        int i;
>>>    
>>> -     if (sg_pcopy_to_buffer(sg->sgl, sg->nents, rsa, sizeof(rsa),
>>> -                            guc_fw->rsa_offset) != sizeof(rsa))
>>> -             return -EINVAL;
>>> +     sg_pcopy_to_buffer(vma->pages->sgl, vma->pages->nents,
>>> +                        rsa, sizeof(rsa), guc->fw.rsa_offset);
>>>    
>>>        for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++)
>>>                I915_WRITE(UOS_RSA_SCRATCH(i), rsa[i]);
>>> -
>>> -     return 0;
>>>    }
>>>    
>>> -/*
>>> - * Transfer the firmware image to RAM for execution by the microcontroller.
>>> - *
>>> - * Architecturally, the DMA engine is bidirectional, and can potentially even
>>> - * transfer between GTT locations. This functionality is left out of the API
>>> - * for now as there is no need for it.
>>> - */
>>> -static int guc_xfer_ucode(struct intel_guc *guc, struct i915_vma *vma)
>>> +static bool guc_xfer_completed(struct intel_guc *guc, u32 *status)
>>>    {
>>>        struct drm_i915_private *dev_priv = guc_to_i915(guc);
>>> -     struct intel_uc_fw *guc_fw = &guc->fw;
>>> -     unsigned long offset;
>>> -     u32 status;
>>> -     int ret;
>>> -
>>> -     /*
>>> -      * The header plus uCode will be copied to WOPCM via DMA, excluding any
>>> -      * other components
>>> -      */
>>> -     I915_WRITE(DMA_COPY_SIZE, guc_fw->header_size + guc_fw->ucode_size);
>>> -
>>> -     /* Set the source address for the new blob */
>>> -     offset = intel_guc_ggtt_offset(guc, vma) + guc_fw->header_offset;
>>> -     I915_WRITE(DMA_ADDR_0_LOW, lower_32_bits(offset));
>>> -     I915_WRITE(DMA_ADDR_0_HIGH, upper_32_bits(offset) & 0xFFFF);
>>>    
>>> -     /*
>>> -      * Set the DMA destination. Current uCode expects the code to be
>>> -      * loaded at 8k; locations below this are used for the stack.
>>> -      */
>>> -     I915_WRITE(DMA_ADDR_1_LOW, 0x2000);
>>> -     I915_WRITE(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
>>> -
>>> -     /* Finally start the DMA */
>>> -     I915_WRITE(DMA_CTRL, _MASKED_BIT_ENABLE(UOS_MOVE | START_DMA));
>>> -
>>> -     /* Wait for DMA to finish */
>>> -     ret = __intel_wait_for_register_fw(dev_priv, DMA_CTRL, START_DMA, 0,
>>> -                                        2, 100, &status);
>>> -     DRM_DEBUG_DRIVER("GuC DMA status %#x\n", status);
>>> -
>>> -     return ret;
>>> +     /* Did we complete the xfer? */
>>> +     *status = I915_READ(DMA_CTRL);
>>> +     return !(*status & START_DMA);
>>>    }
>>>    
>>>    /*
>>> @@ -228,9 +188,51 @@ static int guc_wait_ucode(struct intel_guc *guc)
>>>                ret = -ENOEXEC;
>>>        }
>>>    
>>> +     if (ret == 0 && !guc_xfer_completed(guc, &status)) {
>>> +             DRM_ERROR("GuC is ready, but the xfer %08x is incomplete\n",
>>> +                       status);
>>> +             ret = -ENXIO;
>>
>> Do we want to allow reset & retry in this scenario? This feels like
>> something weird happening in HW, so I'd say it'd be worth to allow a retry.
> 
> It strikes me as being very weird. Report and bail to see if it ever
> occurs before trying to decide on how exactly to reset and retry.
> -Chris
> 

Sounds reasonable.

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-10-23 16:04 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-17 20:29 [PATCH] drm/i915/guc: Propagate the fw xfer timeout Chris Wilson
2018-10-17 21:12 ` ✓ Fi.CI.BAT: success for " Patchwork
2018-10-17 22:14 ` ✓ Fi.CI.IGT: " Patchwork
2018-10-17 23:09 ` [PATCH] " Daniele Ceraolo Spurio
2018-10-17 23:22   ` Michal Wajdeczko
2018-10-18  9:13     ` Chris Wilson
2018-10-18 18:18       ` Daniele Ceraolo Spurio
2018-10-18 18:27         ` Michal Wajdeczko
2018-10-18 19:42           ` Chris Wilson
2018-10-18 19:54 ` [PATCH v2] " Chris Wilson
2018-10-18 19:55 ` [PATCH v3] " Chris Wilson
2018-10-18 20:48   ` Daniele Ceraolo Spurio
2018-10-23  8:50     ` Chris Wilson
2018-10-23 16:04       ` Daniele Ceraolo Spurio
2018-10-18 20:16 ` ✗ Fi.CI.BAT: failure for drm/i915/guc: Propagate the fw xfer timeout (rev2) Patchwork
2018-10-18 20:37 ` ✓ Fi.CI.BAT: success for drm/i915/guc: Propagate the fw xfer timeout (rev3) Patchwork
2018-10-18 23:41 ` ✗ Fi.CI.IGT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.