* [PATCH] drm/i915: Skip error capture when wedged on init
@ 2021-11-09 12:20 ` Tvrtko Ursulin
  0 siblings, 0 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2021-11-09 12:20 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Trying to capture uninitialised engines when we wedged on init ends in
tears. Skip that together with uC capture, since failure to initialise the
latter can actually be one of the reasons for wedging on init.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 2a2d7643b551..aa2b3aad9643 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1866,10 +1866,14 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 		}
 
 		gt_record_info(error->gt);
-		gt_record_engines(error->gt, engine_mask, compress);
 
-		if (INTEL_INFO(i915)->has_gt_uc)
-			error->gt->uc = gt_record_uc(error->gt, compress);
+		if (!intel_gt_has_unrecoverable_error(gt)) {
+			gt_record_engines(error->gt, engine_mask, compress);
+
+			if (INTEL_INFO(i915)->has_gt_uc)
+				error->gt->uc = gt_record_uc(error->gt,
+							     compress);
+		}
 
 		i915_vma_capture_finish(error->gt, compress);
 
-- 
2.30.2


* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Skip error capture when wedged on init
  2021-11-09 12:20 ` [Intel-gfx] " Tvrtko Ursulin
  (?)
@ 2021-11-09 14:39 ` Patchwork
  -1 siblings, 0 replies; 7+ messages in thread
From: Patchwork @ 2021-11-09 14:39 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Skip error capture when wedged on init
URL   : https://patchwork.freedesktop.org/series/96718/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10857 -> Patchwork_21547
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_21547 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_21547, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/index.html

Participating hosts (35 -> 33)
------------------------------

  Additional (1): fi-icl-u2 
  Missing    (3): fi-ctg-p8600 bat-dg1-6 bat-adlp-4 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_21547:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live@hugepages:
    - fi-rkl-guc:         [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10857/fi-rkl-guc/igt@i915_selftest@live@hugepages.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-rkl-guc/igt@i915_selftest@live@hugepages.html

  
Known issues
------------

  Here are the changes found in Patchwork_21547 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@fork-gfx0:
    - fi-icl-u2:          NOTRUN -> [SKIP][3] ([fdo#109315]) +17 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@amdgpu/amd_cs_nop@fork-gfx0.html

  * igt@gem_exec_suspend@basic-s0:
    - fi-tgl-1115g4:      [PASS][4] -> [FAIL][5] ([i915#1888])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10857/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s0.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-tgl-1115g4/igt@gem_exec_suspend@basic-s0.html

  * igt@gem_huc_copy@huc-copy:
    - fi-icl-u2:          NOTRUN -> [SKIP][6] ([i915#2190])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@gem_huc_copy@huc-copy.html

  * igt@i915_selftest@live@hangcheck:
    - fi-snb-2600:        [PASS][7] -> [INCOMPLETE][8] ([i915#3921])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10857/fi-snb-2600/igt@i915_selftest@live@hangcheck.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-snb-2600/igt@i915_selftest@live@hangcheck.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          NOTRUN -> [SKIP][9] ([fdo#111827]) +8 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - fi-icl-u2:          NOTRUN -> [SKIP][10] ([fdo#109278]) +2 similar issues
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy.html

  * igt@kms_force_connector_basic@force-load-detect:
    - fi-icl-u2:          NOTRUN -> [SKIP][11] ([fdo#109285])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@kms_force_connector_basic@force-load-detect.html

  * igt@prime_vgem@basic-userptr:
    - fi-icl-u2:          NOTRUN -> [SKIP][12] ([i915#3301])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-icl-u2/igt@prime_vgem@basic-userptr.html

  
#### Possible fixes ####

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2:          [DMESG-WARN][13] ([i915#4269]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10857/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-cml-u2/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b:
    - fi-cfl-8109u:       [DMESG-WARN][15] ([i915#295]) -> [PASS][16] +12 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10857/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/fi-cfl-8109u/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-b.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
  [i915#295]: https://gitlab.freedesktop.org/drm/intel/issues/295
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269


Build changes
-------------

  * Linux: CI_DRM_10857 -> Patchwork_21547

  CI-20190529: 20190529
  CI_DRM_10857: 2f005a829cd05b317c5b497a6941b88d981d22e6 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6275: 6d172a5cf51ffff5f2780e2837860d613db5067f @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21547: 26a9ffe528edf82d31f13b50eb38c29418cb4d3a @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

26a9ffe528ed drm/i915: Skip error capture when wedged on init

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21547/index.html

* Re: [PATCH] drm/i915: Skip error capture when wedged on init
  2021-11-09 12:20 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-11-10 10:48   ` Matthew Auld
  -1 siblings, 0 replies; 7+ messages in thread
From: Matthew Auld @ 2021-11-10 10:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel Graphics Development, ML dri-devel, Tvrtko Ursulin

On Tue, 9 Nov 2021 at 12:20, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Trying to capture uninitialised engines when we wedged on init ends in
> tears. Skip that together with uC capture, since failure to initialise the
> latter can actually be one of the reasons for wedging on init.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This fixes the issue with missing GuC wedging the GPU and then blowing
up when trying to use the driver?

Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 2a2d7643b551..aa2b3aad9643 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1866,10 +1866,14 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
>                 }
>
>                 gt_record_info(error->gt);
> -               gt_record_engines(error->gt, engine_mask, compress);
>
> -               if (INTEL_INFO(i915)->has_gt_uc)
> -                       error->gt->uc = gt_record_uc(error->gt, compress);
> +               if (!intel_gt_has_unrecoverable_error(gt)) {
> +                       gt_record_engines(error->gt, engine_mask, compress);
> +
> +                       if (INTEL_INFO(i915)->has_gt_uc)
> +                               error->gt->uc = gt_record_uc(error->gt,
> +                                                            compress);
> +               }
>
>                 i915_vma_capture_finish(error->gt, compress);
>
> --
> 2.30.2
>

* Re: [PATCH] drm/i915: Skip error capture when wedged on init
  2021-11-10 10:48   ` [Intel-gfx] " Matthew Auld
@ 2021-11-10 11:34     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 7+ messages in thread
From: Tvrtko Ursulin @ 2021-11-10 11:34 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development, ML dri-devel, Tvrtko Ursulin


On 10/11/2021 10:48, Matthew Auld wrote:
> On Tue, 9 Nov 2021 at 12:20, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Trying to capture uninitialised engines when we wedged on init ends in
>> tears. Skip that together with uC capture, since failure to initialise the
>> latter can actually be one of the reasons for wedging on init.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> This fixes the issue with missing GuC wedging the GPU and then blowing
> up when trying to use the driver?

Probably does not blow up when using the driver, but definitely does 
when accessing error state. Someone suggested it would instead be better 
to call i915_disable_error_state from wedge on init/fini, and I think 
indeed it would, so I plan to send v2 looking like that.

Regards,

Tvrtko

> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> 
>> ---
>>   drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>> index 2a2d7643b551..aa2b3aad9643 100644
>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>> @@ -1866,10 +1866,14 @@ i915_gpu_coredump(struct intel_gt *gt, intel_engine_mask_t engine_mask)
>>                  }
>>
>>                  gt_record_info(error->gt);
>> -               gt_record_engines(error->gt, engine_mask, compress);
>>
>> -               if (INTEL_INFO(i915)->has_gt_uc)
>> -                       error->gt->uc = gt_record_uc(error->gt, compress);
>> +               if (!intel_gt_has_unrecoverable_error(gt)) {
>> +                       gt_record_engines(error->gt, engine_mask, compress);
>> +
>> +                       if (INTEL_INFO(i915)->has_gt_uc)
>> +                               error->gt->uc = gt_record_uc(error->gt,
>> +                                                            compress);
>> +               }
>>
>>                  i915_vma_capture_finish(error->gt, compress);
>>
>> --
>> 2.30.2
>>
