All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-07  8:35 ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-07  8:35 UTC (permalink / raw)
  To: Intel-gfx
  Cc: Jason Ekstrand, Daniel Vetter, Marcin Slusarz, dri-devel, Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This is an alternative proposed fix for the below references bug report
where dma fence error propagation is causing undesirable change in
behaviour post GPU hang/reset.

Approach in this patch is to simply stop propagating all dma fence errors
by default since that seems to be the upstream ask.

To handle the case where i915 needs error propagation for security, I add
a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
the command parsing chain only.

It sounds a plausible argument that fence propagation could be useful in
which case a core flag to enable opt-in should be universally useful.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Reported-by: Miroslav Bendik
References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
 drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
 drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
 include/linux/dma-fence.h                      | 1 +
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 297143511f99..6a516d1261d0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	}
 
 	dma_fence_work_init(&pw->base, &eb_parse_ops);
+	/* Propagate errors for security. */
+	__set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
 
 	pw->engine = eb->engine;
 	pw->batch = eb->batch->vma;
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 2744558f3050..2ee917932ccf 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct dma_fence *dma,
 
 	fence = xchg(&cb->base.fence, NULL);
 	if (fence) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		i915_sw_fence_complete(fence);
 	}
 
@@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	might_sleep_if(gfpflags_allow_blocking(gfp));
 
 	if (dma_fence_is_signaled(dma)) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
@@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 		if (ret)
 			return ret;
 
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
@@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	debug_fence_assert(fence);
 
 	if (dma_fence_is_signaled(dma)) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 30a863353ee6..872ef80ebd10 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
 		cmpxchg(&fence->error, 0, error);
 }
 
+static inline void
+i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
+					struct dma_fence *dma)
+{
+	if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
+		i915_sw_fence_set_error_once(fence, dma->error);
+}
+
 #endif /* _I915_SW_FENCE_H_ */
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6ffb4b2c6371..8dabe1650f11 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_PROPAGATE_ERROR,
 	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-07  8:35 ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-07  8:35 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Jason Ekstrand, Daniel Vetter, dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

This is an alternative proposed fix for the below references bug report
where dma fence error propagation is causing undesirable change in
behaviour post GPU hang/reset.

Approach in this patch is to simply stop propagating all dma fence errors
by default since that seems to be the upstream ask.

To handle the case where i915 needs error propagation for security, I add
a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
the command parsing chain only.

It sounds a plausible argument that fence propagation could be useful in
which case a core flag to enable opt-in should be universally useful.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
Reported-by: Miroslav Bendik
References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
 drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
 drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
 include/linux/dma-fence.h                      | 1 +
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 297143511f99..6a516d1261d0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	}
 
 	dma_fence_work_init(&pw->base, &eb_parse_ops);
+	/* Propagate errors for security. */
+	__set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
 
 	pw->engine = eb->engine;
 	pw->batch = eb->batch->vma;
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 2744558f3050..2ee917932ccf 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct dma_fence *dma,
 
 	fence = xchg(&cb->base.fence, NULL);
 	if (fence) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		i915_sw_fence_complete(fence);
 	}
 
@@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	might_sleep_if(gfpflags_allow_blocking(gfp));
 
 	if (dma_fence_is_signaled(dma)) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
@@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 		if (ret)
 			return ret;
 
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
@@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	debug_fence_assert(fence);
 
 	if (dma_fence_is_signaled(dma)) {
-		i915_sw_fence_set_error_once(fence, dma->error);
+		i915_sw_fence_propagate_dma_fence_error(fence, dma);
 		return 0;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 30a863353ee6..872ef80ebd10 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
 		cmpxchg(&fence->error, 0, error);
 }
 
+static inline void
+i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
+					struct dma_fence *dma)
+{
+	if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
+		i915_sw_fence_set_error_once(fence, dma->error);
+}
+
 #endif /* _I915_SW_FENCE_H_ */
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6ffb4b2c6371..8dabe1650f11 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_PROPAGATE_ERROR,
 	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
 };
 
-- 
2.30.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Stop propagating fence errors by default
  2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
  (?)
@ 2021-05-07  9:29 ` Patchwork
  -1 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2021-05-07  9:29 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Stop propagating fence errors by default
URL   : https://patchwork.freedesktop.org/series/89857/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
615d0d0d9d9f drm/i915: Stop propagating fence errors by default
-:22: ERROR:BAD_SIGN_OFF: Unrecognized email address: 'Miroslav Bendik'
#22: 
Reported-by: Miroslav Bendik

total: 1 errors, 0 warnings, 0 checks, 59 lines checked


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Stop propagating fence errors by default
  2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
  (?)
  (?)
@ 2021-05-07  9:58 ` Patchwork
  -1 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2021-05-07  9:58 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2101 bytes --]

== Series Details ==

Series: drm/i915: Stop propagating fence errors by default
URL   : https://patchwork.freedesktop.org/series/89857/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10056 -> Patchwork_20081
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/index.html

Known issues
------------

  Here are the changes found in Patchwork_20081 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@semaphore:
    - fi-bdw-5557u:       NOTRUN -> [SKIP][1] ([fdo#109271]) +23 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/fi-bdw-5557u/igt@amdgpu/amd_basic@semaphore.html

  * igt@amdgpu/amd_prime@amd-to-i915:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][2] ([fdo#109271])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/fi-kbl-soraka/igt@amdgpu/amd_prime@amd-to-i915.html

  * igt@core_hotunplug@unbind-rebind:
    - fi-bdw-5557u:       NOTRUN -> [WARN][3] ([i915#2283])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/fi-bdw-5557u/igt@core_hotunplug@unbind-rebind.html

  
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#2283]: https://gitlab.freedesktop.org/drm/intel/issues/2283


Participating hosts (43 -> 40)
------------------------------

  Missing    (3): fi-ilk-m540 fi-bdw-samus fi-hsw-4200u 


Build changes
-------------

  * Linux: CI_DRM_10056 -> Patchwork_20081

  CI-20190529: 20190529
  CI_DRM_10056: ba8b31d76abe45a6e486ba8cb73c53d147984e28 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6080: 1c450c3d4df19cf1087b8ccff3b62cb51addacae @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_20081: 615d0d0d9d9f20107f9a8415341327fd6a6c2fc8 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

615d0d0d9d9f drm/i915: Stop propagating fence errors by default

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/index.html

[-- Attachment #1.2: Type: text/html, Size: 2823 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Stop propagating fence errors by default
  2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  (?)
@ 2021-05-07 11:38 ` Patchwork
  -1 siblings, 0 replies; 17+ messages in thread
From: Patchwork @ 2021-05-07 11:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 30273 bytes --]

== Series Details ==

Series: drm/i915: Stop propagating fence errors by default
URL   : https://patchwork.freedesktop.org/series/89857/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10056_full -> Patchwork_20081_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Known issues
------------

  Here are the changes found in Patchwork_20081_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_create@create-massive:
    - shard-snb:          NOTRUN -> [DMESG-WARN][1] ([i915#3002])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-snb2/igt@gem_create@create-massive.html

  * igt@gem_exec_capture@pi@rcs0:
    - shard-skl:          [PASS][2] -> [INCOMPLETE][3] ([i915#2369])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl8/igt@gem_exec_capture@pi@rcs0.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@gem_exec_capture@pi@rcs0.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk:          [PASS][4] -> [FAIL][5] ([i915#2846])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk2/igt@gem_exec_fair@basic-deadline.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk4/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][6] ([i915#2842])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@gem_exec_fair@basic-pace@vcs1.html
    - shard-kbl:          [PASS][7] -> [SKIP][8] ([fdo#109271])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl7/igt@gem_exec_fair@basic-pace@vcs1.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl1/igt@gem_exec_fair@basic-pace@vcs1.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [PASS][9] -> [FAIL][10] ([i915#2842]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk2/igt@gem_exec_fair@basic-throttle@rcs0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk6/igt@gem_exec_fair@basic-throttle@rcs0.html
    - shard-iclb:         [PASS][11] -> [FAIL][12] ([i915#2849])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb3/igt@gem_exec_fair@basic-throttle@rcs0.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb6/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_params@rsvd2-dirt:
    - shard-iclb:         NOTRUN -> [SKIP][13] ([fdo#109283])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@gem_exec_params@rsvd2-dirt.html

  * igt@gem_exec_reloc@basic-wide-active@bcs0:
    - shard-apl:          NOTRUN -> [FAIL][14] ([i915#2389]) +3 similar issues
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@gem_exec_reloc@basic-wide-active@bcs0.html

  * igt@gem_exec_reloc@basic-wide-active@rcs0:
    - shard-iclb:         NOTRUN -> [FAIL][15] ([i915#2389]) +3 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@gem_exec_reloc@basic-wide-active@rcs0.html

  * igt@gem_exec_schedule@u-semaphore-user:
    - shard-snb:          NOTRUN -> [SKIP][16] ([fdo#109271]) +126 similar issues
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-snb2/igt@gem_exec_schedule@u-semaphore-user.html

  * igt@gem_exec_whisper@basic-forked-all:
    - shard-glk:          [PASS][17] -> [DMESG-WARN][18] ([i915#118] / [i915#95])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk8/igt@gem_exec_whisper@basic-forked-all.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk8/igt@gem_exec_whisper@basic-forked-all.html

  * igt@gem_mmap_gtt@big-copy:
    - shard-glk:          [PASS][19] -> [FAIL][20] ([i915#307]) +1 similar issue
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk6/igt@gem_mmap_gtt@big-copy.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk7/igt@gem_mmap_gtt@big-copy.html

  * igt@gem_mmap_gtt@cpuset-big-copy:
    - shard-iclb:         [PASS][21] -> [FAIL][22] ([i915#307])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb5/igt@gem_mmap_gtt@cpuset-big-copy.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@gem_mmap_gtt@cpuset-big-copy.html

  * igt@gem_mmap_gtt@cpuset-big-copy-xy:
    - shard-iclb:         [PASS][23] -> [FAIL][24] ([i915#2428])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb2/igt@gem_mmap_gtt@cpuset-big-copy-xy.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb8/igt@gem_mmap_gtt@cpuset-big-copy-xy.html

  * igt@gem_pread@exhaustion:
    - shard-snb:          NOTRUN -> [WARN][25] ([i915#2658])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-snb2/igt@gem_pread@exhaustion.html

  * igt@gem_render_copy@y-tiled-ccs-to-y-tiled-mc-ccs:
    - shard-iclb:         NOTRUN -> [SKIP][26] ([i915#768])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@gem_render_copy@y-tiled-ccs-to-y-tiled-mc-ccs.html

  * igt@gem_softpin@noreloc-s3:
    - shard-apl:          NOTRUN -> [DMESG-WARN][27] ([i915#180])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl6/igt@gem_softpin@noreloc-s3.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-apl:          NOTRUN -> [SKIP][28] ([fdo#109271] / [i915#3323])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@set-cache-level:
    - shard-skl:          NOTRUN -> [FAIL][29] ([i915#3324])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl6/igt@gem_userptr_blits@set-cache-level.html

  * igt@gem_vm_create@destroy-race:
    - shard-tglb:         [PASS][30] -> [TIMEOUT][31] ([i915#2795])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-tglb2/igt@gem_vm_create@destroy-race.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-tglb5/igt@gem_vm_create@destroy-race.html

  * igt@gen3_mixed_blits:
    - shard-kbl:          NOTRUN -> [SKIP][32] ([fdo#109271]) +8 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl1/igt@gen3_mixed_blits.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-iclb:         NOTRUN -> [SKIP][33] ([fdo#112306])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@gen9_exec_parse@allowed-single.html

  * igt@kms_big_fb@x-tiled-8bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][34] ([fdo#110725] / [fdo#111614])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_big_fb@x-tiled-8bpp-rotate-270.html

  * igt@kms_chamelium@hdmi-edid-read:
    - shard-iclb:         NOTRUN -> [SKIP][35] ([fdo#109284] / [fdo#111827])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_chamelium@hdmi-edid-read.html

  * igt@kms_color_chamelium@pipe-a-ctm-limited-range:
    - shard-apl:          NOTRUN -> [SKIP][36] ([fdo#109271] / [fdo#111827]) +18 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@kms_color_chamelium@pipe-a-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-d-ctm-blue-to-red:
    - shard-skl:          NOTRUN -> [SKIP][37] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl6/igt@kms_color_chamelium@pipe-d-ctm-blue-to-red.html

  * igt@kms_color_chamelium@pipe-invalid-gamma-lut-sizes:
    - shard-snb:          NOTRUN -> [SKIP][38] ([fdo#109271] / [fdo#111827]) +5 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-snb2/igt@kms_color_chamelium@pipe-invalid-gamma-lut-sizes.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-apl:          NOTRUN -> [TIMEOUT][39] ([i915#1319]) +1 similar issue
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_cursor_crc@pipe-a-cursor-max-size-rapid-movement:
    - shard-iclb:         NOTRUN -> [SKIP][40] ([fdo#109278]) +5 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_cursor_crc@pipe-a-cursor-max-size-rapid-movement.html

  * igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions-varying-size:
    - shard-iclb:         NOTRUN -> [SKIP][41] ([fdo#109274] / [fdo#109278])
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_cursor_legacy@cursora-vs-flipb-atomic-transitions-varying-size.html

  * igt@kms_flip@2x-plain-flip:
    - shard-iclb:         NOTRUN -> [SKIP][42] ([fdo#109274])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_flip@2x-plain-flip.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:
    - shard-skl:          [PASS][43] -> [FAIL][44] ([i915#79]) +2 similar issues
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [PASS][45] -> [DMESG-WARN][46] ([i915#180]) +8 similar issues
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl6/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile:
    - shard-apl:          NOTRUN -> [SKIP][47] ([fdo#109271] / [i915#2642])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs:
    - shard-apl:          NOTRUN -> [SKIP][48] ([fdo#109271] / [i915#2672])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-apl:          [PASS][49] -> [DMESG-WARN][50] ([i915#180]) +1 similar issue
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-apl3/igt@kms_frontbuffer_tracking@fbc-suspend.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl6/igt@kms_frontbuffer_tracking@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-draw-mmap-gtt:
    - shard-skl:          NOTRUN -> [SKIP][51] ([fdo#109271]) +55 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl1/igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-shrfb-draw-mmap-wc:
    - shard-iclb:         NOTRUN -> [SKIP][52] ([fdo#109280]) +4 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_frontbuffer_tracking@psr-2p-primscrn-pri-shrfb-draw-mmap-wc.html

  * igt@kms_hdr@bpc-switch-suspend:
    - shard-skl:          [PASS][53] -> [FAIL][54] ([i915#1188])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl10/igt@kms_hdr@bpc-switch-suspend.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl8/igt@kms_hdr@bpc-switch-suspend.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes:
    - shard-kbl:          [PASS][55] -> [DMESG-WARN][56] ([i915#180] / [i915#533])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-basic:
    - shard-skl:          NOTRUN -> [FAIL][57] ([fdo#108145] / [i915#265])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl1/igt@kms_plane_alpha_blend@pipe-a-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb:
    - shard-apl:          NOTRUN -> [FAIL][58] ([fdo#108145] / [i915#265]) +2 similar issues
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc:
    - shard-skl:          [PASS][59] -> [FAIL][60] ([fdo#108145] / [i915#265])
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-coverage-7efc.html

  * igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping:
    - shard-apl:          NOTRUN -> [SKIP][61] ([fdo#109271] / [i915#2733])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@kms_plane_scaling@scaler-with-clipping-clamping@pipe-c-scaler-with-clipping-clamping.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2:
    - shard-apl:          NOTRUN -> [SKIP][62] ([fdo#109271] / [i915#658]) +5 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-2.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-3:
    - shard-skl:          NOTRUN -> [SKIP][63] ([fdo#109271] / [i915#658])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl6/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area-3.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4:
    - shard-iclb:         NOTRUN -> [SKIP][64] ([i915#658])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area-4.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         [PASS][65] -> [SKIP][66] ([fdo#109441]) +1 similar issue
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb8/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-kbl:          [PASS][67] -> [DMESG-WARN][68] ([i915#180] / [i915#295])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl6/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl3/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * igt@kms_vblank@pipe-d-wait-idle:
    - shard-apl:          NOTRUN -> [SKIP][69] ([fdo#109271] / [i915#533]) +2 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@kms_vblank@pipe-d-wait-idle.html

  * igt@kms_writeback@writeback-invalid-parameters:
    - shard-skl:          NOTRUN -> [SKIP][70] ([fdo#109271] / [i915#2437])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl8/igt@kms_writeback@writeback-invalid-parameters.html
    - shard-apl:          NOTRUN -> [SKIP][71] ([fdo#109271] / [i915#2437])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@kms_writeback@writeback-invalid-parameters.html
    - shard-iclb:         NOTRUN -> [SKIP][72] ([i915#2437])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_writeback@writeback-invalid-parameters.html

  * igt@nouveau_crc@pipe-b-ctx-flip-skip-current-frame:
    - shard-apl:          NOTRUN -> [SKIP][73] ([fdo#109271]) +190 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@nouveau_crc@pipe-b-ctx-flip-skip-current-frame.html

  * igt@nouveau_crc@pipe-d-source-rg:
    - shard-iclb:         NOTRUN -> [SKIP][74] ([fdo#109278] / [i915#2530])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@nouveau_crc@pipe-d-source-rg.html

  * igt@runner@aborted:
    - shard-snb:          NOTRUN -> [FAIL][75] ([i915#3002] / [i915#698])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-snb2/igt@runner@aborted.html

  * igt@sysfs_clients@fair-1:
    - shard-apl:          NOTRUN -> [SKIP][76] ([fdo#109271] / [i915#2994]) +4 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@sysfs_clients@fair-1.html

  * igt@sysfs_clients@pidname:
    - shard-skl:          NOTRUN -> [SKIP][77] ([fdo#109271] / [i915#2994])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl6/igt@sysfs_clients@pidname.html

  * igt@tools_test@sysfs_l3_parity:
    - shard-iclb:         NOTRUN -> [SKIP][78] ([fdo#109307])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@tools_test@sysfs_l3_parity.html

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-suspend:
    - shard-kbl:          [DMESG-WARN][79] ([i915#180]) -> [PASS][80]
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl7/igt@gem_eio@in-flight-suspend.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl1/igt@gem_eio@in-flight-suspend.html

  * igt@gem_eio@kms:
    - shard-skl:          [TIMEOUT][81] ([i915#2502]) -> [PASS][82]
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl6/igt@gem_eio@kms.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@gem_eio@kms.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-tglb:         [FAIL][83] ([i915#2842]) -> [PASS][84] +1 similar issue
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-tglb1/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-tglb1/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-iclb:         [FAIL][85] ([i915#2842]) -> [PASS][86]
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb5/igt@gem_exec_fair@basic-pace@bcs0.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_whisper@basic-normal-all:
    - shard-glk:          [DMESG-WARN][87] ([i915#118] / [i915#95]) -> [PASS][88] +1 similar issue
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk3/igt@gem_exec_whisper@basic-normal-all.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk2/igt@gem_exec_whisper@basic-normal-all.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [SKIP][89] ([i915#2190]) -> [PASS][90]
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-tglb6/igt@gem_huc_copy@huc-copy.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-tglb7/igt@gem_huc_copy@huc-copy.html

  * igt@gem_mmap_gtt@cpuset-medium-copy-odd:
    - shard-iclb:         [FAIL][91] ([i915#307]) -> [PASS][92]
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb5/igt@gem_mmap_gtt@cpuset-medium-copy-odd.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb3/igt@gem_mmap_gtt@cpuset-medium-copy-odd.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-skl:          [DMESG-WARN][93] ([i915#1436] / [i915#716]) -> [PASS][94]
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl1/igt@gen9_exec_parse@allowed-single.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl8/igt@gen9_exec_parse@allowed-single.html

  * igt@kms_color@pipe-a-ctm-max:
    - shard-skl:          [DMESG-WARN][95] ([i915#1982]) -> [PASS][96]
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl6/igt@kms_color@pipe-a-ctm-max.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl7/igt@kms_color@pipe-a-ctm-max.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-apl:          [DMESG-WARN][97] ([i915#180]) -> [PASS][98]
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-apl6/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl8/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_crc@pipe-b-cursor-256x85-random:
    - shard-skl:          [FAIL][99] ([i915#3444]) -> [PASS][100]
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl9/igt@kms_cursor_crc@pipe-b-cursor-256x85-random.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl9/igt@kms_cursor_crc@pipe-b-cursor-256x85-random.html

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-iclb:         [INCOMPLETE][101] ([i915#1185]) -> [PASS][102]
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb3/igt@kms_fbcon_fbt@psr-suspend.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@kms_fbcon_fbt@psr-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank@a-hdmi-a2:
    - shard-glk:          [FAIL][103] ([i915#79]) -> [PASS][104]
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-glk8/igt@kms_flip@flip-vs-expired-vblank@a-hdmi-a2.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-glk3/igt@kms_flip@flip-vs-expired-vblank@a-hdmi-a2.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [FAIL][105] ([fdo#108145] / [i915#265]) -> [PASS][106]
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl7/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl6/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_psr2_su@page_flip:
    - shard-iclb:         [SKIP][107] ([fdo#109642] / [fdo#111068] / [i915#658]) -> [PASS][108]
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb5/igt@kms_psr2_su@page_flip.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_primary_blt:
    - shard-iclb:         [SKIP][109] ([fdo#109441]) -> [PASS][110]
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb7/igt@kms_psr@psr2_primary_blt.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@kms_psr@psr2_primary_blt.html

  * igt@perf@polling-parameterized:
    - shard-skl:          [FAIL][111] ([i915#1542]) -> [PASS][112]
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl7/igt@perf@polling-parameterized.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl1/igt@perf@polling-parameterized.html

  
#### Warnings ####

  * igt@i915_pm_rc6_residency@rc6-fence:
    - shard-iclb:         [WARN][113] ([i915#2684]) -> [WARN][114] ([i915#1804] / [i915#2684]) +1 similar issue
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb8/igt@i915_pm_rc6_residency@rc6-fence.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb7/igt@i915_pm_rc6_residency@rc6-fence.html

  * igt@kms_cursor_crc@pipe-b-cursor-suspend:
    - shard-kbl:          [INCOMPLETE][115] ([i915#155] / [i915#2405]) -> [DMESG-WARN][116] ([i915#180])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl4/igt@kms_cursor_crc@pipe-b-cursor-suspend.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl2/igt@kms_cursor_crc@pipe-b-cursor-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1:
    - shard-skl:          [FAIL][117] ([i915#79]) -> [FAIL][118] ([i915#2122])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-edp1.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-0:
    - shard-iclb:         [SKIP][119] ([i915#658]) -> [SKIP][120] ([i915#2920]) +1 similar issue
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb4/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb2/igt@kms_psr2_sf@plane-move-sf-dmg-area-0.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area-2:
    - shard-iclb:         [SKIP][121] ([i915#2920]) -> [SKIP][122] ([i915#658]) +2 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-iclb2/igt@kms_psr2_sf@plane-move-sf-dmg-area-2.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-iclb8/igt@kms_psr2_sf@plane-move-sf-dmg-area-2.html

  * igt@runner@aborted:
    - shard-kbl:          ([FAIL][123], [FAIL][124], [FAIL][125], [FAIL][126], [FAIL][127], [FAIL][128], [FAIL][129]) ([i915#1436] / [i915#180] / [i915#1814] / [i915#3002] / [i915#3363] / [i915#92]) -> ([FAIL][130], [FAIL][131], [FAIL][132], [FAIL][133], [FAIL][134], [FAIL][135], [FAIL][136], [FAIL][137], [FAIL][138], [FAIL][139], [FAIL][140], [FAIL][141], [FAIL][142], [FAIL][143], [FAIL][144]) ([fdo#109271] / [i915#1436] / [i915#180] / [i915#1814] / [i915#2505] / [i915#3002] / [i915#3363] / [i915#602])
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl7/igt@runner@aborted.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl2/igt@runner@aborted.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl3/igt@runner@aborted.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl7/igt@runner@aborted.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl7/igt@runner@aborted.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl1/igt@runner@aborted.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-kbl1/igt@runner@aborted.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl1/igt@runner@aborted.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl2/igt@runner@aborted.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl2/igt@runner@aborted.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl3/igt@runner@aborted.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl3/igt@runner@aborted.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl2/igt@runner@aborted.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl2/igt@runner@aborted.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl1/igt@runner@aborted.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl6/igt@runner@aborted.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl4/igt@runner@aborted.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl7/igt@runner@aborted.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl4/igt@runner@aborted.html
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl4/igt@runner@aborted.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl7/igt@runner@aborted.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-kbl4/igt@runner@aborted.html
    - shard-apl:          ([FAIL][145], [FAIL][146], [FAIL][147]) ([i915#1814] / [i915#3002] / [i915#3363]) -> ([FAIL][148], [FAIL][149], [FAIL][150]) ([fdo#109271] / [i915#1610] / [i915#180] / [i915#1814] / [i915#3363])
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-apl6/igt@runner@aborted.html
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-apl7/igt@runner@aborted.html
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-apl1/igt@runner@aborted.html
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl3/igt@runner@aborted.html
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl6/igt@runner@aborted.html
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-apl6/igt@runner@aborted.html
    - shard-skl:          ([FAIL][151], [FAIL][152], [FAIL][153]) ([i915#1436] / [i915#3002] / [i915#3363]) -> ([FAIL][154], [FAIL][155], [FAIL][156], [FAIL][157]) ([i915#1814] / [i915#2029] / [i915#2369] / [i915#3002] / [i915#3363])
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl1/igt@runner@aborted.html
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl7/igt@runner@aborted.html
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10056/shard-skl6/igt@runner@aborted.html
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl2/igt@runner@aborted.html
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@runner@aborted.html
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl1/igt@runner@aborted.html
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/shard-skl10/igt@runner@aborted.html

  
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=1092

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20081/index.html

[-- Attachment #1.2: Type: text/html, Size: 35246 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-10  1:36   ` Jason Ekstrand
  -1 siblings, 0 replies; 17+ messages in thread
From: Jason Ekstrand @ 2021-05-10  1:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx; +Cc: Jason Ekstrand, Daniel Vetter, dri-devel

[-- Attachment #1: Type: text/plain, Size: 4913 bytes --]

On May 7, 2021 03:35:33 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> This is an alternative proposed fix for the below references bug report
> where dma fence error propagation is causing undesirable change in
> behaviour post GPU hang/reset.
>
> Approach in this patch is to simply stop propagating all dma fence errors
> by default since that seems to be the upstream ask.
>
> To handle the case where i915 needs error propagation for security, I add
> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> the command parsing chain only.

This sounds plausible. I can't review in full without doing a thorough 
audit of the surrounding code, though. I'll try to get to that next week if 
Daniel doesn't beat me to it. Thanks for working on this!

--Jason


> It sounds a plausible argument that fence propagation could be useful in
> which case a core flag to enable opt-in should be universally useful.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> Reported-by: Miroslav Bendik
> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already 
> signaled fences")
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> include/linux/dma-fence.h                      | 1 +
> 4 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99..6a516d1261d0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>  }
>
>  dma_fence_work_init(&pw->base, &eb_parse_ops);
> + /* Propagate errors for security. */
> + __set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
>
>  pw->engine = eb->engine;
>  pw->batch = eb->batch->vma;
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 2744558f3050..2ee917932ccf 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct 
> dma_fence *dma,
>
>  fence = xchg(&cb->base.fence, NULL);
>  if (fence) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  i915_sw_fence_complete(fence);
>  }
>
> @@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence 
> *fence,
>  might_sleep_if(gfpflags_allow_blocking(gfp));
>
>  if (dma_fence_is_signaled(dma)) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> @@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence 
> *fence,
>  if (ret)
>  return ret;
>
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> @@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct 
> i915_sw_fence *fence,
>  debug_fence_assert(fence);
>
>  if (dma_fence_is_signaled(dma)) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h 
> b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 30a863353ee6..872ef80ebd10 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence 
> *fence, int error)
>  cmpxchg(&fence->error, 0, error);
> }
>
> +static inline void
> +i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
> + struct dma_fence *dma)
> +{
> + if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
> + i915_sw_fence_set_error_once(fence, dma->error);
> +}
> +
> #endif /* _I915_SW_FENCE_H_ */
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..8dabe1650f11 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>  DMA_FENCE_FLAG_SIGNALED_BIT,
>  DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> + DMA_FENCE_FLAG_PROPAGATE_ERROR,
>  DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> };
>
> --
> 2.30.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[-- Attachment #2: Type: text/html, Size: 8740 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-10  1:36   ` Jason Ekstrand
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Ekstrand @ 2021-05-10  1:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx; +Cc: Jason Ekstrand, Daniel Vetter, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4913 bytes --]

On May 7, 2021 03:35:33 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> This is an alternative proposed fix for the below references bug report
> where dma fence error propagation is causing undesirable change in
> behaviour post GPU hang/reset.
>
> Approach in this patch is to simply stop propagating all dma fence errors
> by default since that seems to be the upstream ask.
>
> To handle the case where i915 needs error propagation for security, I add
> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> the command parsing chain only.

This sounds plausible. I can't review in full without doing a thorough 
audit of the surrounding code, though. I'll try to get to that next week if 
Daniel doesn't beat me to it. Thanks for working on this!

--Jason


> It sounds a plausible argument that fence propagation could be useful in
> which case a core flag to enable opt-in should be universally useful.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> Reported-by: Miroslav Bendik
> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already 
> signaled fences")
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> include/linux/dma-fence.h                      | 1 +
> 4 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99..6a516d1261d0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>  }
>
>  dma_fence_work_init(&pw->base, &eb_parse_ops);
> + /* Propagate errors for security. */
> + __set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
>
>  pw->engine = eb->engine;
>  pw->batch = eb->batch->vma;
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c 
> b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 2744558f3050..2ee917932ccf 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct 
> dma_fence *dma,
>
>  fence = xchg(&cb->base.fence, NULL);
>  if (fence) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  i915_sw_fence_complete(fence);
>  }
>
> @@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence 
> *fence,
>  might_sleep_if(gfpflags_allow_blocking(gfp));
>
>  if (dma_fence_is_signaled(dma)) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> @@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence 
> *fence,
>  if (ret)
>  return ret;
>
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> @@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct 
> i915_sw_fence *fence,
>  debug_fence_assert(fence);
>
>  if (dma_fence_is_signaled(dma)) {
> - i915_sw_fence_set_error_once(fence, dma->error);
> + i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  return 0;
>  }
>
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h 
> b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 30a863353ee6..872ef80ebd10 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence 
> *fence, int error)
>  cmpxchg(&fence->error, 0, error);
> }
>
> +static inline void
> +i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
> + struct dma_fence *dma)
> +{
> + if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
> + i915_sw_fence_set_error_once(fence, dma->error);
> +}
> +
> #endif /* _I915_SW_FENCE_H_ */
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..8dabe1650f11 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>  DMA_FENCE_FLAG_SIGNALED_BIT,
>  DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> + DMA_FENCE_FLAG_PROPAGATE_ERROR,
>  DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
> };
>
> --
> 2.30.2
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[-- Attachment #1.2: Type: text/html, Size: 8740 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-10 15:55   ` Daniel Vetter
  -1 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-10 15:55 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Tvrtko Ursulin, Daniel Vetter, Intel-gfx, dri-devel,
	Jason Ekstrand, Marcin Slusarz

On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> This is an alternative proposed fix for the below references bug report
> where dma fence error propagation is causing undesirable change in
> behaviour post GPU hang/reset.
> 
> Approach in this patch is to simply stop propagating all dma fence errors
> by default since that seems to be the upstream ask.
> 
> To handle the case where i915 needs error propagation for security, I add
> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> the command parsing chain only.
> 
> It sounds a plausible argument that fence propagation could be useful in
> which case a core flag to enable opt-in should be universally useful.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> Reported-by: Miroslav Bendik
> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>  drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>  drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>  include/linux/dma-fence.h                      | 1 +

I still don't like this, least because we still introduce the concept of
error propagation to dma-fence (but hey only in i915 code, which is
exactly the kind of not-really-upstream approach we got a major chiding
for).

The only thing this does is make it explicitly opt-in instead opt-out,
like the first fix. The right approach is imo still to just throw it out,
and instead make the one error propagation we really need very, very
explicit. Instead of hiding it behind lots of magic.

The one error propagation we need is when the cmd parser work fails, it
must cancel it's corresponding request to make sure the batchbuffer
doesn't run. This should require about 2 lines in total:

- one line to store the request so that the cmd parser work can access it.
  No refcounting needed, because the the request cannot even start (much
  less get freed) before the cmd parser has singalled its fence

- one line to kill the request if the parsing fails. Maybe 2 if you
  include the if condition. I have no idea how that's done since I'm
  honestly lost how the i915 scheduler decides whether to run a batch or
  not. I'm guessing we have a version of this for the ringbuffer and the
  execlist backend (if not maybe gen7 cmdparser is broken?)

I don't see any need for magic behind-the-scenes propagation of such a
security critical error. Especially when that error propagation thing
caused security bugs of its own, is an i915-only feature, and not
motivated by any userspace/uapi requirements at all.

Thanks, Daniel

>  4 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99..6a516d1261d0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>  	}
>  
>  	dma_fence_work_init(&pw->base, &eb_parse_ops);
> +	/* Propagate errors for security. */
> +	__set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
>  
>  	pw->engine = eb->engine;
>  	pw->batch = eb->batch->vma;
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 2744558f3050..2ee917932ccf 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct dma_fence *dma,
>  
>  	fence = xchg(&cb->base.fence, NULL);
>  	if (fence) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		i915_sw_fence_complete(fence);
>  	}
>  
> @@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  	might_sleep_if(gfpflags_allow_blocking(gfp));
>  
>  	if (dma_fence_is_signaled(dma)) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> @@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  		if (ret)
>  			return ret;
>  
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> @@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  	debug_fence_assert(fence);
>  
>  	if (dma_fence_is_signaled(dma)) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 30a863353ee6..872ef80ebd10 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
>  		cmpxchg(&fence->error, 0, error);
>  }
>  
> +static inline void
> +i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
> +					struct dma_fence *dma)
> +{
> +	if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
> +		i915_sw_fence_set_error_once(fence, dma->error);
> +}
> +
>  #endif /* _I915_SW_FENCE_H_ */
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..8dabe1650f11 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +	DMA_FENCE_FLAG_PROPAGATE_ERROR,
>  	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>  };
>  
> -- 
> 2.30.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-10 15:55   ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-10 15:55 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Jason Ekstrand

On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> This is an alternative proposed fix for the below references bug report
> where dma fence error propagation is causing undesirable change in
> behaviour post GPU hang/reset.
> 
> Approach in this patch is to simply stop propagating all dma fence errors
> by default since that seems to be the upstream ask.
> 
> To handle the case where i915 needs error propagation for security, I add
> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> the command parsing chain only.
> 
> It sounds a plausible argument that fence propagation could be useful in
> which case a core flag to enable opt-in should be universally useful.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> Reported-by: Miroslav Bendik
> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>  drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>  drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>  include/linux/dma-fence.h                      | 1 +

I still don't like this, least because we still introduce the concept of
error propagation to dma-fence (but hey only in i915 code, which is
exactly the kind of not-really-upstream approach we got a major chiding
for).

The only thing this does is make it explicitly opt-in instead opt-out,
like the first fix. The right approach is imo still to just throw it out,
and instead make the one error propagation we really need very, very
explicit. Instead of hiding it behind lots of magic.

The one error propagation we need is when the cmd parser work fails, it
must cancel it's corresponding request to make sure the batchbuffer
doesn't run. This should require about 2 lines in total:

- one line to store the request so that the cmd parser work can access it.
  No refcounting needed, because the the request cannot even start (much
  less get freed) before the cmd parser has singalled its fence

- one line to kill the request if the parsing fails. Maybe 2 if you
  include the if condition. I have no idea how that's done since I'm
  honestly lost how the i915 scheduler decides whether to run a batch or
  not. I'm guessing we have a version of this for the ringbuffer and the
  execlist backend (if not maybe gen7 cmdparser is broken?)

I don't see any need for magic behind-the-scenes propagation of such a
security critical error. Especially when that error propagation thing
caused security bugs of its own, is an i915-only feature, and not
motivated by any userspace/uapi requirements at all.

Thanks, Daniel

>  4 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99..6a516d1261d0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2522,6 +2522,8 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>  	}
>  
>  	dma_fence_work_init(&pw->base, &eb_parse_ops);
> +	/* Propagate errors for security. */
> +	__set_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &pw->base.dma.flags);
>  
>  	pw->engine = eb->engine;
>  	pw->batch = eb->batch->vma;
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 2744558f3050..2ee917932ccf 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -449,7 +449,7 @@ static void dma_i915_sw_fence_wake_timer(struct dma_fence *dma,
>  
>  	fence = xchg(&cb->base.fence, NULL);
>  	if (fence) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		i915_sw_fence_complete(fence);
>  	}
>  
> @@ -480,7 +480,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  	might_sleep_if(gfpflags_allow_blocking(gfp));
>  
>  	if (dma_fence_is_signaled(dma)) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> @@ -496,7 +496,7 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  		if (ret)
>  			return ret;
>  
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> @@ -548,7 +548,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>  	debug_fence_assert(fence);
>  
>  	if (dma_fence_is_signaled(dma)) {
> -		i915_sw_fence_set_error_once(fence, dma->error);
> +		i915_sw_fence_propagate_dma_fence_error(fence, dma);
>  		return 0;
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 30a863353ee6..872ef80ebd10 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -116,4 +116,12 @@ i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error)
>  		cmpxchg(&fence->error, 0, error);
>  }
>  
> +static inline void
> +i915_sw_fence_propagate_dma_fence_error(struct i915_sw_fence *fence,
> +					struct dma_fence *dma)
> +{
> +	if (unlikely(test_bit(DMA_FENCE_FLAG_PROPAGATE_ERROR, &dma->flags)))
> +		i915_sw_fence_set_error_once(fence, dma->error);
> +}
> +
>  #endif /* _I915_SW_FENCE_H_ */
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..8dabe1650f11 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +	DMA_FENCE_FLAG_PROPAGATE_ERROR,
>  	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>  };
>  
> -- 
> 2.30.2
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-10 15:55   ` [Intel-gfx] " Daniel Vetter
@ 2021-05-11  9:05     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-11  9:05 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Tvrtko Ursulin, Daniel Vetter, Intel-gfx, dri-devel,
	Jason Ekstrand, Marcin Slusarz


On 10/05/2021 16:55, Daniel Vetter wrote:
> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> This is an alternative proposed fix for the below references bug report
>> where dma fence error propagation is causing undesirable change in
>> behaviour post GPU hang/reset.
>>
>> Approach in this patch is to simply stop propagating all dma fence errors
>> by default since that seems to be the upstream ask.
>>
>> To handle the case where i915 needs error propagation for security, I add
>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
>> the command parsing chain only.
>>
>> It sounds a plausible argument that fence propagation could be useful in
>> which case a core flag to enable opt-in should be universally useful.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
>> Reported-by: Miroslav Bendik
>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>>   drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>>   drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>>   include/linux/dma-fence.h                      | 1 +
> 
> I still don't like this, least because we still introduce the concept of
> error propagation to dma-fence (but hey only in i915 code, which is
> exactly the kind of not-really-upstream approach we got a major chiding
> for).
> 
> The only thing this does is make it explicitly opt-in instead opt-out,
> like the first fix. The right approach is imo still to just throw it out,
> and instead make the one error propagation we really need very, very
> explicit. Instead of hiding it behind lots of magic.
> 
> The one error propagation we need is when the cmd parser work fails, it
> must cancel it's corresponding request to make sure the batchbuffer
> doesn't run. This should require about 2 lines in total:
> 
> - one line to store the request so that the cmd parser work can access it.
>    No refcounting needed, because the the request cannot even start (much
>    less get freed) before the cmd parser has singalled its fence
> 
> - one line to kill the request if the parsing fails. Maybe 2 if you
>    include the if condition. I have no idea how that's done since I'm
>    honestly lost how the i915 scheduler decides whether to run a batch or
>    not. I'm guessing we have a version of this for the ringbuffer and the
>    execlist backend (if not maybe gen7 cmdparser is broken?)
> 
> I don't see any need for magic behind-the-scenes propagation of such a
> security critical error. Especially when that error propagation thing
> caused security bugs of its own, is an i915-only feature, and not
> motivated by any userspace/uapi requirements at all.

I took this approach because to me propagating errors sounds more 
logical than ignoring them and I was arguing in the commit message that 
the infrastructure to enable that could be put in place as opt-in.

I also do not see a lot of magic in this patch. Only thing, potentially 
the logic should be inverted so that the waiter marks itself as 
interested in receiving errors. That would probably make even more sense 
as a core concept.

Has there been a wider discussion on this topic in the past? I am 
curious to know, even if propagation currently is i915 only, could other 
drivers be interested.

Note that it adds almost nothing to the dma-buf common code about a 
single flag, and at some point (currently missing) documentation on the 
very flag.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-11  9:05     ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-11  9:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Jason Ekstrand


On 10/05/2021 16:55, Daniel Vetter wrote:
> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> This is an alternative proposed fix for the below references bug report
>> where dma fence error propagation is causing undesirable change in
>> behaviour post GPU hang/reset.
>>
>> Approach in this patch is to simply stop propagating all dma fence errors
>> by default since that seems to be the upstream ask.
>>
>> To handle the case where i915 needs error propagation for security, I add
>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
>> the command parsing chain only.
>>
>> It sounds a plausible argument that fence propagation could be useful in
>> which case a core flag to enable opt-in should be universally useful.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
>> Reported-by: Miroslav Bendik
>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>>   drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>>   drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>>   include/linux/dma-fence.h                      | 1 +
> 
> I still don't like this, least because we still introduce the concept of
> error propagation to dma-fence (but hey only in i915 code, which is
> exactly the kind of not-really-upstream approach we got a major chiding
> for).
> 
> The only thing this does is make it explicitly opt-in instead opt-out,
> like the first fix. The right approach is imo still to just throw it out,
> and instead make the one error propagation we really need very, very
> explicit. Instead of hiding it behind lots of magic.
> 
> The one error propagation we need is when the cmd parser work fails, it
> must cancel it's corresponding request to make sure the batchbuffer
> doesn't run. This should require about 2 lines in total:
> 
> - one line to store the request so that the cmd parser work can access it.
>    No refcounting needed, because the the request cannot even start (much
>    less get freed) before the cmd parser has singalled its fence
> 
> - one line to kill the request if the parsing fails. Maybe 2 if you
>    include the if condition. I have no idea how that's done since I'm
>    honestly lost how the i915 scheduler decides whether to run a batch or
>    not. I'm guessing we have a version of this for the ringbuffer and the
>    execlist backend (if not maybe gen7 cmdparser is broken?)
> 
> I don't see any need for magic behind-the-scenes propagation of such a
> security critical error. Especially when that error propagation thing
> caused security bugs of its own, is an i915-only feature, and not
> motivated by any userspace/uapi requirements at all.

I took this approach because to me propagating errors sounds more 
logical than ignoring them and I was arguing in the commit message that 
the infrastructure to enable that could be put in place as opt-in.

I also do not see a lot of magic in this patch. Only thing, potentially 
the logic should be inverted so that the waiter marks itself as 
interested in receiving errors. That would probably make even more sense 
as a core concept.

Has there been a wider discussion on this topic in the past? I am 
curious to know, even if propagation currently is i915 only, could other 
drivers be interested.

Note that it adds almost nothing to the dma-buf common code about a 
single flag, and at some point (currently missing) documentation on the 
very flag.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-11  9:05     ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-17 15:12       ` Daniel Vetter
  -1 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-17 15:12 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Tvrtko Ursulin, Daniel Vetter, Intel-gfx, dri-devel,
	Jason Ekstrand, Marcin Slusarz

On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
> 
> On 10/05/2021 16:55, Daniel Vetter wrote:
> > On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > This is an alternative proposed fix for the below references bug report
> > > where dma fence error propagation is causing undesirable change in
> > > behaviour post GPU hang/reset.
> > > 
> > > Approach in this patch is to simply stop propagating all dma fence errors
> > > by default since that seems to be the upstream ask.
> > > 
> > > To handle the case where i915 needs error propagation for security, I add
> > > a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> > > the command parsing chain only.
> > > 
> > > It sounds a plausible argument that fence propagation could be useful in
> > > which case a core flag to enable opt-in should be universally useful.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> > > Reported-by: Miroslav Bendik
> > > References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> > > References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> > > Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> > >   drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> > >   drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> > >   include/linux/dma-fence.h                      | 1 +
> > 
> > I still don't like this, least because we still introduce the concept of
> > error propagation to dma-fence (but hey only in i915 code, which is
> > exactly the kind of not-really-upstream approach we got a major chiding
> > for).
> > 
> > The only thing this does is make it explicitly opt-in instead opt-out,
> > like the first fix. The right approach is imo still to just throw it out,
> > and instead make the one error propagation we really need very, very
> > explicit. Instead of hiding it behind lots of magic.
> > 
> > The one error propagation we need is when the cmd parser work fails, it
> > must cancel it's corresponding request to make sure the batchbuffer
> > doesn't run. This should require about 2 lines in total:
> > 
> > - one line to store the request so that the cmd parser work can access it.
> >    No refcounting needed, because the the request cannot even start (much
> >    less get freed) before the cmd parser has singalled its fence
> > 
> > - one line to kill the request if the parsing fails. Maybe 2 if you
> >    include the if condition. I have no idea how that's done since I'm
> >    honestly lost how the i915 scheduler decides whether to run a batch or
> >    not. I'm guessing we have a version of this for the ringbuffer and the
> >    execlist backend (if not maybe gen7 cmdparser is broken?)
> > 
> > I don't see any need for magic behind-the-scenes propagation of such a
> > security critical error. Especially when that error propagation thing
> > caused security bugs of its own, is an i915-only feature, and not
> > motivated by any userspace/uapi requirements at all.
> 
> I took this approach because to me propagating errors sounds more logical
> than ignoring them and I was arguing in the commit message that the
> infrastructure to enable that could be put in place as opt-in.
> 
> I also do not see a lot of magic in this patch. Only thing, potentially the
> logic should be inverted so that the waiter marks itself as interested in
> receiving errors. That would probably make even more sense as a core
> concept.
> 
> Has there been a wider discussion on this topic in the past? I am curious to
> know, even if propagation currently is i915 only, could other drivers be
> interested.

There hasn't been. i915-gem team decided "this is a cool concept", which
resulted in a security bug. Now we're a few months in arguing whether a
cool-looking concept that leads to a security bug is maybe a good idea,
and whether we should sneak it in as a core concept to dma-buf.h without
any wider discussion on the concept.

> Note that it adds almost nothing to the dma-buf common code about a single
> flag, and at some point (currently missing) documentation on the very flag.

This is really not how upstream collaboration works, and it needs to stop.

If you want this, start another thread arguing why this is a good idea,
fully decoupled from the security fix here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-17 15:12       ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-17 15:12 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Jason Ekstrand

On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
> 
> On 10/05/2021 16:55, Daniel Vetter wrote:
> > On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > This is an alternative proposed fix for the below references bug report
> > > where dma fence error propagation is causing undesirable change in
> > > behaviour post GPU hang/reset.
> > > 
> > > Approach in this patch is to simply stop propagating all dma fence errors
> > > by default since that seems to be the upstream ask.
> > > 
> > > To handle the case where i915 needs error propagation for security, I add
> > > a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> > > the command parsing chain only.
> > > 
> > > It sounds a plausible argument that fence propagation could be useful in
> > > which case a core flag to enable opt-in should be universally useful.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> > > Reported-by: Miroslav Bendik
> > > References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> > > References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> > > Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> > >   drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> > >   drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> > >   include/linux/dma-fence.h                      | 1 +
> > 
> > I still don't like this, least because we still introduce the concept of
> > error propagation to dma-fence (but hey only in i915 code, which is
> > exactly the kind of not-really-upstream approach we got a major chiding
> > for).
> > 
> > The only thing this does is make it explicitly opt-in instead opt-out,
> > like the first fix. The right approach is imo still to just throw it out,
> > and instead make the one error propagation we really need very, very
> > explicit. Instead of hiding it behind lots of magic.
> > 
> > The one error propagation we need is when the cmd parser work fails, it
> > must cancel it's corresponding request to make sure the batchbuffer
> > doesn't run. This should require about 2 lines in total:
> > 
> > - one line to store the request so that the cmd parser work can access it.
> >    No refcounting needed, because the the request cannot even start (much
> >    less get freed) before the cmd parser has singalled its fence
> > 
> > - one line to kill the request if the parsing fails. Maybe 2 if you
> >    include the if condition. I have no idea how that's done since I'm
> >    honestly lost how the i915 scheduler decides whether to run a batch or
> >    not. I'm guessing we have a version of this for the ringbuffer and the
> >    execlist backend (if not maybe gen7 cmdparser is broken?)
> > 
> > I don't see any need for magic behind-the-scenes propagation of such a
> > security critical error. Especially when that error propagation thing
> > caused security bugs of its own, is an i915-only feature, and not
> > motivated by any userspace/uapi requirements at all.
> 
> I took this approach because to me propagating errors sounds more logical
> than ignoring them and I was arguing in the commit message that the
> infrastructure to enable that could be put in place as opt-in.
> 
> I also do not see a lot of magic in this patch. Only thing, potentially the
> logic should be inverted so that the waiter marks itself as interested in
> receiving errors. That would probably make even more sense as a core
> concept.
> 
> Has there been a wider discussion on this topic in the past? I am curious to
> know, even if propagation currently is i915 only, could other drivers be
> interested.

There hasn't been. i915-gem team decided "this is a cool concept", which
resulted in a security bug. Now we're a few months in arguing whether a
cool-looking concept that leads to a security bug is maybe a good idea,
and whether we should sneak it in as a core concept to dma-buf.h without
any wider discussion on the concept.

> Note that it adds almost nothing to the dma-buf common code about a single
> flag, and at some point (currently missing) documentation on the very flag.

This is really not how upstream collaboration works, and it needs to stop.

If you want this, start another thread arguing why this is a good idea,
fully decoupled from the security fix here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-17 15:12       ` [Intel-gfx] " Daniel Vetter
@ 2021-05-17 15:33         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-17 15:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Tvrtko Ursulin, Daniel Vetter, Intel-gfx, dri-devel,
	Jason Ekstrand, Marcin Slusarz


On 17/05/2021 16:12, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
>>
>> On 10/05/2021 16:55, Daniel Vetter wrote:
>>> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> This is an alternative proposed fix for the below references bug report
>>>> where dma fence error propagation is causing undesirable change in
>>>> behaviour post GPU hang/reset.
>>>>
>>>> Approach in this patch is to simply stop propagating all dma fence errors
>>>> by default since that seems to be the upstream ask.
>>>>
>>>> To handle the case where i915 needs error propagation for security, I add
>>>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
>>>> the command parsing chain only.
>>>>
>>>> It sounds a plausible argument that fence propagation could be useful in
>>>> which case a core flag to enable opt-in should be universally useful.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
>>>> Reported-by: Miroslav Bendik
>>>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
>>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
>>>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>>>>    drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>>>>    drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>>>>    include/linux/dma-fence.h                      | 1 +
>>>
>>> I still don't like this, least because we still introduce the concept of
>>> error propagation to dma-fence (but hey only in i915 code, which is
>>> exactly the kind of not-really-upstream approach we got a major chiding
>>> for).
>>>
>>> The only thing this does is make it explicitly opt-in instead opt-out,
>>> like the first fix. The right approach is imo still to just throw it out,
>>> and instead make the one error propagation we really need very, very
>>> explicit. Instead of hiding it behind lots of magic.
>>>
>>> The one error propagation we need is when the cmd parser work fails, it
>>> must cancel it's corresponding request to make sure the batchbuffer
>>> doesn't run. This should require about 2 lines in total:
>>>
>>> - one line to store the request so that the cmd parser work can access it.
>>>     No refcounting needed, because the the request cannot even start (much
>>>     less get freed) before the cmd parser has singalled its fence
>>>
>>> - one line to kill the request if the parsing fails. Maybe 2 if you
>>>     include the if condition. I have no idea how that's done since I'm
>>>     honestly lost how the i915 scheduler decides whether to run a batch or
>>>     not. I'm guessing we have a version of this for the ringbuffer and the
>>>     execlist backend (if not maybe gen7 cmdparser is broken?)
>>>
>>> I don't see any need for magic behind-the-scenes propagation of such a
>>> security critical error. Especially when that error propagation thing
>>> caused security bugs of its own, is an i915-only feature, and not
>>> motivated by any userspace/uapi requirements at all.
>>
>> I took this approach because to me propagating errors sounds more logical
>> than ignoring them and I was arguing in the commit message that the
>> infrastructure to enable that could be put in place as opt-in.
>>
>> I also do not see a lot of magic in this patch. Only thing, potentially the
>> logic should be inverted so that the waiter marks itself as interested in
>> receiving errors. That would probably make even more sense as a core
>> concept.
>>
>> Has there been a wider discussion on this topic in the past? I am curious to
>> know, even if propagation currently is i915 only, could other drivers be
>> interested.
> 
> There hasn't been. i915-gem team decided "this is a cool concept", which
> resulted in a security bug. Now we're a few months in arguing whether a
> cool-looking concept that leads to a security bug is maybe a good idea,
> and whether we should sneak it in as a core concept to dma-buf.h without
> any wider discussion on the concept.
> 
>> Note that it adds almost nothing to the dma-buf common code about a single
>> flag, and at some point (currently missing) documentation on the very flag.
> 
> This is really not how upstream collaboration works, and it needs to stop.
> 
> If you want this, start another thread arguing why this is a good idea,
> fully decoupled from the security fix here.

When I asked you whether you know there were past discussions on this 
topic, clearly the point of that was to figure out whether a new 
discussion needs to be started, or I need to go and read an existing one 
to get up to speed.

I don't know how you interpreted that as an attempt to sneak anything 
in. And I don't know how I could have reliably figured out the answer to 
that question without asking. So colour me confused.

To clarify on the security issue part - are you talking about 
https://gitlab.freedesktop.org/drm/intel/-/issues/3080, or the other 
security issue, the one which would be caused by simply reverting the 
error propagation in i915?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-17 15:33         ` Tvrtko Ursulin
  0 siblings, 0 replies; 17+ messages in thread
From: Tvrtko Ursulin @ 2021-05-17 15:33 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, Intel-gfx, dri-devel, Jason Ekstrand


On 17/05/2021 16:12, Daniel Vetter wrote:
> On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
>>
>> On 10/05/2021 16:55, Daniel Vetter wrote:
>>> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> This is an alternative proposed fix for the below references bug report
>>>> where dma fence error propagation is causing undesirable change in
>>>> behaviour post GPU hang/reset.
>>>>
>>>> Approach in this patch is to simply stop propagating all dma fence errors
>>>> by default since that seems to be the upstream ask.
>>>>
>>>> To handle the case where i915 needs error propagation for security, I add
>>>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
>>>> the command parsing chain only.
>>>>
>>>> It sounds a plausible argument that fence propagation could be useful in
>>>> which case a core flag to enable opt-in should be universally useful.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
>>>> Reported-by: Miroslav Bendik
>>>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
>>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
>>>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
>>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
>>>>    drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
>>>>    drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
>>>>    include/linux/dma-fence.h                      | 1 +
>>>
>>> I still don't like this, least because we still introduce the concept of
>>> error propagation to dma-fence (but hey only in i915 code, which is
>>> exactly the kind of not-really-upstream approach we got a major chiding
>>> for).
>>>
>>> The only thing this does is make it explicitly opt-in instead opt-out,
>>> like the first fix. The right approach is imo still to just throw it out,
>>> and instead make the one error propagation we really need very, very
>>> explicit. Instead of hiding it behind lots of magic.
>>>
>>> The one error propagation we need is when the cmd parser work fails, it
>>> must cancel it's corresponding request to make sure the batchbuffer
>>> doesn't run. This should require about 2 lines in total:
>>>
>>> - one line to store the request so that the cmd parser work can access it.
>>>     No refcounting needed, because the the request cannot even start (much
>>>     less get freed) before the cmd parser has singalled its fence
>>>
>>> - one line to kill the request if the parsing fails. Maybe 2 if you
>>>     include the if condition. I have no idea how that's done since I'm
>>>     honestly lost how the i915 scheduler decides whether to run a batch or
>>>     not. I'm guessing we have a version of this for the ringbuffer and the
>>>     execlist backend (if not maybe gen7 cmdparser is broken?)
>>>
>>> I don't see any need for magic behind-the-scenes propagation of such a
>>> security critical error. Especially when that error propagation thing
>>> caused security bugs of its own, is an i915-only feature, and not
>>> motivated by any userspace/uapi requirements at all.
>>
>> I took this approach because to me propagating errors sounds more logical
>> than ignoring them and I was arguing in the commit message that the
>> infrastructure to enable that could be put in place as opt-in.
>>
>> I also do not see a lot of magic in this patch. Only thing, potentially the
>> logic should be inverted so that the waiter marks itself as interested in
>> receiving errors. That would probably make even more sense as a core
>> concept.
>>
>> Has there been a wider discussion on this topic in the past? I am curious to
>> know, even if propagation currently is i915 only, could other drivers be
>> interested.
> 
> There hasn't been. i915-gem team decided "this is a cool concept", which
> resulted in a security bug. Now we're a few months in arguing whether a
> cool-looking concept that leads to a security bug is maybe a good idea,
> and whether we should sneak it in as a core concept to dma-buf.h without
> any wider discussion on the concept.
> 
>> Note that it adds almost nothing to the dma-buf common code about a single
>> flag, and at some point (currently missing) documentation on the very flag.
> 
> This is really not how upstream collaboration works, and it needs to stop.
> 
> If you want this, start another thread arguing why this is a good idea,
> fully decoupled from the security fix here.

When I asked you whether you know there were past discussions on this 
topic, clearly the point of that was to figure out whether a new 
discussion needs to be started, or I need to go and read an existing one 
to get up to speed.

I don't know how you interpreted that as an attempt to sneak anything 
in. And I don't know how I could have reliably figured out the answer to 
that question without asking. So colour me confused.

To clarify on the security issue part - are you talking about 
https://gitlab.freedesktop.org/drm/intel/-/issues/3080, or the other 
security issue, the one which would be caused by simply reverting the 
error propagation in i915?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] drm/i915: Stop propagating fence errors by default
  2021-05-17 15:33         ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-17 15:51           ` Daniel Vetter
  -1 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-17 15:51 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Jason Ekstrand, intel-gfx, Marcin Slusarz, dri-devel, Tvrtko Ursulin

On Mon, May 17, 2021 at 5:33 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 17/05/2021 16:12, Daniel Vetter wrote:
> > On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 10/05/2021 16:55, Daniel Vetter wrote:
> >>> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> This is an alternative proposed fix for the below references bug report
> >>>> where dma fence error propagation is causing undesirable change in
> >>>> behaviour post GPU hang/reset.
> >>>>
> >>>> Approach in this patch is to simply stop propagating all dma fence errors
> >>>> by default since that seems to be the upstream ask.
> >>>>
> >>>> To handle the case where i915 needs error propagation for security, I add
> >>>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> >>>> the command parsing chain only.
> >>>>
> >>>> It sounds a plausible argument that fence propagation could be useful in
> >>>> which case a core flag to enable opt-in should be universally useful.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> >>>> Reported-by: Miroslav Bendik
> >>>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> >>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> >>>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> >>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> >>>>    drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> >>>>    drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> >>>>    include/linux/dma-fence.h                      | 1 +
> >>>
> >>> I still don't like this, least because we still introduce the concept of
> >>> error propagation to dma-fence (but hey only in i915 code, which is
> >>> exactly the kind of not-really-upstream approach we got a major chiding
> >>> for).
> >>>
> >>> The only thing this does is make it explicitly opt-in instead opt-out,
> >>> like the first fix. The right approach is imo still to just throw it out,
> >>> and instead make the one error propagation we really need very, very
> >>> explicit. Instead of hiding it behind lots of magic.
> >>>
> >>> The one error propagation we need is when the cmd parser work fails, it
> >>> must cancel it's corresponding request to make sure the batchbuffer
> >>> doesn't run. This should require about 2 lines in total:
> >>>
> >>> - one line to store the request so that the cmd parser work can access it.
> >>>     No refcounting needed, because the the request cannot even start (much
> >>>     less get freed) before the cmd parser has singalled its fence
> >>>
> >>> - one line to kill the request if the parsing fails. Maybe 2 if you
> >>>     include the if condition. I have no idea how that's done since I'm
> >>>     honestly lost how the i915 scheduler decides whether to run a batch or
> >>>     not. I'm guessing we have a version of this for the ringbuffer and the
> >>>     execlist backend (if not maybe gen7 cmdparser is broken?)
> >>>
> >>> I don't see any need for magic behind-the-scenes propagation of such a
> >>> security critical error. Especially when that error propagation thing
> >>> caused security bugs of its own, is an i915-only feature, and not
> >>> motivated by any userspace/uapi requirements at all.
> >>
> >> I took this approach because to me propagating errors sounds more logical
> >> than ignoring them and I was arguing in the commit message that the
> >> infrastructure to enable that could be put in place as opt-in.
> >>
> >> I also do not see a lot of magic in this patch. Only thing, potentially the
> >> logic should be inverted so that the waiter marks itself as interested in
> >> receiving errors. That would probably make even more sense as a core
> >> concept.
> >>
> >> Has there been a wider discussion on this topic in the past? I am curious to
> >> know, even if propagation currently is i915 only, could other drivers be
> >> interested.
> >
> > There hasn't been. i915-gem team decided "this is a cool concept", which
> > resulted in a security bug. Now we're a few months in arguing whether a
> > cool-looking concept that leads to a security bug is maybe a good idea,
> > and whether we should sneak it in as a core concept to dma-buf.h without
> > any wider discussion on the concept.
> >
> >> Note that it adds almost nothing to the dma-buf common code about a single
> >> flag, and at some point (currently missing) documentation on the very flag.
> >
> > This is really not how upstream collaboration works, and it needs to stop.
> >
> > If you want this, start another thread arguing why this is a good idea,
> > fully decoupled from the security fix here.
>
> When I asked you whether you know there were past discussions on this
> topic, clearly the point of that was to figure out whether a new
> discussion needs to be started, or I need to go and read an existing one
> to get up to speed.
>
> I don't know how you interpreted that as an attempt to sneak anything
> in. And I don't know how I could have reliably figured out the answer to
> that question without asking. So colour me confused.
>
> To clarify on the security issue part - are you talking about
> https://gitlab.freedesktop.org/drm/intel/-/issues/3080, or the other
> security issue, the one which would be caused by simply reverting the
> error propagation in i915?

Both.

But what I really don't get is why we keep defending a "it's cool"
type of concept that accidentally caused a security bug as a
consequence that neither author nor reviewer managed to forsee.

If it accidentally causes a security bug and is otherwise not
justified by anything, it's not cool. There isn't anything left to
argue about here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Stop propagating fence errors by default
@ 2021-05-17 15:51           ` Daniel Vetter
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Vetter @ 2021-05-17 15:51 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Jason Ekstrand, intel-gfx, dri-devel

On Mon, May 17, 2021 at 5:33 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> On 17/05/2021 16:12, Daniel Vetter wrote:
> > On Tue, May 11, 2021 at 10:05:27AM +0100, Tvrtko Ursulin wrote:
> >>
> >> On 10/05/2021 16:55, Daniel Vetter wrote:
> >>> On Fri, May 07, 2021 at 09:35:21AM +0100, Tvrtko Ursulin wrote:
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> This is an alternative proposed fix for the below references bug report
> >>>> where dma fence error propagation is causing undesirable change in
> >>>> behaviour post GPU hang/reset.
> >>>>
> >>>> Approach in this patch is to simply stop propagating all dma fence errors
> >>>> by default since that seems to be the upstream ask.
> >>>>
> >>>> To handle the case where i915 needs error propagation for security, I add
> >>>> a new dma fence flag DMA_FENCE_FLAG_PROPAGATE_ERROR and make use of it in
> >>>> the command parsing chain only.
> >>>>
> >>>> It sounds a plausible argument that fence propagation could be useful in
> >>>> which case a core flag to enable opt-in should be universally useful.
> >>>>
> >>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>> Reported-by: Marcin Slusarz <marcin.slusarz@intel.com>
> >>>> Reported-by: Miroslav Bendik
> >>>> References: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
> >>>> References: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
> >>>> Cc: Jason Ekstrand <jason.ekstrand@intel.com>
> >>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 ++
> >>>>    drivers/gpu/drm/i915/i915_sw_fence.c           | 8 ++++----
> >>>>    drivers/gpu/drm/i915/i915_sw_fence.h           | 8 ++++++++
> >>>>    include/linux/dma-fence.h                      | 1 +
> >>>
> >>> I still don't like this, least because we still introduce the concept of
> >>> error propagation to dma-fence (but hey only in i915 code, which is
> >>> exactly the kind of not-really-upstream approach we got a major chiding
> >>> for).
> >>>
> >>> The only thing this does is make it explicitly opt-in instead opt-out,
> >>> like the first fix. The right approach is imo still to just throw it out,
> >>> and instead make the one error propagation we really need very, very
> >>> explicit. Instead of hiding it behind lots of magic.
> >>>
> >>> The one error propagation we need is when the cmd parser work fails, it
> >>> must cancel it's corresponding request to make sure the batchbuffer
> >>> doesn't run. This should require about 2 lines in total:
> >>>
> >>> - one line to store the request so that the cmd parser work can access it.
> >>>     No refcounting needed, because the the request cannot even start (much
> >>>     less get freed) before the cmd parser has singalled its fence
> >>>
> >>> - one line to kill the request if the parsing fails. Maybe 2 if you
> >>>     include the if condition. I have no idea how that's done since I'm
> >>>     honestly lost how the i915 scheduler decides whether to run a batch or
> >>>     not. I'm guessing we have a version of this for the ringbuffer and the
> >>>     execlist backend (if not maybe gen7 cmdparser is broken?)
> >>>
> >>> I don't see any need for magic behind-the-scenes propagation of such a
> >>> security critical error. Especially when that error propagation thing
> >>> caused security bugs of its own, is an i915-only feature, and not
> >>> motivated by any userspace/uapi requirements at all.
> >>
> >> I took this approach because to me propagating errors sounds more logical
> >> than ignoring them and I was arguing in the commit message that the
> >> infrastructure to enable that could be put in place as opt-in.
> >>
> >> I also do not see a lot of magic in this patch. Only thing, potentially the
> >> logic should be inverted so that the waiter marks itself as interested in
> >> receiving errors. That would probably make even more sense as a core
> >> concept.
> >>
> >> Has there been a wider discussion on this topic in the past? I am curious to
> >> know, even if propagation currently is i915 only, could other drivers be
> >> interested.
> >
> > There hasn't been. i915-gem team decided "this is a cool concept", which
> > resulted in a security bug. Now we're a few months in arguing whether a
> > cool-looking concept that leads to a security bug is maybe a good idea,
> > and whether we should sneak it in as a core concept to dma-buf.h without
> > any wider discussion on the concept.
> >
> >> Note that it adds almost nothing to the dma-buf common code about a single
> >> flag, and at some point (currently missing) documentation on the very flag.
> >
> > This is really not how upstream collaboration works, and it needs to stop.
> >
> > If you want this, start another thread arguing why this is a good idea,
> > fully decoupled from the security fix here.
>
> When I asked you whether you know there were past discussions on this
> topic, clearly the point of that was to figure out whether a new
> discussion needs to be started, or I need to go and read an existing one
> to get up to speed.
>
> I don't know how you interpreted that as an attempt to sneak anything
> in. And I don't know how I could have reliably figured out the answer to
> that question without asking. So colour me confused.
>
> To clarify on the security issue part - are you talking about
> https://gitlab.freedesktop.org/drm/intel/-/issues/3080, or the other
> security issue, the one which would be caused by simply reverting the
> error propagation in i915?

Both.

But what I really don't get is why we keep defending a "it's cool"
type of concept that accidentally caused a security bug as a
consequence that neither author nor reviewer managed to forsee.

If it accidentally causes a security bug and is otherwise not
justified by anything, it's not cool. There isn't anything left to
argue about here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-05-17 15:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-07  8:35 [PATCH] drm/i915: Stop propagating fence errors by default Tvrtko Ursulin
2021-05-07  8:35 ` [Intel-gfx] " Tvrtko Ursulin
2021-05-07  9:29 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for " Patchwork
2021-05-07  9:58 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-05-07 11:38 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2021-05-10  1:36 ` [Intel-gfx] [PATCH] " Jason Ekstrand
2021-05-10  1:36   ` Jason Ekstrand
2021-05-10 15:55 ` Daniel Vetter
2021-05-10 15:55   ` [Intel-gfx] " Daniel Vetter
2021-05-11  9:05   ` Tvrtko Ursulin
2021-05-11  9:05     ` [Intel-gfx] " Tvrtko Ursulin
2021-05-17 15:12     ` Daniel Vetter
2021-05-17 15:12       ` [Intel-gfx] " Daniel Vetter
2021-05-17 15:33       ` Tvrtko Ursulin
2021-05-17 15:33         ` [Intel-gfx] " Tvrtko Ursulin
2021-05-17 15:51         ` Daniel Vetter
2021-05-17 15:51           ` [Intel-gfx] " Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.