* [PATCH 0/6] Fix TLB invalidate issues with Broadwell
@ 2022-06-15 15:27 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: Jason Ekstrand, David Airlie, dri-devel, Daniele Ceraolo Spurio,
	Fei Yang, Matthew Brost, Chris Wilson, Matthew Auld, Andi Shyti,
	Dave Airlie, Thomas Hellström, Lucas De Marchi, intel-gfx,
	Thomas Hellstrom, Rodrigo Vivi, Mauro Carvalho Chehab,
	tvrtko.ursulin, mauro.chehab, Michał Winiarski,
	linux-kernel, Bruce Chang, Tejas Upadhyay, Umesh Nerlige Ramappa,
	John Harrison

The i915 hangcheck selftest is causing i915 driver timeouts, as reported
by the Intel CI bot:

	http://gfx-ci.fi.intel.com/cibuglog-ng/issuefilterassoc/24297?query_key=42a999f48fa6ecce068bc8126c069be7c31153b4

When this test runs, the only output is:

	[   68.811639] i915: Performing live selftests with st_random_seed=0xe138eac7 st_timeout=500
	[   68.811792] i915: Running hangcheck
	[   68.811859] i915: Running intel_hangcheck_live_selftests/igt_hang_sanitycheck
	[   68.816910] i915 0000:00:02.0: [drm] Cannot find any crtc or sizes
	[   68.841597] i915: Running intel_hangcheck_live_selftests/igt_reset_nop
	[   69.346347] igt_reset_nop: 80 resets
	[   69.362695] i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
	[   69.863559] igt_reset_nop_engine(rcs0): 709 resets
	[   70.364924] igt_reset_nop_engine(bcs0): 903 resets
	[   70.866005] igt_reset_nop_engine(vcs0): 659 resets
	[   71.367934] igt_reset_nop_engine(vcs1): 549 resets
	[   71.869259] igt_reset_nop_engine(vecs0): 553 resets
	[   71.882592] i915: Running intel_hangcheck_live_selftests/igt_reset_idle_engine
	[   72.383554] rcs0: Completed 16605 idle resets
	[   72.884599] bcs0: Completed 18641 idle resets
	[   73.385592] vcs0: Completed 17517 idle resets
	[   73.886658] vcs1: Completed 15474 idle resets
	[   74.387600] vecs0: Completed 17983 idle resets
	[   74.387667] i915: Running intel_hangcheck_live_selftests/igt_reset_active_engine
	[   74.889017] rcs0: Completed 747 active resets
	[   75.174240] intel_engine_reset(bcs0) failed, err:-110
	[   75.174301] bcs0: Completed 525 active resets

After that, the machine just silently hangs.

Bisecting the issue shows that the patch that introduced the regression is:

    7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Reverting it fixes the issue, but introduces other problems, as the TLB
won't be invalidated anymore. So, instead, let's fix the root cause.

It turns out that the TLB flush logic ends up conflicting with the i915
reset, which is called during the hangcheck selftest. So, TLB invalidation
should be serialized with GT resets, but other TLB fix patches are required
for this one to work.

Tested on an Intel NUC5i7RYB with an i7-5557U Broadwell CPU.

Chris Wilson (6):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  drm/i915/gt: Serialize GRDOM access between multiple engine resets
  drm/i915/gt: Serialize TLB invalidates with GT resets

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 +++---
 drivers/gpu/drm/i915/gt/intel_gt.c        | 43 +++++++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 ++
 drivers/gpu/drm/i915/gt/intel_reset.c     | 37 ++++++++++++++-----
 drivers/gpu/drm/i915/i915_vma.c           |  3 +-
 5 files changed, 75 insertions(+), 21 deletions(-)

-- 
2.36.1



* [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: Jason Ekstrand, David Airlie, dri-devel, Daniele Ceraolo Spurio,
	Fei Yang, Matthew Brost, Chris Wilson, Matthew Auld, Andi Shyti,
	Dave Airlie, Thomas Hellström, Lucas De Marchi, intel-gfx,
	Thomas Hellstrom, Rodrigo Vivi, Mauro Carvalho Chehab,
	Tvrtko Ursulin, mauro.chehab, Michał Winiarski,
	linux-kernel, stable, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

As an extension of the existing logic for skipping TLB invalidations,
check if the device is powered down prior to any engine activity, as,
in such cases, all the TLBs were already invalidated, so an explicit
TLB invalidation is not needed.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
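
The two-pass structure this patch gives intel_gt_invalidate_tlbs() can be
illustrated outside the kernel. The following is only a minimal userspace
sketch, not the driver code: struct fake_engine, kick_awake() and
wait_kicked() are hypothetical names invented for illustration, and the
"hardware" completes instantly. It shows the idea of first requesting the
invalidation on every awake engine while recording a mask, and then waiting
only on the engines recorded in that mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an engine; NOT the i915 structures. */
struct fake_engine {
	uint32_t mask;    /* one bit per engine, in the spirit of engine->mask */
	bool awake;       /* models the "is this engine awake?" check */
	bool has_reg;     /* models "this engine has an invalidation register" */
	bool inv_pending; /* models the invalidate bit being set in hardware */
};

/*
 * Pass 1: request an invalidation on each awake engine that has a
 * register, and return the mask of the engines that were kicked.
 */
uint32_t kick_awake(struct fake_engine *e, size_t n)
{
	uint32_t awake = 0;

	assert(e != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!e[i].awake || !e[i].has_reg)
			continue;	/* idle or unsupported: skip it */

		e[i].inv_pending = true;
		awake |= e[i].mask;
	}

	return awake;
}

/*
 * Pass 2: wait only on the engines recorded in the mask. The fake
 * hardware completes immediately, so "waiting" just clears the bit.
 * Returns how many engines were waited on.
 */
unsigned int wait_kicked(struct fake_engine *e, size_t n, uint32_t awake)
{
	unsigned int waited = 0;

	for (size_t i = 0; i < n; i++) {
		if (!(awake & e[i].mask))
			continue;

		e[i].inv_pending = false;
		waited++;
	}

	return waited;
}
```

Splitting the loop this way means the requests for all awake engines are
issued before any wait starts, and idle engines are never touched at all.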

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 +++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 26 +++++++++++++++++------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f33290358c51..d5ed6a6ac67c 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
@@ -1216,6 +1217,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -1239,12 +1241,27 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (!i915_mmio_reg_offset(rb.reg))
+			continue;
+
+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
+	}
+
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -1252,13 +1269,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1
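
The with_intel_gt_pm_if_awake() helper added above relies on a common C
idiom: a for-loop whose body runs at most once, with the "release" step in
the loop's increment expression. The following is a minimal userspace sketch
of that idiom under stated assumptions: struct fake_gt,
fake_gt_get_if_awake(), fake_gt_put() and with_fake_gt_if_awake() are
hypothetical names, and a plain counter stands in for the real wakeref.

```c
#include <assert.h>

/* Hypothetical stand-in for the GT wakeref; NOT the i915 structure. */
struct fake_gt {
	int wakeref_count;	/* 0 means "asleep" */
};

/* Take a reference only if somebody else already holds one. */
int fake_gt_get_if_awake(struct fake_gt *gt)
{
	if (gt->wakeref_count == 0)
		return 0;

	gt->wakeref_count++;
	return 1;
}

/* Drop a reference taken by fake_gt_get_if_awake(). */
void fake_gt_put(struct fake_gt *gt)
{
	assert(gt->wakeref_count > 0);
	gt->wakeref_count--;
}

/*
 * Run the following statement at most once, and only if the reference
 * could be taken; the reference is dropped when the body finishes.
 */
#define with_fake_gt_if_awake(gt, wf) \
	for ((wf) = fake_gt_get_if_awake(gt); (wf); fake_gt_put(gt), (wf) = 0)

/* Returns how many times the guarded body ran (0 or 1). */
int run_if_awake(struct fake_gt *gt)
{
	int wf;
	int ran = 0;

	with_fake_gt_if_awake(gt, wf)
		ran++;

	return ran;
}
```

One caveat of this idiom in general: leaving the body with break or return
skips the increment expression, so the reference would not be dropped.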


* [PATCH 2/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Daniele Ceraolo Spurio, Fei Yang,
	Chris Wilson, Matthew Auld, Andi Shyti, Dave Airlie,
	Thomas Hellström, intel-gfx, Lucas De Marchi,
	Thomas Hellstrom, Rodrigo Vivi, Mauro Carvalho Chehab,
	Tvrtko Ursulin, mauro.chehab, Michał Winiarski,
	linux-kernel, stable, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

On gen12 HW, ensure that the TLB of the OA unit is also invalidated,
as invalidating only the TLB of an engine is not enough.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
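
The condition guarding the extra OA write in this patch can be modelled in
isolation. This is only a minimal userspace sketch: enum fake_platform and
needs_oa_tlb_invalidate() are hypothetical names, and the platform list
simply mirrors the one in the workaround comment of this patch
(tgl, dg1, rkl, adl-s, adl-p). The extra write is wanted only when at least
one engine was awake and the platform is on that list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical platform identifiers; NOT the i915 IS_*() macros. */
enum fake_platform {
	FAKE_BROADWELL,
	FAKE_TIGERLAKE,
	FAKE_DG1,
	FAKE_ROCKETLAKE,
	FAKE_ALDERLAKE_S,
	FAKE_ALDERLAKE_P,
};

/*
 * Decide whether the OA unit TLB also needs an invalidation request.
 * 'awake' is the mask of engines that were kicked; if it is empty,
 * nothing was invalidated and the OA write is skipped as well.
 */
bool needs_oa_tlb_invalidate(enum fake_platform p, uint32_t awake)
{
	assert(p >= FAKE_BROADWELL && p <= FAKE_ALDERLAKE_P);

	if (!awake)
		return false;

	switch (p) {
	case FAKE_TIGERLAKE:
	case FAKE_DG1:
	case FAKE_ROCKETLAKE:
	case FAKE_ALDERLAKE_S:
	case FAKE_ALDERLAKE_P:
		return true;
	default:
		return false;
	}
}
```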

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index d5ed6a6ac67c..61b7ec5118f9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -10,6 +10,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -1259,6 +1260,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	for_each_engine_masked(engine, gt, awake, tmp) {
 		struct reg_and_bit rb;
 
-- 
2.36.1


* [PATCH 3/6] drm/i915/gt: Skip TLB invalidations once wedged
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Daniele Ceraolo Spurio, Fei Yang,
	Chris Wilson, Matthew Auld, Andi Shyti, Dave Airlie,
	Thomas Hellström, intel-gfx, Lucas De Marchi,
	Thomas Hellstrom, Rodrigo Vivi, Mauro Carvalho Chehab,
	Tvrtko Ursulin, mauro.chehab, Michał Winiarski,
	linux-kernel, stable, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
has been reset, as in such cases it can no longer process instructions
on the GPU and the user no longer has access to the TLBs in each engine.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
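
Reading aid, not part of the patch: the check added below amounts to an early return before any register is touched. The standalone sketch here models that with a made-up `toy_gt` structure and a counter in place of the mmio writes. None of these names exist in i915.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct intel_gt; both fields are assumptions. */
struct toy_gt {
	bool wedged;              /* true once the GT is declared wedged */
	unsigned int mmio_writes; /* counts simulated invalidation writes */
};

/* Same shape as the check added to intel_gt_invalidate_tlbs(). */
static void toy_invalidate_tlbs(struct toy_gt *gt)
{
	assert(gt);

	if (gt->wedged)
		return; /* GPU cannot execute anything, so skip the mmio */

	gt->mmio_writes++;
}
```

Once `wedged` is set, further calls leave the counter unchanged, which is the behaviour the commit message asks for.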

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 61b7ec5118f9..fb4fd5273ca4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1226,6 +1226,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 4/6] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: Tvrtko Ursulin, mauro.chehab, Fei Yang, Thomas Hellström,
	David Airlie, dri-devel, linux-kernel, Chris Wilson,
	Thomas Hellstrom, Rodrigo Vivi, Andi Shyti, Dave Airlie, stable,
	Mauro Carvalho Chehab, intel-gfx

From: Chris Wilson <chris.p.wilson@intel.com>

Don't flush TLBs when the buffer is only used in the GGTT under full
control of the kernel, as there's no risk of concurrent access
and stale access from prefetch.

We only need to invalidate the TLBs if they are accessible by the user.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
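
Reading aid, not part of the patch: a toy model of the bookkeeping changed below, assuming the "was bound" flag is what the page-release path consults before flushing. `toy_obj`, `toy_bind()` and `toy_release_pages()` are invented for illustration; only the two bind kinds mirror I915_VMA_GLOBAL_BIND and I915_VMA_LOCAL_BIND.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GLOBAL_BIND (1u << 0) /* GGTT mapping, kernel controlled */
#define TOY_LOCAL_BIND  (1u << 1) /* per-process mapping, user visible */

struct toy_obj {
	bool was_bound;           /* models I915_BO_WAS_BOUND_BIT */
	unsigned int tlb_flushes; /* counts simulated TLB invalidations */
};

/* Only a user-visible (local) bind marks the object as needing a flush. */
static void toy_bind(struct toy_obj *obj, unsigned int bind_flags)
{
	assert(obj);

	if (bind_flags & TOY_LOCAL_BIND)
		obj->was_bound = true;
}

/* Release path: flush once if the flag was set, then clear it. */
static void toy_release_pages(struct toy_obj *obj)
{
	assert(obj);

	if (obj->was_bound) {
		obj->was_bound = false;
		obj->tlb_flushes++;
	}
}
```

With this model a GGTT-only object is released without a flush, while any object that was ever bound locally still gets exactly one.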

 drivers/gpu/drm/i915/i915_vma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 0bffb70b3c5f..7989986161e8 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -537,7 +537,8 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
+	if (bind_flags & I915_VMA_LOCAL_BIND)
+		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
 
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Chris Wilson, Fei Yang, Matthew Brost,
	Mika Kuoppala, Chris Wilson, Dave Airlie, Thomas Hellström,
	Andi Shyti, intel-gfx, Thomas Hellstrom, Rodrigo Vivi,
	Mauro Carvalho Chehab, Tvrtko Ursulin, mauro.chehab,
	linux-kernel, stable, Bruce Chang, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

Don't allow two engines to be reset in parallel, as they would both
try to select a reset bit (and send requests to common registers)
and wait on that register, at the same time. Serialize control of
the reset requests/acks using the uncore->lock, which will also ensure
that no other GT state changes at the same time as the actual reset.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
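
Reading aid, not part of the patch: the rename to `__gen6_reset_engines()` and `__gen11_reset_engines()` follows the usual split between an unlocked worker and a locked entry point, needed because a spinlock must not be taken twice by the same caller. The sketch below models that split in standalone C. The names are hypothetical, a `_locked` suffix replaces the kernel's `__` prefix (reserved in userspace C), and the lock is a plain flag that asserts on double acquisition rather than a real spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it twice is a bug, as with a spinlock. */
struct toy_spinlock {
	bool held;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_gt {
	struct toy_spinlock lock; /* stands in for gt->uncore->lock */
	unsigned int resets;      /* counts simulated domain resets */
};

/* Unlocked worker: the caller must already hold gt->lock. */
static int toy_reset_engines_locked(struct toy_gt *gt, unsigned int engine_mask)
{
	assert(gt->lock.held);

	if (engine_mask)
		gt->resets++;
	return 0;
}

/* Locked entry point for callers that do not hold the lock. */
static int toy_reset_engines(struct toy_gt *gt, unsigned int engine_mask)
{
	int ret;

	toy_spin_lock(&gt->lock);
	ret = toy_reset_engines_locked(gt, engine_mask);
	toy_spin_unlock(&gt->lock);

	return ret;
}

/* A caller that holds the lock across several steps uses the worker. */
static int toy_reset_all(struct toy_gt *gt, unsigned int engine_mask)
{
	int ret;

	toy_spin_lock(&gt->lock);
	(void)toy_reset_engines_locked(gt, ~0u); /* best-effort first pass */
	ret = toy_reset_engines_locked(gt, engine_mask);
	toy_spin_unlock(&gt->lock);

	return ret;
}
```

Calling `toy_reset_engines()` from inside `toy_reset_all()` would trip the assertion in `toy_spin_lock()`, which is the userspace analogue of the deadlock the `__` variants avoid.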

 drivers/gpu/drm/i915/gt/intel_reset.c | 37 ++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a5338c3fde7a..c68d36fb5bbd 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -300,9 +300,9 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
 	return err;
 }
 
-static int gen6_reset_engines(struct intel_gt *gt,
-			      intel_engine_mask_t engine_mask,
-			      unsigned int retry)
+static int __gen6_reset_engines(struct intel_gt *gt,
+				intel_engine_mask_t engine_mask,
+				unsigned int retry)
 {
 	struct intel_engine_cs *engine;
 	u32 hw_mask;
@@ -321,6 +321,20 @@ static int gen6_reset_engines(struct intel_gt *gt,
 	return gen6_hw_domain_reset(gt, hw_mask);
 }
 
+static int gen6_reset_engines(struct intel_gt *gt,
+			      intel_engine_mask_t engine_mask,
+			      unsigned int retry)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&gt->uncore->lock, flags);
+	ret = __gen6_reset_engines(gt, engine_mask, retry);
+	spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
+	return ret;
+}
+
 static struct intel_engine_cs *find_sfc_paired_vecs_engine(struct intel_engine_cs *engine)
 {
 	int vecs_id;
@@ -487,9 +501,9 @@ static void gen11_unlock_sfc(struct intel_engine_cs *engine)
 	rmw_clear_fw(uncore, sfc_lock.lock_reg, sfc_lock.lock_bit);
 }
 
-static int gen11_reset_engines(struct intel_gt *gt,
-			       intel_engine_mask_t engine_mask,
-			       unsigned int retry)
+static int __gen11_reset_engines(struct intel_gt *gt,
+				 intel_engine_mask_t engine_mask,
+				 unsigned int retry)
 {
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
@@ -583,8 +597,11 @@ static int gen8_reset_engines(struct intel_gt *gt,
 	struct intel_engine_cs *engine;
 	const bool reset_non_ready = retry >= 1;
 	intel_engine_mask_t tmp;
+	unsigned long flags;
 	int ret;
 
+	spin_lock_irqsave(&gt->uncore->lock, flags);
+
 	for_each_engine_masked(engine, gt, engine_mask, tmp) {
 		ret = gen8_engine_reset_prepare(engine);
 		if (ret && !reset_non_ready)
@@ -612,17 +629,19 @@ static int gen8_reset_engines(struct intel_gt *gt,
 	 * This is best effort, so ignore any error from the initial reset.
 	 */
 	if (IS_DG2(gt->i915) && engine_mask == ALL_ENGINES)
-		gen11_reset_engines(gt, gt->info.engine_mask, 0);
+		__gen11_reset_engines(gt, gt->info.engine_mask, 0);
 
 	if (GRAPHICS_VER(gt->i915) >= 11)
-		ret = gen11_reset_engines(gt, engine_mask, retry);
+		ret = __gen11_reset_engines(gt, engine_mask, retry);
 	else
-		ret = gen6_reset_engines(gt, engine_mask, retry);
+		ret = __gen6_reset_engines(gt, engine_mask, retry);
 
 skip_reset:
 	for_each_engine_masked(engine, gt, engine_mask, tmp)
 		gen8_engine_reset_cancel(engine);
 
+	spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
 	return ret;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 6/6] drm/i915/gt: Serialize TLB invalidates with GT resets
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Daniele Ceraolo Spurio, Fei Yang,
	Chris Wilson, Matthew Auld, Andi Shyti, Dave Airlie,
	Thomas Hellström, intel-gfx, Lucas De Marchi,
	Thomas Hellstrom, Rodrigo Vivi, Mauro Carvalho Chehab,
	Tvrtko Ursulin, mauro.chehab, Michał Winiarski,
	linux-kernel, stable

From: Chris Wilson <chris.p.wilson@intel.com>

Avoid trying to invalidate the TLB in the middle of performing an
engine reset, as this may result in the reset timing out. Currently,
the TLB invalidate is only serialised by its own mutex, forgoing the
uncore lock, but we can take the uncore->lock as well to serialise
the mmio access, thereby serialising with the GDRST.

Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
i915 selftest/hangcheck.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
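
Reading aid, not part of the patch: a minimal userspace model of two paths sharing one lock so that invalidation requests can never be issued in the middle of a reset. It uses a C11 `atomic_flag` as a stand-in for uncore->lock and does not model interrupt disabling. All `toy_*` names are invented, and the placement of the wait outside the lock is an assumption based on where the unlock lands in the hunk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_uncore {
	atomic_flag lock;          /* stands in for uncore->lock */
	bool reset_in_progress;    /* only touched with the lock held */
	unsigned int inv_requests; /* counts simulated invalidation writes */
};

static void toy_uncore_init(struct toy_uncore *u)
{
	atomic_flag_clear(&u->lock);
	u->reset_in_progress = false;
	u->inv_requests = 0;
}

static void toy_lock(struct toy_uncore *u)
{
	while (atomic_flag_test_and_set_explicit(&u->lock, memory_order_acquire))
		; /* spin until the current holder clears the flag */
}

static void toy_unlock(struct toy_uncore *u)
{
	atomic_flag_clear_explicit(&u->lock, memory_order_release);
}

/* Reset path: the whole request/ack sequence runs under the lock. */
static void toy_reset(struct toy_uncore *u)
{
	toy_lock(u);
	u->reset_in_progress = true;
	/* the reset request and the wait for its ack would go here */
	u->reset_in_progress = false;
	toy_unlock(u);
}

/* Invalidate path: only the request writes are made under the lock. */
static void toy_invalidate(struct toy_uncore *u, unsigned int nengines)
{
	toy_lock(u);
	assert(!u->reset_in_progress); /* cannot overlap a reset */
	u->inv_requests += nengines;
	toy_unlock(u);

	/* polling for completion would happen here, with the lock dropped */
}
```

Because both paths take the same lock, the assertion in `toy_invalidate()` holds no matter how the two are interleaved across threads.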

 drivers/gpu/drm/i915/gt/intel_gt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index fb4fd5273ca4..33eb93586858 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1248,6 +1248,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
 	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
@@ -1272,6 +1274,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	     IS_ALDERLAKE_P(i915)))
 		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
 
+	spin_unlock_irq(&uncore->lock);
+
 	for_each_engine_masked(engine, gt, awake, tmp) {
 		struct reg_and_bit rb;
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH 6/6] drm/i915/gt: Serialize TLB invalidates with GT resets
@ 2022-06-15 15:27   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-15 15:27 UTC (permalink / raw)
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Matt Roper, Matthew Auld,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	mauro.chehab, Mauro Carvalho Chehab, stable

From: Chris Wilson <chris.p.wilson@intel.com>

Avoid trying to invalidate the TLB in the middle of performing an
engine reset, as this may result in the reset timing out. Currently,
the TLB invalidate is only serialised by its own mutex, forgoing the
uncore lock, but we can take the uncore->lock as well to serialise
the mmio access, thereby serialising with the GDRST.
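
A rough userspace sketch of the idea, not the driver code: a pthread
mutex stands in for uncore->lock, and the names fake_uncore_lock,
reset_path, invalidate_path and run_serialised_demo are hypothetical.
It assumes, as the text above implies, that the reset path holds the
same lock while it is in flight. Under that assumption the invalidate
path can never observe a half-finished reset:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for uncore->lock (a spinlock in the driver). */
static pthread_mutex_t fake_uncore_lock = PTHREAD_MUTEX_INITIALIZER;

/* Nonzero only while the modelled reset is between request and ack. */
static int reset_in_flight;

/* Number of invalidates that observed a reset in flight. */
static int overlaps;

#define ITERATIONS 100000

/* Models the reset path: request and ack both happen under the lock. */
static void *reset_path(void *unused)
{
	(void)unused;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&fake_uncore_lock);
		reset_in_flight = 1;	/* reset requested */
		reset_in_flight = 0;	/* reset acknowledged */
		pthread_mutex_unlock(&fake_uncore_lock);
	}
	return NULL;
}

/* Models the invalidate path: its writes happen under the same lock. */
static void *invalidate_path(void *unused)
{
	(void)unused;
	for (int i = 0; i < ITERATIONS; i++) {
		pthread_mutex_lock(&fake_uncore_lock);
		if (reset_in_flight)
			overlaps++;	/* not reachable while both paths lock */
		pthread_mutex_unlock(&fake_uncore_lock);
	}
	return NULL;
}

/* Runs both paths concurrently; returns how many overlaps were seen. */
int run_serialised_demo(void)
{
	pthread_t reset, invalidate;
	int rc;

	overlaps = 0;
	rc = pthread_create(&reset, NULL, reset_path, NULL);
	assert(rc == 0);
	rc = pthread_create(&invalidate, NULL, invalidate_path, NULL);
	assert(rc == 0);
	(void)rc;	/* silences the unused warning when NDEBUG is set */
	pthread_join(reset, NULL);
	pthread_join(invalidate, NULL);
	return overlaps;
}
```

Calling run_serialised_demo() from a test program (built with
-pthread) returns 0, because the flag is only ever set and cleared
inside the critical section. Dropping the lock from invalidate_path
would turn the check into a data race, which is the userspace analogue
of the unserialised mmio access described above.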

Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
i915 selftest/hangcheck.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index fb4fd5273ca4..33eb93586858 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -1248,6 +1248,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
 	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
@@ -1272,6 +1274,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	     IS_ALDERLAKE_P(i915)))
 		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
 
+	spin_unlock_irq(&uncore->lock);
+
 	for_each_engine_masked(engine, gt, awake, tmp) {
 		struct reg_and_bit rb;
 
-- 
2.36.1


* Re: [Intel-gfx] [PATCH 2/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
@ 2022-06-15 17:03     ` Umesh Nerlige Ramappa
  -1 siblings, 0 replies; 87+ messages in thread
From: Umesh Nerlige Ramappa @ 2022-06-15 17:03 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Chris Wilson, Matthew Auld, Dave Airlie,
	Thomas Hellström, intel-gfx, Lucas De Marchi,
	Thomas Hellstrom, Rodrigo Vivi, mauro.chehab,
	Michał Winiarski, linux-kernel, stable

On Wed, Jun 15, 2022 at 04:27:36PM +0100, Mauro Carvalho Chehab wrote:
>From: Chris Wilson <chris.p.wilson@intel.com>
>
>On gen12 HW, ensure that the TLB of the OA unit is also invalidated
>as just invalidating the TLB of an engine is not enough.
>
>Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>
>Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>Cc: Fei Yang <fei.yang@intel.com>
>Cc: Andi Shyti <andi.shyti@linux.intel.com>
>Cc: stable@vger.kernel.org
>Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>---
>
>See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
>
> drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
>diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>index d5ed6a6ac67c..61b7ec5118f9 100644
>--- a/drivers/gpu/drm/i915/gt/intel_gt.c
>+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>@@ -10,6 +10,7 @@
> #include "pxp/intel_pxp.h"
>
> #include "i915_drv.h"
>+#include "i915_perf_oa_regs.h"
> #include "intel_context.h"
> #include "intel_engine_pm.h"
> #include "intel_engine_regs.h"
>@@ -1259,6 +1260,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> 		awake |= engine->mask;
> 	}
>
>+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>+	if (awake &&
>+	    (IS_TIGERLAKE(i915) ||
>+	     IS_DG1(i915) ||
>+	     IS_ROCKETLAKE(i915) ||
>+	     IS_ALDERLAKE_S(i915) ||
>+	     IS_ALDERLAKE_P(i915)))
>+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
>+

This patch can be dropped: the OA TLB invalidation is already done in
gen12_oa_disable() in i915/i915_perf.c, where it is synchronized with
OA use cases.

Regards,
Umesh


> 	for_each_engine_masked(engine, gt, awake, tmp) {
> 		struct reg_and_bit rb;
>
>-- 
>2.36.1
>

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix TLB invalidate issues with Broadwell
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  (?)
@ 2022-06-15 17:26 ` Patchwork
  -1 siblings, 0 replies; 87+ messages in thread
From: Patchwork @ 2022-06-15 17:26 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: Fix TLB invalidate issues with Broadwell
URL   : https://patchwork.freedesktop.org/series/105167/
State : warning

== Summary ==

Error: dim checkpatch failed
f4a32c9a2c4b drm/i915/gt: Ignore TLB invalidations on idle engines
-:135: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects?
#135: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58:
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

-:135: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'wf' - possible side-effects?
#135: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58:
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

total: 0 errors, 0 warnings, 2 checks, 95 lines checked
81db701fd3ad drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
8cf10580267f drm/i915/gt: Skip TLB invalidations once wedged
dff31200063a drm/i915/gt: Only invalidate TLBs exposed to user manipulation
-:11: WARNING:REPEATED_WORD: Possible repeated word: 'of'
#11: 
control of the kernel, as there's no risk of of concurrent access

total: 0 errors, 1 warnings, 0 checks, 9 lines checked
93a8fff3630a drm/i915/gt: Serialize GRDOM access between multiple engine resets
-:111: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Chris Wilson <chris.p.wilson@intel.com>' != 'Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>'

total: 0 errors, 1 warnings, 0 checks, 77 lines checked
8c4f0ef22248 drm/i915/gt: Serialize TLB invalidates with GT resets
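
For context on the MACRO_ARG_REUSE checks reported above, here is a
small standalone sketch of the hazard checkpatch is pointing at. The
macro and helpers (SQUARE_REUSE, next_value, square_once) are
hypothetical and unrelated to i915. An argument that appears more than
once in a macro body is evaluated once per appearance, so an argument
with side effects runs them repeatedly. When callers only pass plain
variables, as would be the usual pattern for a for-loop style helper
macro, the reuse is harmless; that is an assumption about the callers,
not something the report states.

```c
#include <assert.h>

/* Hypothetical macro: the argument 'x' appears twice in the body. */
#define SQUARE_REUSE(x) ((x) * (x))

/* Counts how many times next_value() has been called. */
static int calls;

/* Has a visible side effect, so repeated evaluation can be observed. */
static int next_value(void)
{
	return ++calls;
}

/* A function evaluates its argument exactly once. */
static inline int square_once(int x)
{
	return x * x;
}

/* Returns the number of next_value() calls made through the macro. */
int calls_via_macro(void)
{
	calls = 0;
	(void)SQUARE_REUSE(next_value());
	return calls;
}

/* Returns the number of next_value() calls made through the function. */
int calls_via_function(void)
{
	calls = 0;
	(void)square_once(next_value());
	return calls;
}

/* Sanity check of the two behaviours described above. */
void check_reuse(void)
{
	assert(calls_via_macro() == 2);
	assert(calls_via_function() == 1);
}
```

The macro path calls next_value() twice, the function path once.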



* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Fix TLB invalidate issues with Broadwell
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  (?)
@ 2022-06-15 17:26 ` Patchwork
  -1 siblings, 0 replies; 87+ messages in thread
From: Patchwork @ 2022-06-15 17:26 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: Fix TLB invalidate issues with Broadwell
URL   : https://patchwork.freedesktop.org/series/105167/
State : warning

== Summary ==

Error: dim sparse failed
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:28:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:33:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:33:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:51:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:57:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:57:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_reset.c:1410:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block



* [Intel-gfx] ✓ Fi.CI.BAT: success for Fix TLB invalidate issues with Broadwell
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  (?)
@ 2022-06-15 17:48 ` Patchwork
  -1 siblings, 0 replies; 87+ messages in thread
From: Patchwork @ 2022-06-15 17:48 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: Fix TLB invalidate issues with Broadwell
URL   : https://patchwork.freedesktop.org/series/105167/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11761 -> Patchwork_105167v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/index.html

Participating hosts (42 -> 41)
------------------------------

  Additional (1): bat-jsl-2 
  Missing    (2): bat-dg2-9 fi-kbl-guc 

Known issues
------------

  Here are the changes found in Patchwork_105167v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_pm_rpm@module-reload:
    - bat-adlp-4:         [PASS][1] -> [DMESG-WARN][2] ([i915#3576]) +2 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/bat-adlp-4/igt@i915_pm_rpm@module-reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-adlp-4/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live@gt_engines:
    - bat-dg1-5:          [PASS][3] -> [INCOMPLETE][4] ([i915#4418])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/bat-dg1-5/igt@i915_selftest@live@gt_engines.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-dg1-5/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@gt_mocs:
    - fi-rkl-guc:         [PASS][5] -> [DMESG-WARN][6] ([i915#5790])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/fi-rkl-guc/igt@i915_selftest@live@gt_mocs.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-rkl-guc/igt@i915_selftest@live@gt_mocs.html

  * igt@i915_selftest@live@hangcheck:
    - bat-dg1-6:          NOTRUN -> [DMESG-FAIL][7] ([i915#4494] / [i915#4957])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-dg1-6/igt@i915_selftest@live@hangcheck.html

  * igt@i915_suspend@basic-s2idle-without-i915:
    - bat-dg1-6:          NOTRUN -> [INCOMPLETE][8] ([i915#6011])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-dg1-6/igt@i915_suspend@basic-s2idle-without-i915.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-pnv-d510:        NOTRUN -> [SKIP][9] ([fdo#109271])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-pnv-d510/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_flip@basic-plain-flip@a-edp1:
    - fi-tgl-u2:          [PASS][10] -> [DMESG-WARN][11] ([i915#402])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/fi-tgl-u2/igt@kms_flip@basic-plain-flip@a-edp1.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-tgl-u2/igt@kms_flip@basic-plain-flip@a-edp1.html

  * igt@runner@aborted:
    - fi-bdw-5557u:       NOTRUN -> [FAIL][12] ([i915#4312])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-bdw-5557u/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        [DMESG-FAIL][13] ([i915#4528]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/fi-pnv-d510/igt@i915_selftest@live@gem.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-pnv-d510/igt@i915_selftest@live@gem.html

  * igt@i915_selftest@live@gt_engines:
    - bat-dg1-6:          [INCOMPLETE][15] ([i915#4418]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/bat-dg1-6/igt@i915_selftest@live@gt_engines.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-dg1-6/igt@i915_selftest@live@gt_engines.html

  * igt@i915_selftest@live@hangcheck:
    - fi-bdw-5557u:       [INCOMPLETE][17] ([i915#3921]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/fi-bdw-5557u/igt@i915_selftest@live@hangcheck.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-bdw-5557u/igt@i915_selftest@live@hangcheck.html

  * igt@kms_busy@basic@flip:
    - {bat-adlp-6}:       [DMESG-WARN][19] ([i915#3576]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/bat-adlp-6/igt@kms_busy@basic@flip.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-adlp-6/igt@kms_busy@basic@flip.html

  * igt@kms_flip@basic-flip-vs-modeset@a-edp1:
    - fi-tgl-u2:          [DMESG-WARN][21] ([i915#402]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/fi-tgl-u2/igt@kms_flip@basic-flip-vs-modeset@a-edp1.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c:
    - bat-adlp-4:         [DMESG-WARN][23] ([i915#3576]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/bat-adlp-4/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/bat-adlp-4/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#3921]: https://gitlab.freedesktop.org/drm/intel/issues/3921
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4418]: https://gitlab.freedesktop.org/drm/intel/issues/4418
  [i915#4494]: https://gitlab.freedesktop.org/drm/intel/issues/4494
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4957]: https://gitlab.freedesktop.org/drm/intel/issues/4957
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5790]: https://gitlab.freedesktop.org/drm/intel/issues/5790
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6011]: https://gitlab.freedesktop.org/drm/intel/issues/6011
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227


Build changes
-------------

  * Linux: CI_DRM_11761 -> Patchwork_105167v1

  CI-20190529: 20190529
  CI_DRM_11761: ec90bfe7e13f9a75f02b7f409c8f23911e551b9f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6529: b96bf5a0307fc0bdbf6c8e86872817306e102883 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105167v1: ec90bfe7e13f9a75f02b7f409c8f23911e551b9f @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

bdb931865fc1 drm/i915/gt: Serialize TLB invalidates with GT resets
c028d9cb4966 drm/i915/gt: Serialize GRDOM access between multiple engine resets
80dde7b33084 drm/i915/gt: Only invalidate TLBs exposed to user manipulation
7503afbec552 drm/i915/gt: Skip TLB invalidations once wedged
9f278bd15a39 drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
6902d7385cee drm/i915/gt: Ignore TLB invalidations on idle engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/index.html

* [Intel-gfx] ✗ Fi.CI.IGT: failure for Fix TLB invalidate issues with Broadwell
  2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  (?)
@ 2022-06-15 23:45 ` Patchwork
  -1 siblings, 0 replies; 87+ messages in thread
From: Patchwork @ 2022-06-15 23:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: Fix TLB invalidate issues with Broadwell
URL   : https://patchwork.freedesktop.org/series/105167/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11761_full -> Patchwork_105167v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_105167v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_105167v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (12 -> 12)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_105167v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_plane_alpha_blend@pipe-d-alpha-basic:
    - shard-tglb:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb3/igt@kms_plane_alpha_blend@pipe-d-alpha-basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb8/igt@kms_plane_alpha_blend@pipe-d-alpha-basic.html

  * igt@perf@non-zero-reason:
    - shard-skl:          [PASS][3] -> [TIMEOUT][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-skl2/igt@perf@non-zero-reason.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl3/igt@perf@non-zero-reason.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_ccs@suspend-resume:
    - {shard-rkl}:        [SKIP][5] ([i915#5325]) -> [SKIP][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@gem_ccs@suspend-resume.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gem_ccs@suspend-resume.html

  * igt@gem_render_copy@yf-tiled-mc-ccs-to-vebox-yf-tiled:
    - {shard-rkl}:        NOTRUN -> [SKIP][7]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gem_render_copy@yf-tiled-mc-ccs-to-vebox-yf-tiled.html

  * igt@gen9_exec_parse@bb-oversize:
    - {shard-rkl}:        [SKIP][8] ([i915#2527]) -> [SKIP][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@gen9_exec_parse@bb-oversize.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gen9_exec_parse@bb-oversize.html

  * igt@perf@gen12-mi-rpc:
    - {shard-rkl}:        [PASS][10] -> [SKIP][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@perf@gen12-mi-rpc.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@perf@gen12-mi-rpc.html

  
Known issues
------------

  Here are the changes found in Patchwork_105167v1_full that come from known issues:

### CI changes ###

#### Possible fixes ####

  * boot:
    - shard-glk:          ([PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [FAIL][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [PASS][28], [PASS][29], [PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36]) ([i915#4392]) -> ([PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], [PASS][52], [PASS][53], [PASS][54], [PASS][55], [PASS][56], [PASS][57], [PASS][58], [PASS][59], [PASS][60], [PASS][61])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk3/boot.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk2/boot.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk9/boot.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk9/boot.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk9/boot.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk8/boot.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk8/boot.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk1/boot.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk8/boot.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk1/boot.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk7/boot.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk1/boot.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk7/boot.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk6/boot.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk6/boot.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk6/boot.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk5/boot.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk5/boot.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk5/boot.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk4/boot.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk4/boot.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk4/boot.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk3/boot.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk3/boot.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk2/boot.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk1/boot.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk1/boot.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk1/boot.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk2/boot.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk2/boot.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk2/boot.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk3/boot.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk3/boot.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk3/boot.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk4/boot.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk4/boot.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk5/boot.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk5/boot.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk5/boot.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/boot.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/boot.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk7/boot.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk7/boot.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk7/boot.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk8/boot.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk8/boot.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk8/boot.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk9/boot.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk9/boot.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk9/boot.html

  

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@chamelium:
    - shard-tglb:         NOTRUN -> [SKIP][62] ([fdo#111827])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@feature_discovery@chamelium.html

  * igt@feature_discovery@display-2x:
    - shard-iclb:         NOTRUN -> [SKIP][63] ([i915#1839])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@feature_discovery@display-2x.html

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-iclb:         [PASS][64] -> [SKIP][65] ([i915#4525])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb4/igt@gem_exec_balancer@parallel-contexts.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl:          NOTRUN -> [FAIL][66] ([i915#6141])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl10/igt@gem_exec_fair@basic-deadline.html
    - shard-apl:          NOTRUN -> [FAIL][67] ([i915#6141])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl2/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-tglb:         NOTRUN -> [FAIL][68] ([i915#2842])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-tglb:         [PASS][69] -> [FAIL][70] ([i915#2842])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb1/igt@gem_exec_fair@basic-pace@bcs0.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb3/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-iclb:         [PASS][71] -> [FAIL][72] ([i915#2842])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@gem_exec_fair@basic-pace@vcs0.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_params@no-vebox:
    - shard-tglb:         NOTRUN -> [SKIP][73] ([fdo#109283] / [i915#4877])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@gem_exec_params@no-vebox.html

  * igt@gem_huc_copy@huc-copy:
    - shard-skl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#2190])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl7/igt@gem_huc_copy@huc-copy.html

  * igt@gem_lmem_swapping@heavy-verify-random:
    - shard-glk:          NOTRUN -> [SKIP][75] ([fdo#109271] / [i915#4613])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@gem_lmem_swapping@heavy-verify-random.html

  * igt@gem_lmem_swapping@parallel-random:
    - shard-apl:          NOTRUN -> [SKIP][76] ([fdo#109271] / [i915#4613])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@gem_lmem_swapping@parallel-random.html

  * igt@gem_lmem_swapping@parallel-random-verify-ccs:
    - shard-skl:          NOTRUN -> [SKIP][77] ([fdo#109271] / [i915#4613]) +1 similar issue
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl7/igt@gem_lmem_swapping@parallel-random-verify-ccs.html

  * igt@gem_lmem_swapping@verify-random:
    - shard-tglb:         NOTRUN -> [SKIP][78] ([i915#4613]) +2 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@gem_lmem_swapping@verify-random.html

  * igt@gem_pwrite@basic-exhaustion:
    - shard-skl:          NOTRUN -> [WARN][79] ([i915#2658])
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl7/igt@gem_pwrite@basic-exhaustion.html

  * igt@gem_pxp@create-valid-protected-context:
    - shard-tglb:         NOTRUN -> [SKIP][80] ([i915#4270])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@gem_pxp@create-valid-protected-context.html

  * igt@gem_softpin@evict-snoop-interruptible:
    - shard-tglb:         NOTRUN -> [SKIP][81] ([fdo#109312])
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@gem_softpin@evict-snoop-interruptible.html

  * igt@gem_userptr_blits@coherency-sync:
    - shard-tglb:         NOTRUN -> [SKIP][82] ([fdo#110542])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@gem_userptr_blits@coherency-sync.html

  * igt@gem_userptr_blits@dmabuf-sync:
    - shard-tglb:         NOTRUN -> [SKIP][83] ([i915#3323])
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@gem_userptr_blits@dmabuf-sync.html

  * igt@gem_userptr_blits@huge-split:
    - shard-skl:          [PASS][84] -> [FAIL][85] ([i915#3376])
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-skl6/igt@gem_userptr_blits@huge-split.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl2/igt@gem_userptr_blits@huge-split.html

  * igt@gem_userptr_blits@unsync-unmap:
    - shard-tglb:         NOTRUN -> [SKIP][86] ([i915#3297])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@gem_userptr_blits@unsync-unmap.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-apl:          NOTRUN -> [FAIL][87] ([i915#3318])
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@gem_userptr_blits@vma-merge.html

  * igt@gen7_exec_parse@oacontrol-tracking:
    - shard-tglb:         NOTRUN -> [SKIP][88] ([fdo#109289]) +1 similar issue
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@gen7_exec_parse@oacontrol-tracking.html

  * igt@gen9_exec_parse@bb-secure:
    - shard-iclb:         NOTRUN -> [SKIP][89] ([i915#2856])
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@gen9_exec_parse@bb-secure.html

  * igt@gen9_exec_parse@cmd-crossing-page:
    - shard-tglb:         NOTRUN -> [SKIP][90] ([i915#2527] / [i915#2856]) +1 similar issue
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@gen9_exec_parse@cmd-crossing-page.html

  * igt@i915_module_load@load:
    - shard-skl:          NOTRUN -> [SKIP][91] ([fdo#109271] / [i915#6227])
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl4/igt@i915_module_load@load.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-apl:          NOTRUN -> [SKIP][92] ([fdo#109271] / [i915#658])
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-iclb:         [PASS][93] -> [FAIL][94] ([i915#454])
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb8/igt@i915_pm_dc@dc6-dpms.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb3/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-skl:          NOTRUN -> [FAIL][95] ([i915#454])
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl1/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_selftest@live@gt_pm:
    - shard-skl:          NOTRUN -> [DMESG-FAIL][96] ([i915#1886])
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl9/igt@i915_selftest@live@gt_pm.html

  * igt@i915_suspend@basic-s3-without-i915:
    - shard-tglb:         NOTRUN -> [SKIP][97] ([i915#5903])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@i915_suspend@basic-s3-without-i915.html

  * igt@i915_suspend@forcewake:
    - shard-kbl:          [PASS][98] -> [INCOMPLETE][99] ([i915#3614] / [i915#4817])
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-kbl1/igt@i915_suspend@forcewake.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-kbl4/igt@i915_suspend@forcewake.html

  * igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip:
    - shard-tglb:         NOTRUN -> [SKIP][100] ([i915#5286]) +5 similar issues
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_big_fb@4-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip.html

  * igt@kms_big_fb@linear-8bpp-rotate-90:
    - shard-iclb:         NOTRUN -> [SKIP][101] ([fdo#110725] / [fdo#111614])
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_big_fb@linear-8bpp-rotate-90.html

  * igt@kms_big_fb@y-tiled-64bpp-rotate-270:
    - shard-tglb:         NOTRUN -> [SKIP][102] ([fdo#111614]) +1 similar issue
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@kms_big_fb@y-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][103] ([i915#3743]) +1 similar issue
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl10/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc:
    - shard-skl:          NOTRUN -> [SKIP][104] ([fdo#109271] / [i915#3886]) +8 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl4/igt@kms_ccs@pipe-a-bad-aux-stride-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-yf_tiled_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][105] ([fdo#111615] / [i915#3689]) +6 similar issues
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_ccs@pipe-a-crc-primary-rotation-180-yf_tiled_ccs.html

  * igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc:
    - shard-apl:          NOTRUN -> [SKIP][106] ([fdo#109271] / [i915#3886]) +4 similar issues
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@kms_ccs@pipe-a-missing-ccs-buffer-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-bad-aux-stride-y_tiled_gen12_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][107] ([i915#3689] / [i915#3886]) +2 similar issues
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_ccs@pipe-b-bad-aux-stride-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_rc_ccs:
    - shard-iclb:         NOTRUN -> [SKIP][108] ([fdo#109278]) +3 similar issues
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_ccs@pipe-b-missing-ccs-buffer-y_tiled_gen12_rc_ccs.html

  * igt@kms_ccs@pipe-c-bad-rotation-90-4_tiled_dg2_rc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][109] ([i915#3689]) +8 similar issues
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_ccs@pipe-c-bad-rotation-90-4_tiled_dg2_rc_ccs.html

  * igt@kms_ccs@pipe-c-crc-primary-rotation-180-4_tiled_dg2_mc_ccs:
    - shard-tglb:         NOTRUN -> [SKIP][110] ([i915#6095]) +2 similar issues
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_ccs@pipe-c-crc-primary-rotation-180-4_tiled_dg2_mc_ccs.html

  * igt@kms_chamelium@dp-crc-fast:
    - shard-apl:          NOTRUN -> [SKIP][111] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@kms_chamelium@dp-crc-fast.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - shard-skl:          NOTRUN -> [SKIP][112] ([fdo#109271] / [fdo#111827]) +12 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl10/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_color@pipe-d-ctm-negative:
    - shard-iclb:         NOTRUN -> [SKIP][113] ([fdo#109278] / [i915#1149])
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_color@pipe-d-ctm-negative.html

  * igt@kms_color_chamelium@pipe-a-ctm-0-75:
    - shard-iclb:         NOTRUN -> [SKIP][114] ([fdo#109284] / [fdo#111827])
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_color_chamelium@pipe-a-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-b-ctm-blue-to-red:
    - shard-glk:          NOTRUN -> [SKIP][115] ([fdo#109271] / [fdo#111827]) +1 similar issue
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@kms_color_chamelium@pipe-b-ctm-blue-to-red.html

  * igt@kms_color_chamelium@pipe-b-degamma:
    - shard-tglb:         NOTRUN -> [SKIP][116] ([fdo#109284] / [fdo#111827]) +12 similar issues
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@kms_color_chamelium@pipe-b-degamma.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-tglb:         NOTRUN -> [SKIP][117] ([i915#1063])
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_cursor_crc@pipe-a-cursor-max-size-onscreen:
    - shard-tglb:         NOTRUN -> [SKIP][118] ([i915#3359]) +3 similar issues
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@kms_cursor_crc@pipe-a-cursor-max-size-onscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-32x32-rapid-movement:
    - shard-tglb:         NOTRUN -> [SKIP][119] ([i915#3319])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_cursor_crc@pipe-c-cursor-32x32-rapid-movement.html

  * igt@kms_cursor_crc@pipe-d-cursor-512x512-random:
    - shard-tglb:         NOTRUN -> [SKIP][120] ([fdo#109279] / [i915#3359]) +3 similar issues
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_cursor_crc@pipe-d-cursor-512x512-random.html

  * igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions:
    - shard-tglb:         NOTRUN -> [SKIP][121] ([fdo#109274] / [fdo#111825]) +6 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_cursor_legacy@cursorb-vs-flipa-atomic-transitions.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions:
    - shard-skl:          NOTRUN -> [SKIP][122] ([fdo#109271]) +214 similar issues
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl4/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions.html

  * igt@kms_cursor_legacy@pipe-d-torture-bo:
    - shard-skl:          NOTRUN -> [SKIP][123] ([fdo#109271] / [i915#533]) +1 similar issue
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl4/igt@kms_cursor_legacy@pipe-d-torture-bo.html

  * igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle:
    - shard-tglb:         NOTRUN -> [SKIP][124] ([i915#4103])
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_cursor_legacy@short-busy-flip-before-cursor-toggle.html

  * igt@kms_draw_crc@draw-method-rgb565-mmap-cpu-4tiled:
    - shard-iclb:         NOTRUN -> [SKIP][125] ([i915#5287])
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_draw_crc@draw-method-rgb565-mmap-cpu-4tiled.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-4tiled:
    - shard-tglb:         NOTRUN -> [SKIP][126] ([i915#5287]) +2 similar issues
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-4tiled.html

  * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][127] -> [FAIL][128] ([i915#79])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk7/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk7/igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@ac-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-flip-vs-panning-vs-hang:
    - shard-iclb:         NOTRUN -> [SKIP][129] ([fdo#109274])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_flip@2x-flip-vs-panning-vs-hang.html

  * igt@kms_flip@flip-vs-blocking-wf-vblank@c-edp1:
    - shard-skl:          [PASS][130] -> [FAIL][131] ([i915#2122])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-skl1/igt@kms_flip@flip-vs-blocking-wf-vblank@c-edp1.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl3/igt@kms_flip@flip-vs-blocking-wf-vblank@c-edp1.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2:
    - shard-glk:          NOTRUN -> [FAIL][132] ([i915#2122])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-hdmi-a2.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling:
    - shard-iclb:         [PASS][133] -> [SKIP][134] ([i915#3701])
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb1/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-32bpp-ytile-to-64bpp-ytile-downscaling.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling:
    - shard-glk:          [PASS][135] -> [FAIL][136] ([i915#4911]) +1 similar issue
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk2/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk8/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-upscaling.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff:
    - shard-iclb:         NOTRUN -> [SKIP][137] ([fdo#109280]) +2 similar issues
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-onoff.html

  * igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen:
    - shard-tglb:         NOTRUN -> [SKIP][138] ([fdo#109280] / [fdo#111825]) +21 similar issues
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_frontbuffer_tracking@psr-2p-primscrn-spr-indfb-fullscreen.html

  * igt@kms_hdr@bpc-switch@pipe-a-dp-1:
    - shard-kbl:          [PASS][139] -> [FAIL][140] ([i915#1188])
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-kbl4/igt@kms_hdr@bpc-switch@pipe-a-dp-1.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-kbl1/igt@kms_hdr@bpc-switch@pipe-a-dp-1.html

  * igt@kms_hdr@static-toggle-dpms:
    - shard-iclb:         NOTRUN -> [SKIP][141] ([i915#3555])
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_hdr@static-toggle-dpms.html

  * igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes:
    - shard-apl:          [PASS][142] -> [DMESG-WARN][143] ([i915#180]) +1 similar issue
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl3/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl4/igt@kms_plane@plane-panning-bottom-right-suspend@pipe-a-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-basic:
    - shard-skl:          NOTRUN -> [FAIL][144] ([fdo#108145] / [i915#265]) +2 similar issues
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-d-alpha-opaque-fb:
    - shard-glk:          NOTRUN -> [SKIP][145] ([fdo#109271]) +49 similar issues
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@kms_plane_alpha_blend@pipe-d-alpha-opaque-fb.html

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-4:
    - shard-tglb:         NOTRUN -> [SKIP][146] ([i915#5288])
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb3/igt@kms_plane_multiple@atomic-pipe-b-tiling-4.html

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
    - shard-tglb:         NOTRUN -> [SKIP][147] ([fdo#111615]) +4 similar issues
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_plane_multiple@atomic-pipe-b-tiling-yf.html

  * igt@kms_plane_scaling@plane-scaler-with-clipping-clamping-rotation@pipe-d-edp-1:
    - shard-tglb:         NOTRUN -> [SKIP][148] ([i915#5176]) +7 similar issues
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_plane_scaling@plane-scaler-with-clipping-clamping-rotation@pipe-d-edp-1.html

  * igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-c-dp-1:
    - shard-apl:          NOTRUN -> [SKIP][149] ([fdo#109271]) +78 similar issues
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@kms_plane_scaling@planes-upscale-20x20-downscale-factor-0-75@pipe-c-dp-1.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-iclb:         NOTRUN -> [SKIP][150] ([fdo#111068] / [i915#658])
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  * igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-big-fb:
    - shard-skl:          NOTRUN -> [SKIP][151] ([fdo#109271] / [i915#658]) +1 similar issue
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl9/igt@kms_psr2_sf@primary-plane-update-sf-dmg-area-big-fb.html

  * igt@kms_psr2_su@page_flip-nv12:
    - shard-glk:          NOTRUN -> [SKIP][152] ([fdo#109271] / [i915#658])
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@kms_psr2_su@page_flip-nv12.html

  * igt@kms_psr2_su@page_flip-xrgb8888:
    - shard-tglb:         NOTRUN -> [SKIP][153] ([i915#1911]) +1 similar issue
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@kms_psr2_su@page_flip-xrgb8888.html

  * igt@kms_psr@psr2_sprite_mmap_gtt:
    - shard-iclb:         [PASS][154] -> [SKIP][155] ([fdo#109441]) +2 similar issues
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@kms_psr@psr2_sprite_mmap_gtt.html
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@kms_psr@psr2_sprite_mmap_gtt.html

  * igt@kms_psr@psr2_suspend:
    - shard-tglb:         NOTRUN -> [FAIL][156] ([i915#132] / [i915#3467]) +2 similar issues
   [156]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@kms_psr@psr2_suspend.html

  * igt@kms_sysfs_edid_timing:
    - shard-apl:          NOTRUN -> [FAIL][157] ([IGT#2])
   [157]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@kms_sysfs_edid_timing.html

  * igt@kms_vrr@flipline:
    - shard-tglb:         NOTRUN -> [SKIP][158] ([i915#3555]) +2 similar issues
   [158]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@kms_vrr@flipline.html

  * igt@nouveau_crc@pipe-a-source-rg:
    - shard-tglb:         NOTRUN -> [SKIP][159] ([i915#2530]) +3 similar issues
   [159]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@nouveau_crc@pipe-a-source-rg.html

  * igt@nouveau_crc@pipe-c-ctx-flip-detection:
    - shard-iclb:         NOTRUN -> [SKIP][160] ([i915#2530])
   [160]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@nouveau_crc@pipe-c-ctx-flip-detection.html

  * igt@prime_nv_api@i915_self_import:
    - shard-tglb:         NOTRUN -> [SKIP][161] ([fdo#109291]) +5 similar issues
   [161]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb2/igt@prime_nv_api@i915_self_import.html

  * igt@prime_vgem@coherency-gtt:
    - shard-tglb:         NOTRUN -> [SKIP][162] ([fdo#109295] / [fdo#111656])
   [162]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@prime_vgem@coherency-gtt.html

  * igt@sw_sync@sync_merge_same:
    - shard-tglb:         NOTRUN -> [FAIL][163] ([i915#6140])
   [163]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@sw_sync@sync_merge_same.html

  * igt@sw_sync@sync_multi_timeline_wait:
    - shard-skl:          NOTRUN -> [FAIL][164] ([i915#6140])
   [164]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl7/igt@sw_sync@sync_multi_timeline_wait.html

  * igt@sysfs_clients@busy:
    - shard-skl:          NOTRUN -> [SKIP][165] ([fdo#109271] / [i915#2994]) +1 similar issue
   [165]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl4/igt@sysfs_clients@busy.html

  * igt@sysfs_clients@fair-7:
    - shard-glk:          NOTRUN -> [SKIP][166] ([fdo#109271] / [i915#2994])
   [166]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk6/igt@sysfs_clients@fair-7.html

  * igt@sysfs_clients@sema-50:
    - shard-tglb:         NOTRUN -> [SKIP][167] ([i915#2994]) +1 similar issue
   [167]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@sysfs_clients@sema-50.html

#### Possible fixes ####

  * igt@gem_ctx_isolation@preservation-s3@bcs0:
    - shard-apl:          [DMESG-WARN][168] ([i915#180]) -> [PASS][169] +2 similar issues
   [168]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl3/igt@gem_ctx_isolation@preservation-s3@bcs0.html
   [169]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl7/igt@gem_ctx_isolation@preservation-s3@bcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-kbl:          [FAIL][170] ([i915#2842]) -> [PASS][171] +4 similar issues
   [170]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-kbl3/igt@gem_exec_fair@basic-none@vcs1.html
   [171]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-kbl3/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-none@vecs0:
    - shard-apl:          [FAIL][172] ([i915#2842]) -> [PASS][173]
   [172]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl1/igt@gem_exec_fair@basic-none@vecs0.html
   [173]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl6/igt@gem_exec_fair@basic-none@vecs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - {shard-rkl}:        [FAIL][174] ([i915#2842]) -> [PASS][175]
   [174]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [175]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          [FAIL][176] ([i915#2842]) -> [PASS][177]
   [176]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-glk5/igt@gem_exec_fair@basic-throttle@rcs0.html
   [177]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-glk1/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_reloc@basic-gtt-cpu-active:
    - {shard-rkl}:        [SKIP][178] ([i915#3281]) -> [PASS][179] +4 similar issues
   [178]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@gem_exec_reloc@basic-gtt-cpu-active.html
   [179]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gem_exec_reloc@basic-gtt-cpu-active.html

  * igt@gem_exec_whisper@basic-fds-priority-all:
    - shard-tglb:         [INCOMPLETE][180] -> [PASS][181]
   [180]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb6/igt@gem_exec_whisper@basic-fds-priority-all.html
   [181]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@gem_exec_whisper@basic-fds-priority-all.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [SKIP][182] ([i915#2190]) -> [PASS][183]
   [182]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb7/igt@gem_huc_copy@huc-copy.html
   [183]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@gem_huc_copy@huc-copy.html

  * igt@gem_partial_pwrite_pread@writes-after-reads-uncached:
    - {shard-rkl}:        [SKIP][184] ([i915#3282]) -> [PASS][185] +3 similar issues
   [184]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@gem_partial_pwrite_pread@writes-after-reads-uncached.html
   [185]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gem_partial_pwrite_pread@writes-after-reads-uncached.html

  * igt@gen9_exec_parse@batch-without-end:
    - {shard-rkl}:        [SKIP][186] ([i915#2527]) -> [PASS][187]
   [186]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@gen9_exec_parse@batch-without-end.html
   [187]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@gen9_exec_parse@batch-without-end.html

  * igt@i915_hangman@engine-engine-error@bcs0:
    - {shard-rkl}:        [SKIP][188] -> [PASS][189] +2 similar issues
   [188]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-5/igt@i915_hangman@engine-engine-error@bcs0.html
   [189]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-2/igt@i915_hangman@engine-engine-error@bcs0.html

  * igt@i915_pm_backlight@basic-brightness:
    - {shard-rkl}:        [SKIP][190] ([i915#3012]) -> [PASS][191]
   [190]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@i915_pm_backlight@basic-brightness.html
   [191]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_pm_dc@dc9-dpms:
    - {shard-rkl}:        [SKIP][192] ([i915#3361]) -> [PASS][193]
   [192]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-5/igt@i915_pm_dc@dc9-dpms.html
   [193]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-2/igt@i915_pm_dc@dc9-dpms.html

  * igt@i915_pm_rpm@fences:
    - {shard-rkl}:        [SKIP][194] ([i915#1849]) -> [PASS][195] +1 similar issue
   [194]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@i915_pm_rpm@fences.html
   [195]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@i915_pm_rpm@fences.html

  * igt@i915_pm_rpm@modeset-lpsp-stress:
    - shard-iclb:         [INCOMPLETE][196] ([i915#5096] / [i915#5420]) -> [PASS][197]
   [196]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb4/igt@i915_pm_rpm@modeset-lpsp-stress.html
   [197]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@i915_pm_rpm@modeset-lpsp-stress.html
    - {shard-rkl}:        [SKIP][198] ([i915#1397]) -> [PASS][199]
   [198]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@i915_pm_rpm@modeset-lpsp-stress.html
   [199]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@i915_pm_rpm@modeset-lpsp-stress.html

  * igt@i915_pm_rps@waitboost:
    - {shard-rkl}:        [FAIL][200] ([i915#4016]) -> [PASS][201]
   [200]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@i915_pm_rps@waitboost.html
   [201]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@i915_pm_rps@waitboost.html

  * igt@i915_selftest@live@hangcheck:
    - shard-tglb:         [DMESG-WARN][202] ([i915#5591]) -> [PASS][203]
   [202]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb8/igt@i915_selftest@live@hangcheck.html
   [203]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb3/igt@i915_selftest@live@hangcheck.html

  * igt@kms_color@pipe-b-degamma:
    - {shard-rkl}:        [SKIP][204] ([i915#1149] / [i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][205]
   [204]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_color@pipe-b-degamma.html
   [205]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_color@pipe-b-degamma.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x42-onscreen:
    - {shard-rkl}:        [SKIP][206] ([fdo#112022] / [i915#4070]) -> [PASS][207] +5 similar issues
   [206]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_cursor_crc@pipe-a-cursor-128x42-onscreen.html
   [207]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_cursor_crc@pipe-a-cursor-128x42-onscreen.html

  * igt@kms_cursor_legacy@cursor-vs-flip-toggle:
    - {shard-rkl}:        [SKIP][208] ([fdo#111825] / [i915#4070]) -> [PASS][209] +4 similar issues
   [208]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@kms_cursor_legacy@cursor-vs-flip-toggle.html
   [209]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_cursor_legacy@cursor-vs-flip-toggle.html

  * igt@kms_cursor_legacy@pipe-c-single-move:
    - {shard-rkl}:        [SKIP][210] ([i915#4070]) -> [PASS][211]
   [210]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_cursor_legacy@pipe-c-single-move.html
   [211]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@kms_cursor_legacy@pipe-c-single-move.html

  * igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-untiled:
    - {shard-rkl}:        [SKIP][212] ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][213] +3 similar issues
   [212]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-untiled.html
   [213]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_draw_crc@draw-method-rgb565-mmap-gtt-untiled.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1:
    - shard-skl:          [FAIL][214] ([i915#79]) -> [PASS][215]
   [214]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-skl7/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html
   [215]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl6/igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1.html

  * igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling:
    - shard-iclb:         [SKIP][216] ([i915#3701]) -> [PASS][217]
   [216]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling.html
   [217]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@kms_flip_scaled_crc@flip-32bpp-ytileccs-to-64bpp-ytile-downscaling.html

  * igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-mmap-cpu:
    - {shard-rkl}:        [SKIP][218] ([i915#1849] / [i915#4098]) -> [PASS][219] +13 similar issues
   [218]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-mmap-cpu.html
   [219]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_frontbuffer_tracking@psr-1p-offscren-pri-shrfb-draw-mmap-cpu.html

  * igt@kms_invalid_mode@uint-max-clock:
    - {shard-rkl}:        [SKIP][220] ([i915#4278]) -> [PASS][221]
   [220]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_invalid_mode@uint-max-clock.html
   [221]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_invalid_mode@uint-max-clock.html

  * igt@kms_plane@plane-panning-top-left@pipe-b-planes:
    - {shard-rkl}:        [SKIP][222] ([i915#3558]) -> [PASS][223] +1 similar issue
   [222]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_plane@plane-panning-top-left@pipe-b-planes.html
   [223]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_plane@plane-panning-top-left@pipe-b-planes.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-7efc:
    - {shard-rkl}:        [SKIP][224] ([i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][225]
   [224]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_plane_alpha_blend@pipe-b-alpha-7efc.html
   [225]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-b-alpha-7efc.html

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-x:
    - {shard-rkl}:        [SKIP][226] ([i915#1849] / [i915#3558] / [i915#4070]) -> [PASS][227]
   [226]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@kms_plane_multiple@atomic-pipe-b-tiling-x.html
   [227]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_plane_multiple@atomic-pipe-b-tiling-x.html

  * igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1:
    - shard-iclb:         [SKIP][228] ([i915#5235]) -> [PASS][229] +2 similar issues
   [228]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html
   [229]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb8/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html

  * igt@kms_psr@cursor_blt:
    - {shard-rkl}:        [SKIP][230] ([i915#1072]) -> [PASS][231] +1 similar issue
   [230]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@kms_psr@cursor_blt.html
   [231]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_psr@cursor_blt.html

  * igt@kms_psr@psr2_sprite_blt:
    - shard-iclb:         [SKIP][232] ([fdo#109441]) -> [PASS][233] +1 similar issue
   [232]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb1/igt@kms_psr@psr2_sprite_blt.html
   [233]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb2/igt@kms_psr@psr2_sprite_blt.html

  * igt@kms_psr_stress_test@flip-primary-invalidate-overlay:
    - shard-tglb:         [SKIP][234] ([i915#5519]) -> [PASS][235]
   [234]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb6/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
   [235]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb1/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
    - shard-iclb:         [SKIP][236] ([i915#5519]) -> [PASS][237]
   [236]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb3/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html
   [237]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb2/igt@kms_psr_stress_test@flip-primary-invalidate-overlay.html

  * igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend:
    - {shard-rkl}:        [SKIP][238] ([i915#1845] / [i915#4098]) -> [PASS][239] +16 similar issues
   [238]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-1/igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend.html
   [239]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-6/igt@kms_vblank@pipe-a-ts-continuation-dpms-suspend.html

  * igt@perf@polling-parameterized:
    - {shard-rkl}:        [FAIL][240] ([i915#5639]) -> [PASS][241]
   [240]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-rkl-2/igt@perf@polling-parameterized.html
   [241]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-rkl-5/igt@perf@polling-parameterized.html

  
#### Warnings ####

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [TIMEOUT][242] ([i915#3063]) -> [FAIL][243] ([i915#5784])
   [242]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-tglb7/igt@gem_eio@unwedge-stress.html
   [243]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-tglb5/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-iclb:         [FAIL][244] ([i915#2842]) -> [FAIL][245] ([i915#2852])
   [244]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb4/igt@gem_exec_fair@basic-none-rrul@rcs0.html
   [245]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb1/igt@gem_exec_fair@basic-none-rrul@rcs0.html

  * igt@i915_pm_dc@dc3co-vpb-simulation:
    - shard-iclb:         [SKIP][246] ([i915#658]) -> [SKIP][247] ([i915#588])
   [246]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb1/igt@i915_pm_dc@dc3co-vpb-simulation.html
   [247]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb2/igt@i915_pm_dc@dc3co-vpb-simulation.html

  * igt@kms_big_fb@x-tiled-addfb-size-offset-overflow:
    - shard-skl:          [SKIP][248] ([fdo#109271] / [i915#1888]) -> [SKIP][249] ([fdo#109271])
   [248]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-skl10/igt@kms_big_fb@x-tiled-addfb-size-offset-overflow.html
   [249]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-skl6/igt@kms_big_fb@x-tiled-addfb-size-offset-overflow.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf:
    - shard-iclb:         [SKIP][250] ([i915#2920]) -> [SKIP][251] ([i915#658])
   [250]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf.html
   [251]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-sf.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area:
    - shard-iclb:         [SKIP][252] ([i915#2920]) -> [SKIP][253] ([fdo#111068] / [i915#658])
   [252]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb2/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html
   [253]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb6/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html

  * igt@kms_psr2_sf@plane-move-sf-dmg-area:
    - shard-iclb:         [SKIP][254] ([fdo#111068] / [i915#658]) -> [SKIP][255] ([i915#2920])
   [254]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-iclb1/igt@kms_psr2_sf@plane-move-sf-dmg-area.html
   [255]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-iclb2/igt@kms_psr2_sf@plane-move-sf-dmg-area.html

  * igt@runner@aborted:
    - shard-apl:          ([FAIL][256], [FAIL][257], [FAIL][258], [FAIL][259], [FAIL][260]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257]) -> ([FAIL][261], [FAIL][262], [FAIL][263], [FAIL][264], [FAIL][265]) ([i915#3002] / [i915#4312] / [i915#5257])
   [256]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl3/igt@runner@aborted.html
   [257]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl6/igt@runner@aborted.html
   [258]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl4/igt@runner@aborted.html
   [259]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl4/igt@runner@aborted.html
   [260]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11761/shard-apl3/igt@runner@aborted.html
   [261]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl2/igt@runner@aborted.html
   [262]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl1/igt@runner@aborted.html
   [263]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl8/igt@runner@aborted.html
   [264]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl4/igt@runner@aborted.html
   [265]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/shard-apl4/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#2]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/2
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109279]: https://bugs.freedesktop.org/show_bug.cgi?id=109279
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109312]: https://bugs.freedesktop.org/show_bug.cgi?id=109312
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506
  [fdo#110542]: https://bugs.freedesktop.org/show_bug.cgi?id=110542
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#110725]: https://bugs.freedesktop.org/show_bug.cgi?id=110725
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111314]: https://bugs.freedesktop.org/show_bug.cgi?id=111314
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [fdo#112022]: https://bugs.freedesktop.org/show_bug.cgi?id=112022
  [i915#1063]: https://gitlab.freedesktop.org/drm/intel/issues/1063
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1149]: https://gitlab.freedesktop.org/drm/intel/issues/1149
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1836]: https://gitlab.freedesktop.org/drm/intel/issues/1836
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1911]: https://gitlab.freedesktop.org/drm/intel/issues/1911
  [i915#2029]: https://gitlab.freedesktop.org/drm/intel/issues/2029
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2530]: https://gitlab.freedesktop.org/drm/intel/issues/2530
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658
  [i915#2705]: https://gitlab.freedesktop.org/drm/intel/issues/2705
  [i915#284]: https://gitlab.freedesktop.org/drm/intel/issues/284
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2852]: https://gitlab.freedesktop.org/drm/intel/issues/2852
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3063]: https://gitlab.freedesktop.org/drm/intel/issues/3063
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3318]: https://gitlab.freedesktop.org/drm/intel/issues/3318
  [i915#3319]: https://gitlab.freedesktop.org/drm/intel/issues/3319
  [i915#3323]: https://gitlab.freedesktop.org/drm/intel/issues/3323
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3376]: https://gitlab.freedesktop.org/drm/intel/issues/3376
  [i915#3467]: https://gitlab.freedesktop.org/drm/intel/issues/3467
  [i915#3536]: https://gitlab.freedesktop.org/drm/intel/issues/3536
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3614]: https://gitlab.freedesktop.org/drm/intel/issues/3614
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3701]: https://gitlab.freedesktop.org/drm/intel/issues/3701
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3743]: https://gitlab.freedesktop.org/drm/intel/issues/3743
  [i915#3810]: https://gitlab.freedesktop.org/drm/intel/issues/3810
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#4016]: https://gitlab.freedesktop.org/drm/intel/issues/4016
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4171]: https://gitlab.freedesktop.org/drm/intel/issues/4171
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4278]: https://gitlab.freedesktop.org/drm/intel/issues/4278
  [i915#4281]: https://gitlab.freedesktop.org/drm/intel/issues/4281
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4369]: https://gitlab.freedesktop.org/drm/intel/issues/4369
  [i915#4392]: https://gitlab.freedesktop.org/drm/intel/issues/4392
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4817]: https://gitlab.freedesktop.org/drm/intel/issues/4817
  [i915#4877]: https://gitlab.freedesktop.org/drm/intel/issues/4877
  [i915#4893]: https://gitlab.freedesktop.org/drm/intel/issues/4893
  [i915#4911]: https://gitlab.freedesktop.org/drm/intel/issues/4911
  [i915#5096]: https://gitlab.freedesktop.org/drm/intel/issues/5096
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5287]: https://gitlab.freedesktop.org/drm/intel/issues/5287
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5420]: https://gitlab.freedesktop.org/drm/intel/issues/5420
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5519]: https://gitlab.freedesktop.org/drm/intel/issues/5519
  [i915#5591]: https://gitlab.freedesktop.org/drm/intel/issues/5591
  [i915#5639]: https://gitlab.freedesktop.org/drm/intel/issues/5639
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#588]: https://gitlab.freedesktop.org/drm/intel/issues/588
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6140]: https://gitlab.freedesktop.org/drm/intel/issues/6140
  [i915#6141]: https://gitlab.freedesktop.org/drm/intel/issues/6141
  [i915#6227]: https://gitlab.freedesktop.org/drm/intel/issues/6227
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79


Build changes
-------------

  * Linux: CI_DRM_11761 -> Patchwork_105167v1

  CI-20190529: 20190529
  CI_DRM_11761: ec90bfe7e13f9a75f02b7f409c8f23911e551b9f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6529: b96bf5a0307fc0bdbf6c8e86872817306e102883 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_105167v1: ec90bfe7e13f9a75f02b7f409c8f23911e551b9f @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_105167v1/index.html


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-16  7:21     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-16  7:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Jason Ekstrand, John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Ramalingam C, Rodrigo Vivi, dri-devel, intel-gfx, linux-kernel,
	mauro.chehab, stable


On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> As an extension of the existing skipping of TLB invalidations,
> check if the device is powered down prior to any engine activity,
> as, in such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed.
> 
> This becomes more significant with GuC, as it can only do so when
> the connection to the GuC is awake.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Hmmm, is this a fix or "an extension"? The commit text mentions both 
options. The GuC angle does not appear relevant for upstream yet, so the 
question is whether Cc: stable is really required.

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 +++++----
>   drivers/gpu/drm/i915/gt/intel_gt.c        | 26 +++++++++++++++++------
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
>   3 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 97c820eee115..6835279943df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -6,14 +6,15 @@
>   
>   #include <drm/drm_cache.h>
>   
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +
>   #include "i915_drv.h"
>   #include "i915_gem_object.h"
>   #include "i915_scatterlist.h"
>   #include "i915_gem_lmem.h"
>   #include "i915_gem_mman.h"
>   
> -#include "gt/intel_gt.h"
> -
>   void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>   				 struct sg_table *pages,
>   				 unsigned int sg_page_sizes)
> @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   
>   	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
>   		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +		struct intel_gt *gt = to_gt(i915);
>   		intel_wakeref_t wakeref;
>   
> -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
> -			intel_gt_invalidate_tlbs(to_gt(i915));
> +		with_intel_gt_pm_if_awake(gt, wakeref)
> +			intel_gt_invalidate_tlbs(gt);
>   	}
>   
>   	return pages;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index f33290358c51..d5ed6a6ac67c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -11,6 +11,7 @@
>   
>   #include "i915_drv.h"
>   #include "intel_context.h"
> +#include "intel_engine_pm.h"
>   #include "intel_engine_regs.h"
>   #include "intel_gt.h"
>   #include "intel_gt_buffer_pool.h"
> @@ -1216,6 +1217,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct intel_uncore *uncore = gt->uncore;
>   	struct intel_engine_cs *engine;
> +	intel_engine_mask_t awake, tmp;
>   	enum intel_engine_id id;
>   	const i915_reg_t *regs;
>   	unsigned int num = 0;
> @@ -1239,12 +1241,27 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   
>   	GEM_TRACE("\n");
>   
> -	assert_rpm_wakelock_held(&i915->runtime_pm);
> -
>   	mutex_lock(&gt->tlb_invalidate_lock);
>   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
> +	awake = 0;
>   	for_each_engine(engine, gt, id) {
> +		struct reg_and_bit rb;
> +
> +		if (!intel_engine_pm_is_awake(engine))
> +			continue;
> +
> +		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> +		if (!i915_mmio_reg_offset(rb.reg))
> +			continue;
> +
> +		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +		awake |= engine->mask;
> +	}
> +
> +	for_each_engine_masked(engine, gt, awake, tmp) {
> +		struct reg_and_bit rb;
> +
>   		/*
>   		 * HW architecture suggest typical invalidation time at 40us,
>   		 * with pessimistic cases up to 100us and a recommendation to
> @@ -1252,13 +1269,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		 */
>   		const unsigned int timeout_us = 100;
>   		const unsigned int timeout_ms = 4;
> -		struct reg_and_bit rb;
>   
>   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> -		if (!i915_mmio_reg_offset(rb.reg))
> -			continue;
> -
> -		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>   		if (__intel_wait_for_register_fw(uncore,
>   						 rb.reg, rb.bit, 0,
>   						 timeout_us, timeout_ms,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index bc898df7a48c..a334787a4939 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>   	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>   	     intel_gt_pm_put(gt), tmp = 0)
>   
> +#define with_intel_gt_pm_if_awake(gt, wf) \
> +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
> +
>   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>   {
>   	return intel_wakeref_wait_for_idle(&gt->wakeref);

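For readers following the diff quoted above, here is a minimal,
self-contained C sketch of the two ideas in the patch: a for-loop guard
macro that runs its body only when a wakeref can be taken, and a two-pass
invalidation that first kicks every awake engine and only then waits on
the engines it kicked. Every toy_* name, the struct layouts, the poll
budget and the simulated hardware are assumptions made for illustration.
None of this is i915 code; the real driver uses the helpers named in the
quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALIDATE_BIT 0x1u

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct toy_engine {
	uint32_t mask;   /* one bit per engine */
	bool awake;      /* models the per-engine "is awake" check */
	bool has_reg;    /* models a non-zero invalidation register offset */
	uint32_t reg;    /* toy invalidation register */
};

/* Hypothetical stand-in for the GT wakeref: a count above 0 means awake. */
struct toy_gt {
	int wakerefs;
};

/* Take a reference only if the GT is already awake; never wakes it up. */
static int toy_gt_pm_get_if_awake(struct toy_gt *gt)
{
	if (gt->wakerefs == 0)
		return 0;
	gt->wakerefs++;
	return 1;
}

static void toy_gt_pm_put(struct toy_gt *gt)
{
	assert(gt->wakerefs > 0);
	gt->wakerefs--;
}

/* Same shape as the macro in the patch: the body runs at most once. */
#define with_toy_gt_pm_if_awake(gt, wf) \
	for ((wf) = toy_gt_pm_get_if_awake(gt); (wf); \
	     toy_gt_pm_put(gt), (wf) = 0)

/* Poll until the bit clears; the toy hardware clears it after one poll. */
static bool toy_wait_clear(struct toy_engine *e, unsigned int max_polls)
{
	while (max_polls--) {
		if (!(e->reg & TOY_INVALIDATE_BIT))
			return true;
		e->reg &= ~TOY_INVALIDATE_BIT; /* simulated completion */
	}
	return !(e->reg & TOY_INVALIDATE_BIT);
}

/* Returns the mask of engines that were actually invalidated. */
static uint32_t toy_invalidate_tlbs(struct toy_engine *engines, size_t n,
				    unsigned int *timeouts)
{
	uint32_t awake = 0;
	size_t i;

	*timeouts = 0;

	/* Pass 1: request invalidation on awake engines, remember which. */
	for (i = 0; i < n; i++) {
		if (!engines[i].awake || !engines[i].has_reg)
			continue;
		engines[i].reg |= TOY_INVALIDATE_BIT;
		awake |= engines[i].mask;
	}

	/* Pass 2: wait only on the engines kicked in pass 1. */
	for (i = 0; i < n; i++) {
		if (!(awake & engines[i].mask))
			continue;
		if (!toy_wait_clear(&engines[i], 4))
			(*timeouts)++;
	}

	return awake;
}

/* Caller side: skip the whole invalidation if the GT is asleep. */
static uint32_t toy_unset_pages(struct toy_gt *gt,
				struct toy_engine *engines, size_t n)
{
	unsigned int timeouts = 0;
	uint32_t flushed = 0;
	int wf;

	with_toy_gt_pm_if_awake(gt, wf)
		flushed = toy_invalidate_tlbs(engines, n, &timeouts);

	return flushed;
}
```

Calling toy_unset_pages() on a GT whose wakeref count is zero returns 0
without touching any register, which mirrors the skip-if-idle behaviour
the commit message describes. With the count above zero, only engines
that are both awake and have an invalidation register end up in the
returned mask, and the wakeref count is back at its starting value once
the guarded body has run.
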
^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Intel-gfx] [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
@ 2022-06-16  7:21     ` Tvrtko Ursulin
  0 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-16  7:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Chris Wilson, Matthew Auld, Dave Airlie,
	Thomas Hellström, Lucas De Marchi, intel-gfx,
	Thomas Hellstrom, Rodrigo Vivi, mauro.chehab,
	Michał Winiarski, linux-kernel, stable


On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> As an extension of the current skip TLB invalidations,
> check if the device is powered down prior to any engine activity,
> 
> as, on such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed.
> 
> This becomes more significant  with GuC, as it can only do so when
> the connection to the GuC is awake.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Hmmm is this a fix or "an extension" as the commit text mentions both 
options?! GuC angle does not appear relevant for upstream yet so is cc: 
stable really required is the question.

Regards,

Tvrtko

> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 +++++----
>   drivers/gpu/drm/i915/gt/intel_gt.c        | 26 +++++++++++++++++------
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
>   3 files changed, 28 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 97c820eee115..6835279943df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -6,14 +6,15 @@
>   
>   #include <drm/drm_cache.h>
>   
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +
>   #include "i915_drv.h"
>   #include "i915_gem_object.h"
>   #include "i915_scatterlist.h"
>   #include "i915_gem_lmem.h"
>   #include "i915_gem_mman.h"
>   
> -#include "gt/intel_gt.h"
> -
>   void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>   				 struct sg_table *pages,
>   				 unsigned int sg_page_sizes)
> @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   
>   	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
>   		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +		struct intel_gt *gt = to_gt(i915);
>   		intel_wakeref_t wakeref;
>   
> -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
> -			intel_gt_invalidate_tlbs(to_gt(i915));
> +		with_intel_gt_pm_if_awake(gt, wakeref)
> +			intel_gt_invalidate_tlbs(gt);
>   	}
>   
>   	return pages;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index f33290358c51..d5ed6a6ac67c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -11,6 +11,7 @@
>   
>   #include "i915_drv.h"
>   #include "intel_context.h"
> +#include "intel_engine_pm.h"
>   #include "intel_engine_regs.h"
>   #include "intel_gt.h"
>   #include "intel_gt_buffer_pool.h"
> @@ -1216,6 +1217,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct intel_uncore *uncore = gt->uncore;
>   	struct intel_engine_cs *engine;
> +	intel_engine_mask_t awake, tmp;
>   	enum intel_engine_id id;
>   	const i915_reg_t *regs;
>   	unsigned int num = 0;
> @@ -1239,12 +1241,27 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   
>   	GEM_TRACE("\n");
>   
> -	assert_rpm_wakelock_held(&i915->runtime_pm);
> -
>   	mutex_lock(&gt->tlb_invalidate_lock);
>   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
> +	awake = 0;
>   	for_each_engine(engine, gt, id) {
> +		struct reg_and_bit rb;
> +
> +		if (!intel_engine_pm_is_awake(engine))
> +			continue;
> +
> +		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> +		if (!i915_mmio_reg_offset(rb.reg))
> +			continue;
> +
> +		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +		awake |= engine->mask;
> +	}
> +
> +	for_each_engine_masked(engine, gt, awake, tmp) {
> +		struct reg_and_bit rb;
> +
>   		/*
>   		 * HW architecture suggest typical invalidation time at 40us,
>   		 * with pessimistic cases up to 100us and a recommendation to
> @@ -1252,13 +1269,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		 */
>   		const unsigned int timeout_us = 100;
>   		const unsigned int timeout_ms = 4;
> -		struct reg_and_bit rb;
>   
>   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> -		if (!i915_mmio_reg_offset(rb.reg))
> -			continue;
> -
> -		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>   		if (__intel_wait_for_register_fw(uncore,
>   						 rb.reg, rb.bit, 0,
>   						 timeout_us, timeout_ms,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index bc898df7a48c..a334787a4939 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>   	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>   	     intel_gt_pm_put(gt), tmp = 0)
>   
> +#define with_intel_gt_pm_if_awake(gt, wf) \
> +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
> +
>   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>   {
>   	return intel_wakeref_wait_for_idle(&gt->wakeref);
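
The intel_gt.c hunk above splits the loop in two: the invalidation request is first written to every awake engine, and only then does the code wait on each of them, presumably so that the waits can overlap. A minimal userspace sketch of that shape follows. The names fake_engine, fake_hw_complete and invalidate_tlbs are made up for illustration and are not i915 code; the mask is 32 bits wide, so the sketch assumes at most 32 engines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct fake_engine {
	bool awake;    /* models intel_engine_pm_is_awake() */
	bool has_reg;  /* models a non-zero invalidation register offset */
	uint32_t reg;  /* models the invalidation register; 1 = request pending */
};

/* Pretend hardware: finishing an invalidation clears the request bit. */
static void fake_hw_complete(struct fake_engine *e)
{
	e->reg = 0;
}

/* Returns the mask of engines that were actually invalidated. */
static uint32_t invalidate_tlbs(struct fake_engine *engines, unsigned int n)
{
	uint32_t awake = 0;
	unsigned int i;

	/* Phase 1: post the request to every awake engine that has a register. */
	for (i = 0; i < n; i++) {
		if (!engines[i].awake || !engines[i].has_reg)
			continue;
		engines[i].reg = 1;
		awake |= UINT32_C(1) << i;
	}

	/* Phase 2: wait only on the engines recorded in the mask. */
	for (i = 0; i < n; i++) {
		if (!(awake & (UINT32_C(1) << i)))
			continue;
		fake_hw_complete(&engines[i]); /* stands in for the register poll */
	}

	return awake;
}
```

Idle engines never enter the mask, so they are neither written to nor waited on.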

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 3/6] drm/i915/gt: Skip TLB invalidations once wedged
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-16  7:25     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-16  7:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Matthew Auld, Rodrigo Vivi,
	dri-devel, intel-gfx, linux-kernel, mauro.chehab, stable,
	Thomas Hellström


On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Skip all further TLB invalidations once the device is wedged and
> has been reset, as, in such cases, it can no longer process instructions
> on the GPU and the user no longer has access to the TLBs in each engine.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Does this fix any real problem, or is it just the logical thing to do?
There is not much harm in tagging it as Fixes, process-wise, since it is
tiny, but, again, I want a clear picture.

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 61b7ec5118f9..fb4fd5273ca4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -1226,6 +1226,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>   		return;
>   
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
>   	if (GRAPHICS_VER(i915) == 12) {
>   		regs = gen12_regs;
>   		num = ARRAY_SIZE(gen12_regs);

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 4/6] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-16  7:33     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-16  7:33 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Thomas Hellstrom, Daniel Vetter,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, dri-devel, intel-gfx, linux-kernel, mauro.chehab,
	Andi Shyti, stable, Thomas Hellström


On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't flush TLBs when the buffer is only used in the GGTT under full
> control of the kernel, as there's no risk of concurrent access
> and stale access from prefetch.
> 
> We only need to invalidate the TLBs if they are accessible by the user.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Same question as for the other patch: fix or optimisation?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/i915_vma.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 0bffb70b3c5f..7989986161e8 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -537,7 +537,8 @@ int i915_vma_bind(struct i915_vma *vma,
>   				   bind_flags);
>   	}
>   
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> +	if (bind_flags & I915_VMA_LOCAL_BIND)
> +		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
>   
>   	atomic_or(bind_flags, &vma->flags);
>   	return 0;
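
The effect of the hunk above can be sketched in plain C: the object is marked only when it is bound through the user-visible (local) address space, and the mark is consumed when the pages are released. The flag values and the names fake_obj, bind and release_needs_invalidate are hypothetical and chosen for illustration; the real definitions live in the i915 headers.

```c
#include <stdbool.h>

/* Hypothetical flag values; not the real I915_VMA_* definitions. */
#define BIND_GLOBAL (1u << 0) /* GGTT mapping, under kernel control */
#define BIND_LOCAL  (1u << 1) /* per-process mapping, visible to the user */

struct fake_obj {
	bool was_bound; /* models I915_BO_WAS_BOUND_BIT */
};

/* Mark the object only when the binding is user accessible. */
static void bind(struct fake_obj *obj, unsigned int bind_flags)
{
	if (bind_flags & BIND_LOCAL)
		obj->was_bound = true;
}

/* Test and clear the mark; true means a TLB invalidation is needed. */
static bool release_needs_invalidate(struct fake_obj *obj)
{
	bool was_bound = obj->was_bound;

	obj->was_bound = false;
	return was_bound;
}
```

With this shape, an object that was only ever bound with BIND_GLOBAL is released without an invalidation.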

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-16  7:35     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-16  7:35 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Thomas Hellstrom, Bruce Chang,
	Daniel Vetter, Dave Airlie, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Matt Roper, Matthew Brost,
	Rodrigo Vivi, Tejas Upadhyay, Umesh Nerlige Ramappa, dri-devel,
	intel-gfx, linux-kernel, mauro.chehab, Mika Kuoppala,
	Chris Wilson, Andi Shyti, stable, Thomas Hellström


On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't allow two engines to be reset in parallel, as they would both
> try to select a reset bit (and send requests to common registers)
> and wait on that register, at the same time. Serialize control of
> the reset requests/acks using the uncore->lock, which will also ensure
> that no other GT state changes at the same time as the actual reset.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Ah okay I get it, the fixes tag was applied indiscriminately to the 
whole series. :) It definitely does not belong in this patch.

Otherwise LGTM:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gt/intel_reset.c | 37 ++++++++++++++++++++-------
>   1 file changed, 28 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index a5338c3fde7a..c68d36fb5bbd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -300,9 +300,9 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
>   	return err;
>   }
>   
> -static int gen6_reset_engines(struct intel_gt *gt,
> -			      intel_engine_mask_t engine_mask,
> -			      unsigned int retry)
> +static int __gen6_reset_engines(struct intel_gt *gt,
> +				intel_engine_mask_t engine_mask,
> +				unsigned int retry)
>   {
>   	struct intel_engine_cs *engine;
>   	u32 hw_mask;
> @@ -321,6 +321,20 @@ static int gen6_reset_engines(struct intel_gt *gt,
>   	return gen6_hw_domain_reset(gt, hw_mask);
>   }
>   
> +static int gen6_reset_engines(struct intel_gt *gt,
> +			      intel_engine_mask_t engine_mask,
> +			      unsigned int retry)
> +{
> +	unsigned long flags;
> +	int ret;
> +
> +	spin_lock_irqsave(&gt->uncore->lock, flags);
> +	ret = __gen6_reset_engines(gt, engine_mask, retry);
> +	spin_unlock_irqrestore(&gt->uncore->lock, flags);
> +
> +	return ret;
> +}
> +
>   static struct intel_engine_cs *find_sfc_paired_vecs_engine(struct intel_engine_cs *engine)
>   {
>   	int vecs_id;
> @@ -487,9 +501,9 @@ static void gen11_unlock_sfc(struct intel_engine_cs *engine)
>   	rmw_clear_fw(uncore, sfc_lock.lock_reg, sfc_lock.lock_bit);
>   }
>   
> -static int gen11_reset_engines(struct intel_gt *gt,
> -			       intel_engine_mask_t engine_mask,
> -			       unsigned int retry)
> +static int __gen11_reset_engines(struct intel_gt *gt,
> +				 intel_engine_mask_t engine_mask,
> +				 unsigned int retry)
>   {
>   	struct intel_engine_cs *engine;
>   	intel_engine_mask_t tmp;
> @@ -583,8 +597,11 @@ static int gen8_reset_engines(struct intel_gt *gt,
>   	struct intel_engine_cs *engine;
>   	const bool reset_non_ready = retry >= 1;
>   	intel_engine_mask_t tmp;
> +	unsigned long flags;
>   	int ret;
>   
> +	spin_lock_irqsave(&gt->uncore->lock, flags);
> +
>   	for_each_engine_masked(engine, gt, engine_mask, tmp) {
>   		ret = gen8_engine_reset_prepare(engine);
>   		if (ret && !reset_non_ready)
> @@ -612,17 +629,19 @@ static int gen8_reset_engines(struct intel_gt *gt,
>   	 * This is best effort, so ignore any error from the initial reset.
>   	 */
>   	if (IS_DG2(gt->i915) && engine_mask == ALL_ENGINES)
> -		gen11_reset_engines(gt, gt->info.engine_mask, 0);
> +		__gen11_reset_engines(gt, gt->info.engine_mask, 0);
>   
>   	if (GRAPHICS_VER(gt->i915) >= 11)
> -		ret = gen11_reset_engines(gt, engine_mask, retry);
> +		ret = __gen11_reset_engines(gt, engine_mask, retry);
>   	else
> -		ret = gen6_reset_engines(gt, engine_mask, retry);
> +		ret = __gen6_reset_engines(gt, engine_mask, retry);
>   
>   skip_reset:
>   	for_each_engine_masked(engine, gt, engine_mask, tmp)
>   		gen8_engine_reset_cancel(engine);
>   
> +	spin_unlock_irqrestore(&gt->uncore->lock, flags);
> +
>   	return ret;
>   }
>   
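
The patch above follows a common locking pattern: the real work moves into an unlocked helper, a thin wrapper takes the lock for standalone callers, and a larger sequence that already holds the lock calls the helper directly so that it never tries to take the lock twice. A minimal userspace sketch of that pattern follows, using a C11 atomic_flag as a stand-in for the spinlock. All names are made up for illustration and none of this is i915 code.

```c
#include <stdatomic.h>

static atomic_flag reset_lock = ATOMIC_FLAG_INIT; /* stand-in for uncore->lock */
static int resets_done;

static void lock(void)
{
	while (atomic_flag_test_and_set_explicit(&reset_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void unlock(void)
{
	atomic_flag_clear_explicit(&reset_lock, memory_order_release);
}

/* Unlocked helper: the caller must already hold reset_lock. */
static int reset_engines_locked(unsigned int engine_mask)
{
	if (!engine_mask)
		return -1;
	resets_done++;
	return 0;
}

/* Standalone entry point: takes the lock around the helper. */
static int reset_engines(unsigned int engine_mask)
{
	int ret;

	lock();
	ret = reset_engines_locked(engine_mask);
	unlock();
	return ret;
}

/* Larger sequence: holds the lock across prepare, reset and cancel. */
static int reset_engines_with_prepare(unsigned int engine_mask)
{
	int ret;

	lock();
	/* a prepare step would go here */
	ret = reset_engines_locked(engine_mask); /* not reset_engines(): lock is held */
	/* a cancel step would go here */
	unlock();
	return ret;
}
```

Calling reset_engines() from inside reset_engines_with_prepare() would spin forever on the flag, which is why the inner call goes to the unlocked helper.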

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:04     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Jason Ekstrand, John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Ramalingam C, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx,
	linux-kernel, mauro.chehab, stable

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:35PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> As an extension of the current skip TLB invalidations,
> check if the device is powered down prior to any engine activity,
> as, in such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed.
> 
> This becomes more significant with GuC, as it can only do so when
> the connection to the GuC is awake.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Intel-gfx] [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
@ 2022-06-23 11:04     ` Andi Shyti
  0 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:04 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Chris Wilson, Matthew Auld, Dave Airlie,
	Thomas Hellström, Lucas De Marchi, intel-gfx,
	Thomas Hellstrom, Rodrigo Vivi, mauro.chehab,
	Michał Winiarski, linux-kernel, stable

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:35PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> As an extension of the current skip TLB invalidations,
> check if the device is powered down prior to any engine activity,
> 
> as, on such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed.
> 
> This becomes more significant  with GuC, as it can only do so when
> the connection to the GuC is awake.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 2/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:07     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Matthew Auld, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, mauro.chehab,
	stable, Thomas Hellström

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:36PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> On gen12 HW, ensure that the TLB of the OA unit is also invalidated
> as just invalidating the TLB of an engine is not enough.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 3/6] drm/i915/gt: Skip TLB invalidations once wedged
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:08     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Matthew Auld, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, mauro.chehab,
	stable, Thomas Hellström

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:37PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Skip all further TLB invalidations once the device is wedged and
> has been reset, as, in such cases, it can no longer process instructions
> on the GPU and the user no longer has access to the TLBs in each engine.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 61b7ec5118f9..fb4fd5273ca4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -1226,6 +1226,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>  		return;
>  
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +

This looks familiar :)

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread
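
The guard quoted in the hunk above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_gt, toy_gt_is_wedged() and the tlb_flushes counter are hypothetical stand-ins, not the i915 definitions, and the I915_SELFTEST_ONLY() wrapper from the real code is reduced to a plain comparison.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct intel_gt; every field here is illustrative. */
struct toy_gt {
	int awake;                /* -ENODEV models a GT with no device behind it */
	bool wedged;              /* set once the GPU has been declared unusable */
	unsigned int tlb_flushes; /* invalidations that were actually issued */
};

/* Hypothetical counterpart of intel_gt_is_wedged() for the toy model. */
static bool toy_gt_is_wedged(const struct toy_gt *gt)
{
	return gt->wedged;
}

static void toy_gt_invalidate_tlbs(struct toy_gt *gt)
{
	/* Existing guard: nothing to do without a device. */
	if (gt->awake == -ENODEV)
		return;

	/* Guard added by the patch: skip the flush once the GT is wedged. */
	if (toy_gt_is_wedged(gt))
		return;

	/* Stand-in for the per-engine invalidation register writes. */
	gt->tlb_flushes++;
}
```

In this model, a call made while wedged is false bumps tlb_flushes, and a call made after wedged is set leaves the counter unchanged, which mirrors the early return in the quoted diff.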

* Re: [PATCH 4/6] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:13     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Thomas Hellstrom, Daniel Vetter,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	mauro.chehab, Andi Shyti, stable, Thomas Hellström

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:38PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't flush TLBs when the buffer is only used in the GGTT under full
> control of the kernel, as there's no risk of concurrent access
> and stale access from prefetch.
> 
> We only need to invalidate the TLBs if they are accessible by the user.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:17     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:17 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Thomas Hellstrom, Bruce Chang,
	Daniel Vetter, Dave Airlie, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Matt Roper, Matthew Brost,
	Rodrigo Vivi, Tejas Upadhyay, Tvrtko Ursulin,
	Umesh Nerlige Ramappa, dri-devel, intel-gfx, linux-kernel,
	mauro.chehab, Mika Kuoppala, Chris Wilson, stable,
	Thomas Hellström

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't allow two engines to be reset in parallel, as they would both
> try to select a reset bit (and send requests to common registers)
> and wait on that register, at the same time. Serialize control of
> the reset requests/acks using the uncore->lock, which will also ensure
> that no other GT state changes at the same time as the actual reset.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread
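
The serialisation described in the quoted commit message can be sketched in userspace as follows. The patch body is not quoted in this reply, so this is an assumption-laden model rather than the actual change: struct toy_uncore, its reset_reg field and toy_engine_reset() are hypothetical names, and a C11 atomic_flag spin loop stands in for uncore->lock.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Toy model of the shared state; names are illustrative, not the i915 API. */
struct toy_uncore {
	atomic_flag lock;         /* stands in for uncore->lock */
	uint32_t reset_reg;       /* shared reset request/ack register */
	unsigned int resets_done; /* resets that saw the register idle */
};

static void toy_uncore_init(struct toy_uncore *uncore)
{
	atomic_flag_clear(&uncore->lock);
	uncore->reset_reg = 0;
	uncore->resets_done = 0;
}

static void toy_lock(struct toy_uncore *uncore)
{
	/* Spin until the flag was previously clear, i.e. we now own the lock. */
	while (atomic_flag_test_and_set_explicit(&uncore->lock,
						 memory_order_acquire))
		;
}

static void toy_unlock(struct toy_uncore *uncore)
{
	atomic_flag_clear_explicit(&uncore->lock, memory_order_release);
}

/* Returns 0 on success, -1 if another request was still pending. */
static int toy_engine_reset(struct toy_uncore *uncore, uint32_t engine_bit)
{
	int err = 0;

	toy_lock(uncore);

	/* With the lock held, no other engine can have a request in flight. */
	if (uncore->reset_reg != 0)
		err = -1;

	uncore->reset_reg |= engine_bit;  /* request the reset */
	uncore->reset_reg &= ~engine_bit; /* model the hardware ack */

	if (!err)
		uncore->resets_done++;

	toy_unlock(uncore);
	return err;
}
```

Because the request and the wait for the ack both happen inside the locked region, a second engine's reset in this model can only start once the register is idle again.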

* Re: [PATCH 6/6] drm/i915/gt: Serialize TLB invalidates with GT resets
  2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-23 11:18     ` Andi Shyti
  -1 siblings, 0 replies; 87+ messages in thread
From: Andi Shyti @ 2022-06-23 11:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Michał Winiarski, Thomas Hellstrom,
	Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Matt Roper, Matthew Auld,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	mauro.chehab, stable

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:40PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Avoid trying to invalidate the TLB in the middle of performing an
> engine reset, as this may result in the reset timing out. Currently,
> the TLB invalidate is only serialised by its own mutex, forgoing the
> uncore lock, but we can take the uncore->lock as well to serialise
> the mmio access, thereby serialising with the GDRST.
> 
> Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
> i915 selftest/hangcheck.
> 
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> 
> Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 87+ messages in thread
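
A rough userspace model of the locking described in the quoted commit message: the TLB invalidation takes its own mutex and, in addition, the lock that the reset path holds. Everything named toy_* is hypothetical. One deliberate difference is labelled here and in the code: the commit message describes taking uncore->lock (which waits for the reset to finish), whereas this sketch uses a non-blocking try-lock so that the exclusion can be observed from a single thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model; names are illustrative, not the i915 definitions. */
struct toy_gt {
	atomic_flag tlb_mutex;    /* models the invalidation's own mutex */
	atomic_flag uncore_lock;  /* models uncore->lock, shared with resets */
	unsigned int tlb_flushes; /* invalidations that were actually issued */
};

static void toy_gt_init(struct toy_gt *gt)
{
	atomic_flag_clear(&gt->tlb_mutex);
	atomic_flag_clear(&gt->uncore_lock);
	gt->tlb_flushes = 0;
}

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_try_lock(atomic_flag *lock)
{
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void toy_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Reset path: holds the shared lock for the whole reset. */
static void toy_reset_begin(struct toy_gt *gt)
{
	while (!toy_try_lock(&gt->uncore_lock))
		;
}

static void toy_reset_end(struct toy_gt *gt)
{
	toy_unlock(&gt->uncore_lock);
}

/*
 * Assumption: non-blocking for demonstration only. Returns false when a
 * reset (or another invalidation) currently holds one of the locks.
 */
static bool toy_try_invalidate_tlbs(struct toy_gt *gt)
{
	bool done = false;

	if (!toy_try_lock(&gt->tlb_mutex))
		return false;

	if (toy_try_lock(&gt->uncore_lock)) {
		gt->tlb_flushes++; /* stand-in for the mmio writes */
		done = true;
		toy_unlock(&gt->uncore_lock);
	}

	toy_unlock(&gt->tlb_mutex);
	return done;
}
```

In this model an invalidation attempted between toy_reset_begin() and toy_reset_end() is refused, and one attempted afterwards goes through, so the mmio access never overlaps the reset.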

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-23 11:17     ` Andi Shyti
  (?)
@ 2022-06-24  8:34       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-24  8:34 UTC (permalink / raw)
  To: Andi Shyti, Mauro Carvalho Chehab
  Cc: Chris Wilson, Fei Yang, Thomas Hellstrom, Bruce Chang,
	Daniel Vetter, Dave Airlie, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Matt Roper, Matthew Brost,
	Rodrigo Vivi, Tejas Upadhyay, Umesh Nerlige Ramappa, dri-devel,
	intel-gfx, linux-kernel, mauro.chehab, Mika Kuoppala,
	Chris Wilson, stable, Thomas Hellström


On 23/06/2022 12:17, Andi Shyti wrote:
> Hi Mauro,
> 
> On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
>> From: Chris Wilson <chris.p.wilson@intel.com>
>>
>> Don't allow two engines to be reset in parallel, as they would both
>> try to select a reset bit (and send requests to common registers)
>> and wait on that register, at the same time. Serialize control of
>> the reset requests/acks using the uncore->lock, which will also ensure
>> that no other GT state changes at the same time as the actual reset.
>>
>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>
>> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> Cc: Andi Shyti <andi.shyti@intel.com>
>> Cc: stable@vger.kernel.org
>> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> 
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Note that I had a number of questions and requests on this series, so
please do not merge until those are addressed.

In this particular patch (and some others), for instance, the Fixes: tag,
at least against that SHA, shouldn't be there.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-24  8:34       ` Tvrtko Ursulin
  (?)
@ 2022-06-27  9:00         ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-27  9:00 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Andi Shyti, Mauro Carvalho Chehab, Chris Wilson, Fei Yang,
	Thomas Hellstrom, Bruce Chang, Daniel Vetter, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Matt Roper, Matthew Brost, Rodrigo Vivi, Tejas Upadhyay,
	Umesh Nerlige Ramappa, dri-devel, intel-gfx, linux-kernel,
	Mika Kuoppala, Chris Wilson, stable, Thomas Hellström

Hi Tvrtko,

On Fri, 24 Jun 2022 09:34:21 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 23/06/2022 12:17, Andi Shyti wrote:
> > Hi Mauro,
> > 
> > On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:  
> >> From: Chris Wilson <chris.p.wilson@intel.com>
> >>
> >> Don't allow two engines to be reset in parallel, as they would both
> >> try to select a reset bit (and send requests to common registers)
> >> and wait on that register, at the same time. Serialize control of
> >> the reset requests/acks using the uncore->lock, which will also ensure
> >> that no other GT state changes at the same time as the actual reset.
> >>
> >> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> >>
> >> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> >> Cc: Andi Shyti <andi.shyti@intel.com>
> >> Cc: stable@vger.kernel.org
> >> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>  
> > 
> > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>  
> 
> Notice I had a bunch of questions and asks in this series so please do 
> not merge until those are addressed.
> 
> In this particular patch (and some others) for instance Fixes: tag, at 
> least against that sha, shouldn't be there.

Hmm... I sent an answer to your points, but I can't see it at:

	https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com/

Maybe it got lost somewhere, I dunno.

Yeah, indeed the Fixes: tag on patch 5/6 should be removed, as it is not
directly related to changeset 7938d61591d3. Still, that patch is required
for patch 6 to work.

The other patches in this series, though, modify the code
introduced by changeset 7938d61591d3.

Patch 2 is clearly a workaround needed for TLB cache invalidation to
work on some GPUs. So, while not related to Broadwell, it also
fixes some TLB cache issues. So, IMO, it should keep the Fixes: tag.

I tried to port just the two serialize patches to drm-tip, in order
to solve the issues on Broadwell, but it didn't work, as the logic
inside the spinlock could end up calling schedule() with a spinlock held:
 
	Jun 14 17:38:48 silver kernel: [   23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
	Jun 14 17:38:48 silver kernel: [   23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
	Jun 14 17:38:48 silver kernel: [   23.227818] preempt_count: 1, expected: 0
	Jun 14 17:38:48 silver kernel: [   23.227819] RCU nest depth: 0, expected: 0
	Jun 14 17:38:48 silver kernel: [   23.227820] 5 locks held by kworker/u8:1/37:
	Jun 14 17:38:48 silver kernel: [   23.227822]  #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
	Jun 14 17:38:48 silver kernel: [   23.227831]  #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
	Jun 14 17:38:48 silver kernel: [   23.227837]  #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
	Jun 14 17:38:48 silver kernel: [   23.228283]  #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
	Jun 14 17:38:48 silver kernel: [   23.228663]  #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]

I didn't investigate the root cause, but it seems related to PM, so 
patches 1 and 3 seem to be required for the serialization logic
to actually work.
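
For readers unfamiliar with the warning above, here is a minimal userspace
sketch in C of the idea behind it. It is not kernel code:
model_preempt_count, model_spin_lock() and model_might_sleep() are
hypothetical stand-ins for the kernel's preempt counter, spin_lock() and
might_sleep(), and the real check is more involved. It only illustrates why
a wait that may sleep gets flagged when it runs between lock and unlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; not the kernel's implementation. */
static int model_preempt_count; /* > 0 models "in atomic context" */
static int model_bug_reports;   /* number of times the check fired */

static void model_spin_lock(void)   { model_preempt_count++; }
static void model_spin_unlock(void) { model_preempt_count--; }

/* Models the idea of might_sleep(): complain when called while atomic. */
static bool model_might_sleep(const char *where)
{
	if (model_preempt_count > 0) {
		model_bug_reports++;
		printf("sleeping function called from invalid context at %s "
		       "(preempt_count: %d, expected: 0)\n",
		       where, model_preempt_count);
		return false;
	}
	return true;
}

/* Models a register wait that is allowed to sleep between polls. */
static bool model_wait_for_register(const char *where)
{
	return model_might_sleep(where);
}

/* Runs one flagged wait and one allowed wait; returns the report count. */
static int model_demo(void)
{
	model_bug_reports = 0;

	model_spin_lock();
	model_wait_for_register("wait under lock");   /* flagged */
	model_spin_unlock();

	model_wait_for_register("wait after unlock"); /* allowed */

	return model_bug_reports;
}
```

In this model only the wait issued while the lock is held is reported, which
matches the shape of the log above, where uncore->lock is the innermost lock
held.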

So, I would keep the Fixes: tag mentioning changeset 7938d61591d3
on patches 1, 2, 3 and 6.

Yet, IMO the entire series should be merged into -stable.

If that's OK for you and there are no additional issues to be
addressed, I'll submit a v2 of this series removing the Fixes: tag
from patches 4 and 5.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-27  9:00         ` [Intel-gfx] " Mauro Carvalho Chehab
  (?)
@ 2022-06-28 15:49           ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-28 15:49 UTC (permalink / raw)
  To: Mauro Carvalho Chehab (by way of Mauro Carvalho Chehab
  Cc: Andi Shyti, Mauro Carvalho Chehab, Chris Wilson, Fei Yang,
	Thomas Hellstrom, Bruce Chang, Daniel Vetter, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Matt Roper, Matthew Brost, Rodrigo Vivi, Tejas Upadhyay,
	Umesh Nerlige Ramappa, dri-devel, intel-gfx, linux-kernel,
	Mika Kuoppala, Chris Wilson, stable, Thomas Hellström


Hi,

On 27/06/2022 10:00, Mauro Carvalho Chehab (by way of Mauro Carvalho 
Chehab <mauro.chehab@linux.intel.com>) wrote:
> Hi Tvrtko,
> 
> On Fri, 24 Jun 2022 09:34:21 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 23/06/2022 12:17, Andi Shyti wrote:
>>> Hi Mauro,
>>>
>>> On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
>>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>>
>>>> Don't allow two engines to be reset in parallel, as they would both
>>>> try to select a reset bit (and send requests to common registers)
>>>> and wait on that register, at the same time. Serialize control of
>>>> the reset requests/acks using the uncore->lock, which will also ensure
>>>> that no other GT state changes at the same time as the actual reset.
>>>>
>>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>>>
>>>> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>> Cc: Andi Shyti <andi.shyti@intel.com>
>>>> Cc: stable@vger.kernel.org
>>>> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> Notice I had a bunch of questions and asks in this series so please do
>> not merge until those are addressed.
>>
>> In this particular patch (and some others) for instance Fixes: tag, at
>> least against that sha, shouldn't be there.
> 
> Hmm... I sent an answer to your points, but I can't see it at:
> 
> 	https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com/
> 
> Maybe it got lost somewhere, I dunno.

Yeah, no replies received on my end I'm afraid.

> 
> Yeah, indeed the fixes tag on patch 5/6 should be removed as this is not
> directly related to changeset 7938d61591d3. Yet, this one is required for
> patch 6 to work.
> 
> The other patches on this series, though, are modifying the code
> introduced by changeset 7938d61591d3.

Modifying the code does not strictly mean that something is a fix for a
certain patch.

> Patch 2 is clearly a workaround needed for TLB cache invalidation to
> work on some GPUs. So, while not related to Broadwell, they're also
> fixing some TLB cache issues. So, IMO, it should keep the fixes.

Umesh commented that patch 2 is not needed - who is right then? :)

> I tried to port just the two serialize patches to drm-tip, in order
> to solve the issues on Broadwell, but it didn't work, as the logic
> inside the spinlock could be calling schedule() with a spinlock hold:
>   
> 	Jun 14 17:38:48 silver kernel: [   23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
> 	Jun 14 17:38:48 silver kernel: [   23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
> 	Jun 14 17:38:48 silver kernel: [   23.227818] preempt_count: 1, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227819] RCU nest depth: 0, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227820] 5 locks held by kworker/u8:1/37:
> 	Jun 14 17:38:48 silver kernel: [   23.227822]  #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227831]  #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227837]  #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228283]  #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228663]  #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]
> 
> I didn't investigate the root cause, but it seems related to PM, so
> patches 1 and 3 seem to be required for the serialization logic
> to actually work.

Yes that is clear, what is needed is the split of the for_each_engine 
loop into request and wait.
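
A rough userspace sketch in C of what such a split could look like follows.
Everything in it is hypothetical (struct model_gt, model_wait_for_ack() and
the rest are illustration-only names, not i915 code), and it assumes the
fake hardware acks immediately. It compares a single loop that requests and
waits per engine with the lock held against two loops, where the requests
are written under the lock and the waits run after it is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_ENGINES 5

/* Hypothetical model of one engine's invalidation register. */
struct model_engine {
	bool request_pending; /* set by the request write, cleared on ack */
};

struct model_gt {
	struct model_engine engines[MODEL_NUM_ENGINES];
	int lock_depth;        /* stand-in for holding uncore->lock */
	int sleeps_under_lock; /* waits attempted with the lock held */
};

static void model_lock(struct model_gt *gt)   { gt->lock_depth++; }
static void model_unlock(struct model_gt *gt) { gt->lock_depth--; }

/* Stand-in for a register wait that may sleep; the fake hw acks at once. */
static void model_wait_for_ack(struct model_gt *gt, struct model_engine *e)
{
	if (gt->lock_depth > 0)
		gt->sleeps_under_lock++;
	e->request_pending = false;
}

/* Single loop: request and wait per engine, all with the lock held. */
static int model_invalidate_single_loop(struct model_gt *gt)
{
	size_t i;

	gt->sleeps_under_lock = 0;
	model_lock(gt);
	for (i = 0; i < MODEL_NUM_ENGINES; i++) {
		gt->engines[i].request_pending = true;
		model_wait_for_ack(gt, &gt->engines[i]);
	}
	model_unlock(gt);

	return gt->sleeps_under_lock;
}

/* Split loops: requests under the lock, waits after dropping it. */
static int model_invalidate_split_loops(struct model_gt *gt)
{
	size_t i;

	gt->sleeps_under_lock = 0;
	model_lock(gt);
	for (i = 0; i < MODEL_NUM_ENGINES; i++)
		gt->engines[i].request_pending = true;
	model_unlock(gt);

	for (i = 0; i < MODEL_NUM_ENGINES; i++)
		model_wait_for_ack(gt, &gt->engines[i]);

	return gt->sleeps_under_lock;
}
```

In this model the single loop performs one potentially sleeping wait per
engine with the lock held, while the split version performs none. Whether
the real patch places the wait outside the lock in exactly this way is an
assumption of the sketch.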

But the question is how much backporting trouble the _extra_ changes
that patch 1 brings will create.

In an ideal world patch 1 wouldn't be an optimising one (I mean adding
the skipping of TLB invalidations on idle engines) but just the loop split.
That would make it smaller and more suitable for Cc: stable, because
both the i915_gem_pages.c and intel_gt_pm.h hunks wouldn't even be there.
And the refactor in intel_gt_invalidate_tlbs would be smaller since it
wouldn't be adding the engine awake checks...

> So, I would keep the Fixes: tag mentioning changeset 7938d61591d3
> on patches: 1, 2, 3 and 6.

... which for me means a different patch 1, followed by patch 6 (moved 
to be patch 2) would be ideal stable material.

Then we have the current patch 2 which is open/unknown (to me at least).

And the rest seem like optimisations which shouldn't be tagged as fixes.

Apart from patch 5 which should be cc: stable, but no fixes as agreed.

Could you please double check whether what I am suggesting here is feasible
to implement and, if it is, just send those minimal patches out alone?

Maybe it even makes sense to squash such 1&2 into a single patch.

Again, since the original TLB flush was backported quite far back into 
long term stable releases I think it would be much easier to really have 
a minimal patch/series to fix Broadwell in those kernels.

Regards,

Tvrtko

> 
> Yet, IMO the entire series should be merged on -stable.
> 
> If that's OK for you and there's no additional issues to be
> addressed, I'll submit a v2 of this series removing the Fixes tag
> from patches 4 and 5.
> 
> Regards,
> Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [Intel-gfx] [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
@ 2022-06-28 15:49           ` Tvrtko Ursulin
  0 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-28 15:49 UTC (permalink / raw)
  To: Mauro Carvalho Chehab (by way of Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Chris Wilson, Chris Wilson, Dave Airlie,
	Thomas Hellström, intel-gfx, Thomas Hellstrom, Rodrigo Vivi,
	Mauro Carvalho Chehab, linux-kernel, stable, Tejas Upadhyay


Hi,

On 27/06/2022 10:00, Mauro Carvalho Chehab (by way of Mauro Carvalho 
Chehab <mauro.chehab@linux.intel.com>) wrote:
> Hi Tvrtko,
> 
> On Fri, 24 Jun 2022 09:34:21 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 23/06/2022 12:17, Andi Shyti wrote:
>>> Hi Mauro,
>>>
>>> On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
>>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>>
>>>> Don't allow two engines to be reset in parallel, as they would both
>>>> try to select a reset bit (and send requests to common registers)
>>>> and wait on that register, at the same time. Serialize control of
>>>> the reset requests/acks using the uncore->lock, which will also ensure
>>>> that no other GT state changes at the same time as the actual reset.
>>>>
>>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>>>
>>>> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>>>> Cc: Andi Shyti <andi.shyti@intel.com>
>>>> Cc: stable@vger.kernel.org
>>>> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>
>>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>>
>> Notice I had a bunch of questions and asks in this series so please do
>> not merge until those are addressed.
>>
>> In this particular patch (and some others) for instance Fixes: tag, at
>> least against that sha, shouldn't be there.
> 
> Hmm... I sent an answer to your points, but I can't see it at:
> 
> 	https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel.com/
> 
> Maybe it got lost somewhere, I dunno.

Yeah, no replies received on my end I'm afraid.

> 
> Yeah, indeed the fixes tag on patch 5/6 should be removed as this is not
> directly related to changeset 7938d61591d3. Yet, this one is required for
> patch 6 to work.
> 
> The other patches on this series, though, are modifying the code
> introduced by changeset 7938d61591d3.

Modifying the code does not strictly means something is a fix for a 
certain patch.

> Patch 2 is clearly a workaround needed for TLB cache invalidation to
> work on some GPUs. So, while not related to Broadwell, they're also
> fixing some TLB cache issues. So, IMO, it should keep the fixes.

Umesh commented that patch 2 is not needed - who is right then? :)

> I tried to port just the two serialize patches to drm-tip, in order
> to solve the issues on Broadwell, but it didn't work, as the logic
> inside the spinlock could be calling schedule() with a spinlock hold:
>   
> 	Jun 14 17:38:48 silver kernel: [   23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
> 	Jun 14 17:38:48 silver kernel: [   23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
> 	Jun 14 17:38:48 silver kernel: [   23.227818] preempt_count: 1, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227819] RCU nest depth: 0, expected: 0
> 	Jun 14 17:38:48 silver kernel: [   23.227820] 5 locks held by kworker/u8:1/37:
> 	Jun 14 17:38:48 silver kernel: [   23.227822]  #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227831]  #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> 	Jun 14 17:38:48 silver kernel: [   23.227837]  #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228283]  #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
> 	Jun 14 17:38:48 silver kernel: [   23.228663]  #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]
> 
> I didn't investigate the root cause, but it seems related to PM, so
> patches 1 and 3 seem to be required for the serialization logic
> to actually work.

Yes, that is clear: what is needed is the split of the for_each_engine 
loop into a request phase and a wait phase.

But the question is how much backporting trouble the _extra_ changes 
that patch 1 brings will create.

In an ideal world patch 1 would not also be an optimisation. I mean it 
would contain just the loop split, without adding the skipping of TLB 
invalidations on idle engines. That would make it smaller and more 
suitable for Cc: stable, because the i915_gem_pages.c and intel_gt_pm.h 
hunks would not be there at all, and the refactor in 
intel_gt_invalidate_tlbs would be smaller since it would not be adding 
the engine awake checks...

> So, I would keep the Fixes: tag mentioning changeset 7938d61591d3
> on patches: 1, 2, 3 and 6.

... which for me means a different patch 1, followed by patch 6 (moved 
to be patch 2) would be ideal stable material.

Then we have the current patch 2 which is open/unknown (to me at least).

And the rest seem like optimisations which shouldn't be tagged as fixes.

Apart from patch 5, which should be Cc: stable but without a Fixes: tag, as agreed.

Could you please double check whether what I am suggesting here is 
feasible to implement and, if it is, just send those minimal patches out 
alone?

Maybe it even makes sense to squash the new patches 1 and 2 into a single patch.

Again, since the original TLB flush was backported quite far back into 
long term stable releases I think it would be much easier to really have 
a minimal patch/series to fix Broadwell in those kernels.

Regards,

Tvrtko

> 
> Yet, IMO the entire series should be merged on -stable.
> 
> If that's OK for you and there's no additional issues to be
> addressed, I'll submit a v2 of this series removing the Fixes tag
> from patches 4 and 5.
> 
> Regards,
> Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-28 15:49           ` Tvrtko Ursulin
  (?)
@ 2022-06-29 15:30             ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-29 15:30 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Andi Shyti, Mauro Carvalho Chehab, Chris Wilson, Fei Yang,
	Thomas Hellstrom, Bruce Chang, Daniel Vetter, Dave Airlie,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Matt Roper, Matthew Brost, Rodrigo Vivi, Tejas Upadhyay,
	Umesh Nerlige Ramappa, dri-devel, intel-gfx, linux-kernel,
	Mika Kuoppala, Chris Wilson, stable, Thomas Hellström

On Tue, 28 Jun 2022 16:49:23 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> ... which for me means a different patch 1, followed by patch 6 (moved 
> to be patch 2) would be ideal stable material.
> 
> Then we have the current patch 2 which is open/unknown (to me at least).
> 
> And the rest seem like optimisations which shouldn't be tagged as fixes.
> 
> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> 
> Could you please double check if what I am suggesting here is feasible 
> to implement and if it is just send those minimal patches out alone?

Tested: porting just those 3 patches is enough to fix the Broadwell
bug.

So, I submitted a v2 of this series with just those. They all need to
be backported to stable.

I still think that the other TLB patches are needed/desired upstream, but
I'll submit them in a separate series. Let's fix the regression first ;-)

Regards,
Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-29 15:30             ` Mauro Carvalho Chehab
  (?)
@ 2022-06-29 16:02               ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-29 16:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Chris Wilson, Fei Yang, Matthew Brost,
	Mika Kuoppala, Chris Wilson, Andi Shyti, Dave Airlie,
	Thomas Hellström, intel-gfx, Thomas Hellstrom, Rodrigo Vivi,
	Mauro Carvalho Chehab, linux-kernel, stable, Bruce Chang,
	Tejas Upadhyay, Umesh Nerlige Ramappa, John Harrison


On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> On Tue, 28 Jun 2022 16:49:23 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> ... which for me means a different patch 1, followed by patch 6 (moved
>> to be patch 2) would be ideal stable material.
>>
>> Then we have the current patch 2 which is open/unknown (to me at least).
>>
>> And the rest seem like optimisations which shouldn't be tagged as fixes.
>>
>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>>
>> Could you please double check if what I am suggesting here is feasible
>> to implement and if it is just send those minimal patches out alone?
> 
> Tested and porting just those 3 patches are enough to fix the Broadwell
> bug.
> 
> So, I submitted a v2 of this series with just those. They all need to
> be backported to stable.

I would really like to give an even smaller fix a try. Something like this, although it is not even compile tested:

commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
Author: Chris Wilson <chris.p.wilson@intel.com>
Date:   Wed Jun 29 16:25:24 2022 +0100

     drm/i915/gt: Serialize TLB invalidates with GT resets
     
     Avoid trying to invalidate the TLB in the middle of performing an
     engine reset, as this may result in the reset timing out. Currently,
     the TLB invalidate is only serialised by its own mutex, forgoing the
     uncore lock, but we can take the uncore->lock as well to serialise
     the mmio access, thereby serialising with the GDRST.
     
     Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
     i915 selftest/hangcheck.
     
     Cc: stable@vger.kernel.org
     Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
     Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
     Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
     Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
     Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
     Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
     Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
     Reviewed-by: Andi Shyti <andi.shyti@intel.com>
     Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
     Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8da3314bb6bf..aaadd0b02043 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
         mutex_lock(&gt->tlb_invalidate_lock);
         intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
  
+       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+       for_each_engine(engine, gt, id) {
+               struct reg_and_bit rb;
+
+               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+               if (!i915_mmio_reg_offset(rb.reg))
+                       continue;
+
+               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+       }
+
+       spin_unlock_irq(&uncore->lock);
+
         for_each_engine(engine, gt, id) {
+               struct reg_and_bit rb;
+
                 /*
                  * HW architecture suggest typical invalidation time at 40us,
                  * with pessimistic cases up to 100us and a recommendation to
@@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
                  */
                 const unsigned int timeout_us = 100;
                 const unsigned int timeout_ms = 4;
-               struct reg_and_bit rb;
  
                 rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
                 if (!i915_mmio_reg_offset(rb.reg))
                         continue;
  
-               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
                 if (__intel_wait_for_register_fw(uncore,
                                                  rb.reg, rb.bit, 0,
                                                  timeout_us, timeout_ms,

If this works, it would be the least painful option to backport. The other improvements can then go without the Fixes: tag.

> I still think that other TLB patches are needed/desired upstream, but
> I'll submit them on a separate series. Let's fix the regression first ;-)

Yep, that's exactly right.

Regards,

Tvrtko

^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-29 16:02               ` [Intel-gfx] " Tvrtko Ursulin
  (?)
@ 2022-06-30  7:32                 ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-30  7:32 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Fei Yang, Matthew Brost, Mika Kuoppala, Chris Wilson, Andi Shyti,
	Dave Airlie, Thomas Hellström, intel-gfx, Thomas Hellstrom,
	Rodrigo Vivi, linux-kernel, stable, Bruce Chang, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison

On Wed, 29 Jun 2022 17:02:59 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> > On Tue, 28 Jun 2022 16:49:23 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >   
> >> ... which for me means a different patch 1, followed by patch 6 (moved
> >> to be patch 2) would be ideal stable material.
> >>
> >> Then we have the current patch 2 which is open/unknown (to me at least).
> >>
> >> And the rest seem like optimisations which shouldn't be tagged as fixes.
> >>
> >> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> >>
> >> Could you please double check if what I am suggesting here is feasible
> >> to implement and if it is just send those minimal patches out alone?  
> > 
> > Tested and porting just those 3 patches are enough to fix the Broadwell
> > bug.
> > 
> > So, I submitted a v2 of this series with just those. They all need to
> > be backported to stable.  
> 
> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
> 
> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
> Author: Chris Wilson <chris.p.wilson@intel.com>
> Date:   Wed Jun 29 16:25:24 2022 +0100
> 
>      drm/i915/gt: Serialize TLB invalidates with GT resets
>      
>      Avoid trying to invalidate the TLB in the middle of performing an
>      engine reset, as this may result in the reset timing out. Currently,
>      the TLB invalidate is only serialised by its own mutex, forgoing the
>      uncore lock, but we can take the uncore->lock as well to serialise
>      the mmio access, thereby serialising with the GDRST.
>      
>      Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
>      i915 selftest/hangcheck.
>      
>      Cc: stable@vger.kernel.org
>      Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>      Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>      Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>      Reviewed-by: Andi Shyti <andi.shyti@intel.com>
>      Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 8da3314bb6bf..aaadd0b02043 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>          mutex_lock(&gt->tlb_invalidate_lock);
>          intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> +
> +       for_each_engine(engine, gt, id) {
> +               struct reg_and_bit rb;
> +
> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> +               if (!i915_mmio_reg_offset(rb.reg))
> +                       continue;
> +
> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +       }
> +
> +       spin_unlock_irq(&uncore->lock);
> +
>          for_each_engine(engine, gt, id) {
> +               struct reg_and_bit rb;
> +
>                  /*
>                   * HW architecture suggest typical invalidation time at 40us,
>                   * with pessimistic cases up to 100us and a recommendation to
> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>                   */
>                  const unsigned int timeout_us = 100;
>                  const unsigned int timeout_ms = 4;
> -               struct reg_and_bit rb;
>   
>                  rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>                  if (!i915_mmio_reg_offset(rb.reg))
>                          continue;
>   
> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>                  if (__intel_wait_for_register_fw(uncore,
>                                                   rb.reg, rb.bit, 0,
>                                                   timeout_us, timeout_ms,
> 

This won't work, as it does not serialize TLB cache invalidation with
i915 resets. Besides that, it more or less merges patches 1 and 3,
putting changes with different rationales into a single patch. The
upstream rule is to have one logical change per patch.

> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.

From a backport point of view, it makes no difference whether one patch
or two are applied. The intel_gt_invalidate_tlbs() function doesn't exist
before changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing
backing store"), so backporting should not hit merge conflicts, except
perhaps where the functions it calls (or their parameters) have changed.
In that case the backport fix should be trivial, and the end result of
backporting one folded patch or two would be the same.

If any conflict happens, I can help with the backports.

> > I still think that other TLB patches are needed/desired upstream, but
> > I'll submit them on a separate series. Let's fix the regression first ;-)  
> 
> Yep, that's exactly right.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
@ 2022-06-30  7:32                 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-30  7:32 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Brost, Thomas Hellström, Mauro Carvalho Chehab,
	Andi Shyti, David Airlie, Mika Kuoppala, intel-gfx, linux-kernel,
	dri-devel, Chris Wilson, Thomas Hellstrom, Chris Wilson,
	Fei Yang, Rodrigo Vivi, Dave Airlie, stable, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison, Bruce Chang

Em Wed, 29 Jun 2022 17:02:59 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> escreveu:

> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> > On Tue, 28 Jun 2022 16:49:23 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >   
> >> .. which for me means a different patch 1, followed by patch 6 (moved
> >> to be patch 2) would be ideal stable material.
> >>
> >> Then we have the current patch 2 which is open/unknown (to me at least).
> >>
> >> And the rest seem like optimisations which shouldn't be tagged as fixes.
> >>
> >> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> >>
> >> Could you please double check if what I am suggesting here is feasible
> >> to implement and if it is just send those minimal patches out alone?  
> > 
> > Tested and porting just those 3 patches are enough to fix the Broadwell
> > bug.
> > 
> > So, I submitted a v2 of this series with just those. They all need to
> > be backported to stable.  
> 
> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
> 
> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
> Author: Chris Wilson <chris.p.wilson@intel.com>
> Date:   Wed Jun 29 16:25:24 2022 +0100
> 
>      drm/i915/gt: Serialize TLB invalidates with GT resets
>      
>      Avoid trying to invalidate the TLB in the middle of performing an
>      engine reset, as this may result in the reset timing out. Currently,
>      the TLB invalidate is only serialised by its own mutex, forgoing the
>      uncore lock, but we can take the uncore->lock as well to serialise
>      the mmio access, thereby serialising with the GDRST.
>      
>      Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
>      i915 selftest/hangcheck.
>      
>      Cc: stable@vger.kernel.org
>      Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>      Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>      Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>      Reviewed-by: Andi Shyti <andi.shyti@intel.com>
>      Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 8da3314bb6bf..aaadd0b02043 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>          mutex_lock(&gt->tlb_invalidate_lock);
>          intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> +
> +       for_each_engine(engine, gt, id) {
> +               struct reg_and_bit rb;
> +
> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> +               if (!i915_mmio_reg_offset(rb.reg))
> +                       continue;
> +
> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +       }
> +
> +       spin_unlock_irq(&uncore->lock);
> +
>          for_each_engine(engine, gt, id) {
> +               struct reg_and_bit rb;
> +
>                  /*
>                   * HW architecture suggest typical invalidation time at 40us,
>                   * with pessimistic cases up to 100us and a recommendation to
> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>                   */
>                  const unsigned int timeout_us = 100;
>                  const unsigned int timeout_ms = 4;
> -               struct reg_and_bit rb;
>   
>                  rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>                  if (!i915_mmio_reg_offset(rb.reg))
>                          continue;
>   
> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>                  if (__intel_wait_for_register_fw(uncore,
>                                                   rb.reg, rb.bit, 0,
>                                                   timeout_us, timeout_ms,
> 

This won't work, as it does not serialize TLB cache invalidation with
i915 resets. Besides that, it more or less merges patches 1 and 3,
putting changes with different rationales together. The upstream rule is
to have one logical change per patch.

> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.

From a backport point of view, it makes no difference whether one patch
or two are applied. The intel_gt_invalidate_tlbs() function doesn't exist before
changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"),
so backporting shouldn't cause merge conflicts, except perhaps
if some functions it calls (or their parameters) have changed. In such a case,
the backport fix should be trivial, and the end result of backporting
one folded patch or two would be the same.

If any conflicts happen, I can help with the backports.

> > I still think that other TLB patches are needed/desired upstream, but
> > I'll submit them on a separate series. Let's fix the regression first ;-)  
> 
> Yep, that's exactly right.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-30  7:32                 ` Mauro Carvalho Chehab
  (?)
@ 2022-06-30  8:12                   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-06-30  8:12 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Matthew Brost, Thomas Hellström, Mauro Carvalho Chehab,
	Andi Shyti, David Airlie, Mika Kuoppala, intel-gfx, linux-kernel,
	dri-devel, Chris Wilson, Thomas Hellstrom, Chris Wilson,
	Fei Yang, Rodrigo Vivi, Dave Airlie, stable, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison, Bruce Chang

[-- Attachment #1: Type: text/plain, Size: 6309 bytes --]


On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:
> On Wed, 29 Jun 2022 17:02:59 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
>>> On Tue, 28 Jun 2022 16:49:23 +0100
>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>>    
>>>> .. which for me means a different patch 1, followed by patch 6 (moved
>>>> to be patch 2) would be ideal stable material.
>>>>
>>>> Then we have the current patch 2 which is open/unknown (to me at least).
>>>>
>>>> And the rest seem like optimisations which shouldn't be tagged as fixes.
>>>>
>>>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>>>>
>>>> Could you please double check if what I am suggesting here is feasible
>>>> to implement and if it is just send those minimal patches out alone?
>>>
>>> Tested and porting just those 3 patches are enough to fix the Broadwell
>>> bug.
>>>
>>> So, I submitted a v2 of this series with just those. They all need to
>>> be backported to stable.
>>
>> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
>>
>> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
>> Author: Chris Wilson <chris.p.wilson@intel.com>
>> Date:   Wed Jun 29 16:25:24 2022 +0100
>>
>>       drm/i915/gt: Serialize TLB invalidates with GT resets
>>       
>>       Avoid trying to invalidate the TLB in the middle of performing an
>>       engine reset, as this may result in the reset timing out. Currently,
>>       the TLB invalidate is only serialised by its own mutex, forgoing the
>>       uncore lock, but we can take the uncore->lock as well to serialise
>>       the mmio access, thereby serialising with the GDRST.
>>       
>>       Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
>>       i915 selftest/hangcheck.
>>       
>>       Cc: stable@vger.kernel.org
>>       Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>       Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>       Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>       Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>       Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>>       Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>       Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>       Reviewed-by: Andi Shyti <andi.shyti@intel.com>
>>       Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>       Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>> index 8da3314bb6bf..aaadd0b02043 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>           mutex_lock(&gt->tlb_invalidate_lock);
>>           intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>>    
>> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>> +
>> +       for_each_engine(engine, gt, id) {
>> +               struct reg_and_bit rb;
>> +
>> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>> +               if (!i915_mmio_reg_offset(rb.reg))
>> +                       continue;
>> +
>> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>> +       }
>> +
>> +       spin_unlock_irq(&uncore->lock);
>> +
>>           for_each_engine(engine, gt, id) {
>> +               struct reg_and_bit rb;
>> +
>>                   /*
>>                    * HW architecture suggest typical invalidation time at 40us,
>>                    * with pessimistic cases up to 100us and a recommendation to
>> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>                    */
>>                   const unsigned int timeout_us = 100;
>>                   const unsigned int timeout_ms = 4;
>> -               struct reg_and_bit rb;
>>    
>>                   rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>>                   if (!i915_mmio_reg_offset(rb.reg))
>>                           continue;
>>    
>> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>>                   if (__intel_wait_for_register_fw(uncore,
>>                                                    rb.reg, rb.bit, 0,
>>                                                    timeout_us, timeout_ms,
>>
> 
> This won't work, as it is not serializing TLB cache invalidation with
> i915 resets. Besides that, this is more or less merging patches 1 and 3,

Could you explain why you think it is not doing exactly that? In both
versions the end result is that TLB flush requests are issued under the
uncore lock and the waits happen outside it.

> putting changes with different rationales together. The upstream rule is
> to have one logical change per patch.

I don't think it applies in this case. It is simply splitting the work into
two loops so the lock can be held across all mmio writes. I think of it this
way: what is the rationale for sending only the first patch to stable? What
does it _fix_ on its own?
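
As a rough user-space illustration of the two-loop structure discussed here
(this is not the driver code), the sketch below issues every invalidate
request while holding a lock and only then waits for completion with the lock
released. Everything in it is an assumption made for the example: struct
fake_uncore, fake_read(), NUM_ENGINES, INVALIDATE_BIT and the poll budget are
hypothetical stand-ins, and a C11 atomic_flag spinlock plays the role of
uncore->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ENGINES	5		/* hypothetical engine count */
#define INVALIDATE_BIT	(1u << 0)	/* hypothetical request bit */

/* Hypothetical stand-in for the device, one invalidate register per engine. */
struct fake_uncore {
	atomic_flag lock;		/* plays the role of uncore->lock */
	uint32_t reg[NUM_ENGINES];	/* simulated invalidate registers */
	unsigned int busy[NUM_ENGINES];	/* reads until "hardware" clears the bit; 0 = never */
	bool has_reg[NUM_ENGINES];	/* false: engine has no such register */
};

static void fake_uncore_init(struct fake_uncore *u, const bool *has_reg,
			     const unsigned int *busy)
{
	size_t i;

	atomic_flag_clear(&u->lock);
	for (i = 0; i < NUM_ENGINES; i++) {
		u->reg[i] = 0;
		u->busy[i] = busy[i];
		u->has_reg[i] = has_reg[i];
	}
}

static void fake_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void fake_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Simulated register read: the bit clears after busy[i] reads. */
static uint32_t fake_read(struct fake_uncore *u, size_t i)
{
	if (u->reg[i] && u->busy[i] > 0 && --u->busy[i] == 0)
		u->reg[i] = 0;
	return u->reg[i];
}

/* Returns the number of engines whose invalidation timed out. */
static int invalidate_all(struct fake_uncore *u, unsigned int max_polls)
{
	int timeouts = 0;
	size_t i;

	assert(max_polls > 0);

	/* Loop 1: issue all requests under the lock, without waiting. */
	fake_lock(&u->lock);
	for (i = 0; i < NUM_ENGINES; i++) {
		if (!u->has_reg[i])
			continue;
		u->reg[i] = INVALIDATE_BIT;
	}
	fake_unlock(&u->lock);

	/* Loop 2: wait for each request with the lock released. */
	for (i = 0; i < NUM_ENGINES; i++) {
		unsigned int polls = 0;

		if (!u->has_reg[i])
			continue;

		while (fake_read(u, i) & INVALIDATE_BIT) {
			if (++polls >= max_polls) {
				timeouts++;
				break;
			}
		}
	}

	return timeouts;
}
```

A caller would fill in has_reg[] and busy[] with fake_uncore_init() and then
call invalidate_all(); a non-zero return corresponds to the timeout case. The
point of the shape is that the lock hold time covers only the register writes,
so whoever else takes the same lock never has to sit through the waits.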

>> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.
> 
> From a backport point of view, it makes no difference whether one patch
> or two are applied. The intel_gt_invalidate_tlbs() function doesn't exist before
> changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"),
> so backporting shouldn't cause merge conflicts, except perhaps
> if some functions it calls (or their parameters) have changed. In such a case,
> the backport fix should be trivial, and the end result of backporting
> one folded patch or two would be the same.

Yes, a lot of things changed, not least the engine and GT pm code. Note that
TLB flushing was backported all the way to 4.4, so any hunk you don't
strictly need can and will bite you. I have attached a tarball of
patches for you to explore. :)

Regards,

Tvrtko

> If any conflicts happen, I can help with the backports.
> 
>>> I still think that other TLB patches are needed/desired upstream, but
>>> I'll submit them on a separate series. Let's fix the regression first ;-)
>>
>> Yep, that's exactly right.
>>
>> Regards,
>>
>> Tvrtko

[-- Attachment #2: tlbflush-220114-patches.tar.gz --]
[-- Type: application/gzip, Size: 10180 bytes --]

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-30  8:12                   ` [Intel-gfx] " Tvrtko Ursulin
  (?)
@ 2022-06-30 16:01                     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-06-30 16:01 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Matthew Brost, Thomas Hellström, Mauro Carvalho Chehab,
	Andi Shyti, David Airlie, Mika Kuoppala, intel-gfx, linux-kernel,
	dri-devel, Chris Wilson, Thomas Hellstrom, Chris Wilson,
	Fei Yang, Rodrigo Vivi, Dave Airlie, stable, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison, Bruce Chang

On Thu, 30 Jun 2022 09:12:41 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:
> > On Wed, 29 Jun 2022 17:02:59 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >   
> >> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:  
> >>> On Tue, 28 Jun 2022 16:49:23 +0100
> >>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >>>      
> >>>> .. which for me means a different patch 1, followed by patch 6 (moved
> >>>> to be patch 2) would be ideal stable material.
> >>>>
> >>>> Then we have the current patch 2 which is open/unknown (to me at least).
> >>>>
> >>>> And the rest seem like optimisations which shouldn't be tagged as fixes.
> >>>>
> >>>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> >>>>
> >>>> Could you please double check if what I am suggesting here is feasible
> >>>> to implement and if it is just send those minimal patches out alone?  
> >>>
> >>> Tested and porting just those 3 patches are enough to fix the Broadwell
> >>> bug.
> >>>
> >>> So, I submitted a v2 of this series with just those. They all need to
> >>> be backported to stable.  
> >>
> >> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
> >>
> >> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
> >> Author: Chris Wilson <chris.p.wilson@intel.com>
> >> Date:   Wed Jun 29 16:25:24 2022 +0100
> >>
> >>       drm/i915/gt: Serialize TLB invalidates with GT resets
> >>       
> >>       Avoid trying to invalidate the TLB in the middle of performing an
> >>       engine reset, as this may result in the reset timing out. Currently,
> >>       the TLB invalidate is only serialised by its own mutex, forgoing the
> >>       uncore lock, but we can take the uncore->lock as well to serialise
> >>       the mmio access, thereby serialising with the GDRST.
> >>       
> >>       Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
> >>       i915 selftest/hangcheck.
> >>       
> >>       Cc: stable@vger.kernel.org
> >>       Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> >>       Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>       Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>       Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>       Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> >>       Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>       Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >>       Reviewed-by: Andi Shyti <andi.shyti@intel.com>
> >>       Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>       Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> >> index 8da3314bb6bf..aaadd0b02043 100644
> >> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >>           mutex_lock(&gt->tlb_invalidate_lock);
> >>           intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
> >>    
> >> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> >> +
> >> +       for_each_engine(engine, gt, id) {
> >> +               struct reg_and_bit rb;
> >> +
> >> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> >> +               if (!i915_mmio_reg_offset(rb.reg))
> >> +                       continue;
> >> +
> >> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> >> +       }
> >> +
> >> +       spin_unlock_irq(&uncore->lock);
> >> +
> >>           for_each_engine(engine, gt, id) {
> >> +               struct reg_and_bit rb;
> >> +
> >>                   /*
> >>                    * HW architecture suggest typical invalidation time at 40us,
> >>                    * with pessimistic cases up to 100us and a recommendation to
> >> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >>                    */
> >>                   const unsigned int timeout_us = 100;
> >>                   const unsigned int timeout_ms = 4;
> >> -               struct reg_and_bit rb;
> >>    
> >>                   rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> >>                   if (!i915_mmio_reg_offset(rb.reg))
> >>                           continue;
> >>    
> >> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> >>                   if (__intel_wait_for_register_fw(uncore,
> >>                                                    rb.reg, rb.bit, 0,
> >>                                                    timeout_us, timeout_ms,
> >>  
> > 
> > This won't work, as it is not serializing TLB cache invalidation with
> > i915 resets. Besides that, this is more or less merging patches 1 and 3,  
> 
> Could you explain why you think it is not doing exactly that? In both 
> versions end result is TLB flush requests are under the uncore lock and 
> waits are outside it.

Sure, but patch 2/3 (see v2) serializes i915 reset with TLB cache changes.
This is needed in order to fix the regression.

> > placing patches with different rationales altogether. Upstream rule is
> > to have one logical change per patch.  
> 
> I don't think it applies in this case. It is simply splitting into two 
> loops so lock can be held across all mmio writes. I think of it this way 
> - what is the rationale for sending only the first patch to stable? What 
> does it _fix_ on it's own?

There's no -stable rule enforcing that only one patch is allowed,
nor one saying that patches should be folded, making multiple changes in a
single patch, just because of the "Fixes" tag.

So, while several -stable fixes can be done in a single patch, there are
fixes that will require multiple patches. There's nothing wrong with that.

The only rule is that backports should follow what's merged upstream.
So, if, in order to fix a regression, multiple patches are needed upstream,
in principle, all of those can be backported if they fit the -stable rules.

As an example, we once backported a media patch series that had ~20 patches,
addressing security issues in the media compat32 logic (media ioctls usually
pass structs, some of them with pointers). As the issue was discovered several
years after compat32 got introduced, those 22 patches (some containing
compat32 redesigns) had to be backported to all maintained LTS kernels.

-

In this specific case, fixing the regression requires 3 logical changes:

	1) Split the loop;
	2) Add serialization logic to i915 reset;
	3) Use the same i915 reset spinlock to serialize TLB cache
	   invalidation.

None of those logical changes alone would solve the issue. That's
why I originally added the same Fixes: to the entire series: basically,
any Kernel that has the TLB patch backported will require those
three logical changes to be backported too.
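
To make the three changes concrete, here is a minimal user-space sketch of
the locking pattern. It is not i915 code: every toy_* name, the register
layout and the C11 spinlock standing in for uncore->lock are assumptions
made only for illustration. The invalidate requests are all written in one
loop under the lock, the waits run in a second loop with the lock dropped,
and the reset path takes the same lock around its register access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_ENGINES 3

/* Hypothetical stand-in for the uncore: one lock guarding all "mmio". */
struct toy_uncore {
	atomic_flag lock;               /* plays the role of uncore->lock */
	bool held;                      /* bookkeeping for this sketch only */
	uint32_t inv_reg[NUM_ENGINES];  /* per-engine invalidate request bit */
	uint32_t reset_reg;             /* plays the role of GDRST */
	unsigned int writes_under_lock; /* counts request writes made locked */
};

static void toy_uncore_init(struct toy_uncore *u)
{
	memset(u, 0, sizeof(*u));
	atomic_flag_clear(&u->lock);
}

static void toy_lock(struct toy_uncore *u)
{
	while (atomic_flag_test_and_set_explicit(&u->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
	u->held = true;
}

static void toy_unlock(struct toy_uncore *u)
{
	u->held = false;
	atomic_flag_clear_explicit(&u->lock, memory_order_release);
}

/* Pretend hardware: an invalidation completes by clearing the bit. */
static void toy_hw_complete(struct toy_uncore *u, int id)
{
	u->inv_reg[id] = 0;
}

/* Changes 1 and 3: split the loop, issue all requests under the lock. */
static void toy_invalidate_tlbs(struct toy_uncore *u)
{
	int id;

	toy_lock(u);
	for (id = 0; id < NUM_ENGINES; id++) {
		u->inv_reg[id] = 1;
		u->writes_under_lock += u->held;
	}
	toy_unlock(u);

	/* Second loop: wait for completion without holding the lock. */
	for (id = 0; id < NUM_ENGINES; id++) {
		toy_hw_complete(u, id); /* real code would poll with a timeout */
		assert(u->inv_reg[id] == 0);
	}
}

/* Change 2: the reset path serializes on the same lock. */
static void toy_reset(struct toy_uncore *u, uint32_t engine_mask)
{
	toy_lock(u);
	u->reset_reg = engine_mask;
	u->reset_reg = 0; /* pretend the hardware acknowledged the reset */
	toy_unlock(u);
}
```

With both paths taking the same lock, a reset can no longer start between
two invalidate request writes, while the (much longer) wait for completion
does not keep the spinlock held.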

That basically follows what's in the Kernel process docs:

	"If your patch fixes a bug in a specific commit, e.g. you found an issue using
	 ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
	 the SHA-1 ID, and the one line summary."

	Documentation/process/submitting-patches.rst

See, Fixes was originally introduced as a hint to help stable
and distro maintainers identify how far they need to backport
a patch. That's mainly why I added Fixes: to the entire series.
Yet the same will also happen, in practice, if we place:

	Cc: stable@vger.kernel.org # Up to version 4.4

Greg, Sasha and other -stable/distro maintainers will also have a
(much less precise) hint about how far back the backport is needed.

>> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.  
> > 
> >  From backport PoV, it wouldn't make any difference applying one patch
> > or two. See, intel_gt_invalidate_tlbs() function doesn't exist before
> > changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"),
> > so, it shouldn't have merge conflicts while backporting it, maybe except
> > if some functions it calls (or parameters) have changed. On such case,
> > the backport fix should be trivial, and the end result of backporting
> > one folded patch or two would be the same.  
> 
> Yes a lot of things changed. Not least engine and GT pm code. Note that 
> TLB flushing was backported all the way to 4.4 so any hunk you don't 
> strictly need can and will bite you. I have attached a tarball of 
> patches for you to explore. :)
> Regards,

Thanks! That's very helpful for checking the amount of work. It makes it easy
to use interdiff and (k)diff3 to check what changed.

From it, the differences between 5.4 and 5.16 in intel_gt_invalidate_tlbs()
are really trivial.

On 4.14, the function was added in a different file (intel_gem), and
there are a few more API differences, as only gen8 code is there,
but again, the changes are trivial: mostly macros/functions were renamed
and some function parameters changed.

From 4.9 to 4.14 there were also some changes, but they also look trivial.

Kernel 4.4 has some other differences - the loop logic is different, and
there's a ring initialization function - but, as version 4.4 is no longer
listed as LTS at kernel.org, we probably need to backport only as far back
as 4.9.

All of the above should affect patch v2 1/3. Patches v2 2/3 and 3/3 just
add spin lock/unlock calls for the gt uncore spinlock. Those will very likely
require some work on 4.x kernels, but folding (or not) the patches won't
really help.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-06-30 16:01                     ` Mauro Carvalho Chehab
  (?)
@ 2022-07-01  7:56                       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 87+ messages in thread
From: Tvrtko Ursulin @ 2022-07-01  7:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Matthew Brost, Thomas Hellström, Mauro Carvalho Chehab,
	Andi Shyti, David Airlie, Mika Kuoppala, intel-gfx, linux-kernel,
	dri-devel, Chris Wilson, Thomas Hellstrom, Chris Wilson,
	Fei Yang, Rodrigo Vivi, Dave Airlie, stable, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison, Bruce Chang


On 30/06/2022 17:01, Mauro Carvalho Chehab wrote:
> Em Thu, 30 Jun 2022 09:12:41 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> escreveu:
> 
>> On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:
>>> Em Wed, 29 Jun 2022 17:02:59 +0100
>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> escreveu:
>>>    
>>>> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
>>>>> On Tue, 28 Jun 2022 16:49:23 +0100
>>>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>       
>>>>>> .. which for me means a different patch 1, followed by patch 6 (moved
>>>>>> to be patch 2) would be ideal stable material.
>>>>>>
>>>>>> Then we have the current patch 2 which is open/unknown (to me at least).
>>>>>>
>>>>>> And the rest seem like optimisations which shouldn't be tagged as fixes.
>>>>>>
>>>>>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>>>>>>
>>>>>> Could you please double check if what I am suggesting here is feasible
>>>>>> to implement and if it is just send those minimal patches out alone?
>>>>>
>>>>> Tested and porting just those 3 patches are enough to fix the Broadwell
>>>>> bug.
>>>>>
>>>>> So, I submitted a v2 of this series with just those. They all need to
>>>>> be backported to stable.
>>>>
>>>> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
>>>>
>>>> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
>>>> Author: Chris Wilson <chris.p.wilson@intel.com>
>>>> Date:   Wed Jun 29 16:25:24 2022 +0100
>>>>
>>>>        drm/i915/gt: Serialize TLB invalidates with GT resets
>>>>        
>>>>        Avoid trying to invalidate the TLB in the middle of performing an
>>>>        engine reset, as this may result in the reset timing out. Currently,
>>>>        the TLB invalidate is only serialised by its own mutex, forgoing the
>>>>        uncore lock, but we can take the uncore->lock as well to serialise
>>>>        the mmio access, thereby serialising with the GDRST.
>>>>        
>>>>        Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
>>>>        i915 selftest/hangcheck.
>>>>        
>>>>        Cc: stable@vger.kernel.org
>>>>        Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>>>        Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>>        Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>>        Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>>        Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>>>>        Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>>        Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>>        Reviewed-by: Andi Shyti <andi.shyti@intel.com>
>>>>        Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>>>        Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>>> index 8da3314bb6bf..aaadd0b02043 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>>> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>>>            mutex_lock(&gt->tlb_invalidate_lock);
>>>>            intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>>>>     
>>>> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>>>> +
>>>> +       for_each_engine(engine, gt, id) {
>>>> +               struct reg_and_bit rb;
>>>> +
>>>> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>>>> +               if (!i915_mmio_reg_offset(rb.reg))
>>>> +                       continue;
>>>> +
>>>> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>>>> +       }
>>>> +
>>>> +       spin_unlock_irq(&uncore->lock);
>>>> +
>>>>            for_each_engine(engine, gt, id) {
>>>> +               struct reg_and_bit rb;
>>>> +
>>>>                    /*
>>>>                     * HW architecture suggest typical invalidation time at 40us,
>>>>                     * with pessimistic cases up to 100us and a recommendation to
>>>> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>>>                     */
>>>>                    const unsigned int timeout_us = 100;
>>>>                    const unsigned int timeout_ms = 4;
>>>> -               struct reg_and_bit rb;
>>>>     
>>>>                    rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>>>>                    if (!i915_mmio_reg_offset(rb.reg))
>>>>                            continue;
>>>>     
>>>> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>>>>                    if (__intel_wait_for_register_fw(uncore,
>>>>                                                     rb.reg, rb.bit, 0,
>>>>                                                     timeout_us, timeout_ms,
>>>>   
>>>
>>> This won't work, as it is not serializing TLB cache invalidation with
>>> i915 resets. Besides that, this is more or less merging patches 1 and 3,
>>
>> Could you explain why you think it is not doing exactly that? In both
>> versions end result is TLB flush requests are under the uncore lock and
>> waits are outside it.
> 
> Sure, but patch 2/3 (see v2) serializes i915 reset with TLB cache changes.
> This is needed in order to fix the regression.

Not "the" regression, and not even _a_ *regression*. 2/3 fixes a pre-existing and unrelated problem. Or only tangentially related if you want. 2/3 fixes a hang if two engine resets would happen to coincide. Nothing about TLB flushing.

>>> placing patches with different rationales altogether. Upstream rule is
>>> to have one logical change per patch.
>>
>> I don't think it applies in this case. It is simply splitting into two
>> loops so the lock can be held across all mmio writes. I think of it this way
>> - what is the rationale for sending only the first patch to stable? What
>> does it _fix_ on its own?
> 
> There's no -stable rule enforcing that only one patch would be allowed,
> nor saying that patches should be folded, doing multiple changes in a single
> patch just because of the "Fixes" tag.

Well, if we want to be pedantic, what do the stable rules say about adding new features - is skipping idle engines (which is a software concept) a fix or a new optimisation?

> So, while several -stable fixes can be done in a single patch, there are
> fixes that will require multiple patches. There's nothing wrong with that.

Agreed. But the point of my argument is that a) the 1st patch does not fix anything on its own (in relation to the regression), and b) it adds improvements which will just be extra work to backport to old kernels.

> The only rule is that backports should follow what's merged upstream.
> So, if, in order to fix a regression, multiple patches are needed upstream,
> in principle, all of those can be backported if they fit the -stable rules.
> 
> As an example, once we backported a patch series on media that had ~20 patches,
> addressing security issues in the media compat32 logic (media ioctls usually
> pass structs and some with pointers). As the issue was discovered several
> years after compat32 got introduced, those 22 patches (some containing
> compat32 redesigns) had to be backported to all maintained LTS.
> 
> -
> 
> In this specific case, fixing the regression requires 3 logical changes:
> 
> 	1) Split the loop;
> 	2) Add serialize logic to i915 reset;
> 	3) use the same i915 reset spinlock to serialize TLB cache
> 	   invalidation.
> 
> None of those logical changes alone would solve the issue. That's
> why I originally added the same Fixes: to the entire series: basically,
> any Kernel that has the TLB patch backported will require those
> three logical changes to be backported too.
> 
> That basically follows what's in the kernel process docs:
> 
> 	"If your patch fixes a bug in a specific commit, e.g. you found an issue using
> 	 ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of
> 	 the SHA-1 ID, and the one line summary."
> 
> 	Documentation/process/submitting-patches.rst
> 
> See, Fixes was originally introduced to be a hint to help stable
> and distro maintainers to identify how far they need to backport
> a patch. That's mainly why I added Fixes to the entire series.
> Yet, the same will also happen, in practice, if we place:
> 
> 	Cc: stable@vger.kernel.org # Up to version 4.4
> 
> Greg, Sasha and other -stable/distro maintainers will also have a
> (much less precise) hint about how far the backport is needed.
> 
>>> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.
>>>
>>>   From backport PoV, it wouldn't make any difference applying one patch
>>> or two. See, intel_gt_invalidate_tlbs() function doesn't exist before
>>> changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"),
>>> so, it shouldn't have merge conflicts while backporting it, maybe except
>>> if some functions it calls (or parameters) have changed. In such a case,
>>> the backport fix should be trivial, and the end result of backporting
>>> one folded patch or two would be the same.
>>
>> Yes a lot of things changed. Not least engine and GT pm code. Note that
>> TLB flushing was backported all the way to 4.4 so any hunk you don't
>> strictly need can and will bite you. I have attached a tarball of
>> patches for you to explore. :)
>> Regards,
> 
> Thanks! That's very helpful to check the amount of work. It makes it easy
> to use interdiff and (k)diff3 to check what changed.
> 
> From it, the differences between 5.4 and 5.16 in intel_gt_invalidate_tlbs()
> are really trivial.
> 
> On 4.14, the function was added in a different file (intel_gem), and
> there were a few more API differences, as only gen8 code is there,
> but again, the changes are trivial: mostly macros/functions were renamed
> and some function parameters changed.
> 
> From 4.9 to 4.14 there were also some changes but they also look trivial.
> 
> Kernel 4.4 has some other differences - the loop logic is different, and
> there's a ring initialization function, but, as version 4.4 is not listed
> anymore as LTS at kernel.org, we probably need to backport only up to
> 4.9.
> 
> All the above should be affecting patch v2 1/3. Patches v2 2/3 and 3/3 just
> have spin lock/unlock for the gt uncore spinlock. Those will very likely
> require some work on Kernels 4.x, but folding (or not) the patches won't
> really help.

What about intel_engine_pm_is_awake, what will you do with that one?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets
  2022-07-01  7:56                       ` Tvrtko Ursulin
  (?)
@ 2022-07-04  8:42                         ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 87+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-04  8:42 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, Matthew Brost, Thomas Hellström,
	Andi Shyti, David Airlie, Mika Kuoppala, intel-gfx, linux-kernel,
	dri-devel, Chris Wilson, Thomas Hellstrom, Chris Wilson,
	Fei Yang, Rodrigo Vivi, Dave Airlie, stable, Tejas Upadhyay,
	Umesh Nerlige Ramappa, John Harrison, Bruce Chang

On Fri, 1 Jul 2022 08:56:53 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 30/06/2022 17:01, Mauro Carvalho Chehab wrote:
> > Em Thu, 30 Jun 2022 09:12:41 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> escreveu:
> >   
> >> On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:  
> >>> Em Wed, 29 Jun 2022 17:02:59 +0100
> >>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> escreveu:
> >>>      
> >>>> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:  
> >>>>> On Tue, 28 Jun 2022 16:49:23 +0100
> >>>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >>>>>         
> >>>>>> .. which for me means a different patch 1, followed by patch 6 (moved
> >>>>>> to be patch 2) would be ideal stable material.
> >>>>>>
> >>>>>> Then we have the current patch 2 which is open/unknown (to me at least).
> >>>>>>
> >>>>>> And the rest seem like optimisations which shouldn't be tagged as fixes.
> >>>>>>
> >>>>>> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> >>>>>>
> >>>>>> Could you please double check if what I am suggesting here is feasible
> >>>>>> to implement and if it is just send those minimal patches out alone?  
> >>>>>
> >>>>> Tested and porting just those 3 patches are enough to fix the Broadwell
> >>>>> bug.
> >>>>>
> >>>>> So, I submitted a v2 of this series with just those. They all need to
> >>>>> be backported to stable.  
> >>>>
> >>>> I would really like to give even a smaller fix a try. Something like, although not even compile tested:
> >>>>
> >>>> commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
> >>>> Author: Chris Wilson <chris.p.wilson@intel.com>
> >>>> Date:   Wed Jun 29 16:25:24 2022 +0100
> >>>>
> >>>>        drm/i915/gt: Serialize TLB invalidates with GT resets
> >>>>        
> >>>>        Avoid trying to invalidate the TLB in the middle of performing an
> >>>>        engine reset, as this may result in the reset timing out. Currently,
> >>>>        the TLB invalidate is only serialised by its own mutex, forgoing the
> >>>>        uncore lock, but we can take the uncore->lock as well to serialise
> >>>>        the mmio access, thereby serialising with the GDRST.
> >>>>        
> >>>>        Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
> >>>>        i915 selftest/hangcheck.
> >>>>        
> >>>>        Cc: stable@vger.kernel.org
> >>>>        Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> >>>>        Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>>>        Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>>>        Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>>>        Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> >>>>        Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>>>        Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> >>>>        Reviewed-by: Andi Shyti <andi.shyti@intel.com>
> >>>>        Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >>>>        Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> >>>> index 8da3314bb6bf..aaadd0b02043 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >>>> @@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >>>>            mutex_lock(&gt->tlb_invalidate_lock);
> >>>>            intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
> >>>>     
> >>>> +       spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> >>>> +
> >>>> +       for_each_engine(engine, gt, id) {
> >>>> +               struct reg_and_bit rb;
> >>>> +
> >>>> +               rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> >>>> +               if (!i915_mmio_reg_offset(rb.reg))
> >>>> +                       continue;
> >>>> +
> >>>> +               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> >>>> +       }
> >>>> +
> >>>> +       spin_unlock_irq(&uncore->lock);
> >>>> +
> >>>>            for_each_engine(engine, gt, id) {
> >>>> +               struct reg_and_bit rb;
> >>>> +
> >>>>                    /*
> >>>>                     * HW architecture suggest typical invalidation time at 40us,
> >>>>                     * with pessimistic cases up to 100us and a recommendation to
> >>>> @@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >>>>                     */
> >>>>                    const unsigned int timeout_us = 100;
> >>>>                    const unsigned int timeout_ms = 4;
> >>>> -               struct reg_and_bit rb;
> >>>>     
> >>>>                    rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> >>>>                    if (!i915_mmio_reg_offset(rb.reg))
> >>>>                            continue;
> >>>>     
> >>>> -               intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> >>>>                    if (__intel_wait_for_register_fw(uncore,
> >>>>                                                     rb.reg, rb.bit, 0,
> >>>>                                                     timeout_us, timeout_ms,
> >>>>     

...

> What about intel_engine_pm_is_awake, what will you do with that one?

OK, let's keep this series plain and simple. I'm dropping the PM awake
logic as you suggested on v3, keeping just the bare minimum required to
fix the selftest breakage.

That does mean these backports won't take into account that TLB
cache invalidation adds performance penalties and might cause apps
to break.

I suspect that we'll also need to backport at least some of the other
patches, such as the PM awake logic and the one that avoids TLB cache
invalidation when the memory was not touched by userspace, but let's
focus first on fixing the regression pointed out by the selftest.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 87+ messages in thread

Thread overview: 87+ messages
2022-06-15 15:27 [PATCH 0/6] Fix TLB invalidate issues with Broadwell Mauro Carvalho Chehab
2022-06-15 15:27 ` Mauro Carvalho Chehab
2022-06-15 15:27 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-15 15:27 ` [PATCH 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-16  7:21   ` Tvrtko Ursulin
2022-06-16  7:21     ` [Intel-gfx] " Tvrtko Ursulin
2022-06-16  7:21     ` Tvrtko Ursulin
2022-06-23 11:04   ` Andi Shyti
2022-06-23 11:04     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:04     ` Andi Shyti
2022-06-15 15:27 ` [PATCH 2/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-15 17:03   ` Umesh Nerlige Ramappa
2022-06-15 17:03     ` Umesh Nerlige Ramappa
2022-06-23 11:07   ` Andi Shyti
2022-06-23 11:07     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:07     ` Andi Shyti
2022-06-15 15:27 ` [PATCH 3/6] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-16  7:25   ` Tvrtko Ursulin
2022-06-16  7:25     ` [Intel-gfx] " Tvrtko Ursulin
2022-06-16  7:25     ` Tvrtko Ursulin
2022-06-23 11:08   ` Andi Shyti
2022-06-23 11:08     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:08     ` Andi Shyti
2022-06-15 15:27 ` [PATCH 4/6] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-16  7:33   ` Tvrtko Ursulin
2022-06-16  7:33     ` [Intel-gfx] " Tvrtko Ursulin
2022-06-16  7:33     ` Tvrtko Ursulin
2022-06-23 11:13   ` Andi Shyti
2022-06-23 11:13     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:13     ` Andi Shyti
2022-06-15 15:27 ` [PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-16  7:35   ` Tvrtko Ursulin
2022-06-16  7:35     ` [Intel-gfx] " Tvrtko Ursulin
2022-06-16  7:35     ` Tvrtko Ursulin
2022-06-23 11:17   ` Andi Shyti
2022-06-23 11:17     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:17     ` Andi Shyti
2022-06-24  8:34     ` Tvrtko Ursulin
2022-06-24  8:34       ` [Intel-gfx] " Tvrtko Ursulin
2022-06-24  8:34       ` Tvrtko Ursulin
2022-06-27  9:00       ` Mauro Carvalho Chehab
2022-06-27  9:00         ` Mauro Carvalho Chehab
2022-06-27  9:00         ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-28 15:49         ` Tvrtko Ursulin
2022-06-28 15:49           ` [Intel-gfx] " Tvrtko Ursulin
2022-06-28 15:49           ` Tvrtko Ursulin
2022-06-29 15:30           ` Mauro Carvalho Chehab
2022-06-29 15:30             ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-29 15:30             ` Mauro Carvalho Chehab
2022-06-29 16:02             ` Tvrtko Ursulin
2022-06-29 16:02               ` Tvrtko Ursulin
2022-06-29 16:02               ` [Intel-gfx] " Tvrtko Ursulin
2022-06-30  7:32               ` Mauro Carvalho Chehab
2022-06-30  7:32                 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-30  7:32                 ` Mauro Carvalho Chehab
2022-06-30  8:12                 ` Tvrtko Ursulin
2022-06-30  8:12                   ` Tvrtko Ursulin
2022-06-30  8:12                   ` [Intel-gfx] " Tvrtko Ursulin
2022-06-30 16:01                   ` Mauro Carvalho Chehab
2022-06-30 16:01                     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-30 16:01                     ` Mauro Carvalho Chehab
2022-07-01  7:56                     ` Tvrtko Ursulin
2022-07-01  7:56                       ` [Intel-gfx] " Tvrtko Ursulin
2022-07-01  7:56                       ` Tvrtko Ursulin
2022-07-04  8:42                       ` Mauro Carvalho Chehab
2022-07-04  8:42                         ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-04  8:42                         ` Mauro Carvalho Chehab
2022-06-15 15:27 ` [PATCH 6/6] drm/i915/gt: Serialize TLB invalidates with GT resets Mauro Carvalho Chehab
2022-06-15 15:27   ` Mauro Carvalho Chehab
2022-06-15 15:27   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-06-23 11:18   ` Andi Shyti
2022-06-23 11:18     ` [Intel-gfx] " Andi Shyti
2022-06-23 11:18     ` Andi Shyti
2022-06-15 17:26 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Fix TLB invalidate issues with Broadwell Patchwork
2022-06-15 17:26 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-06-15 17:48 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-06-15 23:45 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
