* [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support
@ 2022-07-14 12:06 Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
                   ` (20 more replies)
  0 siblings, 21 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Christian König, Daniel Vetter,
	David Airlie, Sumit Semwal, dri-devel, intel-gfx, linaro-mm-sig,
	linux-kernel, linux-media

TLB invalidation is a slow operation. It should not be done lightly, as it
causes performance regressions, like this:

[178.821002] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

This series contains:

1) some patches that make TLB invalidation happen only on
active, non-wedged engines, batching cache invalidations
and invalidating only when GT objects are exposed to userspace:

  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations
  drm/i915/gt: Move TLB invalidation to its own file

2) It fixes two bugs, the first of which is a workaround:

  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915: Invalidate the TLBs on each GT

3) It adds GuC support. Besides providing TLB invalidation on some
additional hardware, this should also help serialize GuC operations
with TLB invalidations:

  drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
  drm/i915/guc: Define CTB based TLB invalidation routines
  drm/i915: Add platform macro for selective tlb flush
  drm/i915: Define GuC Based TLB invalidation routines
  drm/i915: Add generic interface for tlb invalidation for XeHP
  drm/i915: Use selective tlb invalidations where supported

4) It adds the corresponding kernel-doc markups for the kAPI
used for TLB invalidation.

While I could have split this into smaller pieces, I'm opting to send
them all together, so that the CI trybot can better verify which issues
this series closes.
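
Taken together, the first group of patches turns the unconditional flush
into the guarded flow sketched below. This is only an illustrative
summary of what lands in patch 06 (the seqno bump and the selftest
check are omitted); the real code is in the patches themselves:

	/* illustrative sketch only; see patch 06 for the real code */
	void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
	{
		intel_wakeref_t wakeref;

		if (intel_gt_is_wedged(gt))		/* patch 05 */
			return;

		if (tlb_seqno_passed(gt, seqno))	/* patch 06: batching */
			return;

		/* patch 01: a powered-down GT has no stale TLB entries */
		with_intel_gt_pm_if_awake(gt, wakeref) {
			mutex_lock(&gt->tlb.invalidate_lock);
			if (!tlb_seqno_passed(gt, seqno))
				mmio_invalidate_full(gt);
			mutex_unlock(&gt->tlb.invalidate_lock);
		}
	}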

---

v2:
  - no changes. Just rebased on top of drm-tip: 2022y-07m-14d-08h-35m-36s,
    as CI trybot was having trouble applying it. Hopefully, it will now work.

Chris Wilson (7):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations
  drm/i915/gt: Move TLB invalidation to its own file
  drm/i915: Invalidate the TLBs on each GT

Mauro Carvalho Chehab (8):
  drm/i915/gt: document with_intel_gt_pm_if_awake()
  drm/i915/gt: describe the new tlb parameter at i915_vma_resource
  drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode
  drm/i915/guc: document the TLB invalidation struct members
  drm/i915: document tlb field at struct drm_i915_gem_object
  drm/i915/gt: document TLB cache invalidation functions
  drm/i915/guc: describe enum intel_guc_tlb_invalidation_type
  drm/i915/guc: document TLB cache invalidation functions

Piotr Piórkowski (1):
  drm/i915/guc: Introduce TLB_INVALIDATION_ALL action

Prathap Kumar Valsan (5):
  drm/i915/guc: Define CTB based TLB invalidation routines
  drm/i915: Add platform macro for selective tlb flush
  drm/i915: Define GuC Based TLB invalidation routines
  drm/i915: Add generic interface for tlb invalidation for XeHP
  drm/i915: Use selective tlb invalidations where supported

 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     |  28 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   1 +
 drivers/gpu/drm/i915/gt/intel_gt.c            | 125 +-------
 drivers/gpu/drm/i915/gt/intel_gt.h            |   2 -
 .../gpu/drm/i915/gt/intel_gt_buffer_pool.h    |   3 +-
 drivers/gpu/drm/i915/gt/intel_gt_defines.h    |  11 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         |  10 +
 drivers/gpu/drm/i915/gt/intel_gt_regs.h       |   8 +
 drivers/gpu/drm/i915/gt/intel_gt_types.h      |  22 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |   8 +-
 drivers/gpu/drm/i915/gt/intel_tlb.c           | 295 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h           |  30 ++
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  54 ++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 232 ++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  36 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  24 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   9 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  91 +++++-
 drivers/gpu/drm/i915/i915_drv.h               |   4 +-
 drivers/gpu/drm/i915/i915_pci.c               |   1 +
 drivers/gpu/drm/i915/i915_vma.c               |  46 ++-
 drivers/gpu/drm/i915/i915_vma.h               |   2 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |   9 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |   6 +-
 drivers/gpu/drm/i915/intel_device_info.h      |   1 +
 27 files changed, 910 insertions(+), 155 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_defines.h
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

-- 
2.36.1




* [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:16   ` Tvrtko Ursulin
  2022-07-22 11:56   ` Andi Shyti
  2022-07-14 12:06 ` [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
                   ` (19 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Jason Ekstrand, John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Mauro Carvalho Chehab, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel, stable, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity,
as, in such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the impact
of the performance regression caused by it.

This becomes more significant with GuC, as the driver can only issue
TLB invalidations while the connection to the GuC is awake.
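
In short, this swaps the runtime-pm guard for a GT-level power
management guard, so the flush is skipped entirely whenever the GT is
already asleep (condensed from the diff below):

	/* before: take a runtime-pm wakeref if the device is active */
	with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
		intel_gt_invalidate_tlbs(to_gt(i915));

	/* after: only run if the GT itself already holds a wakeref */
	with_intel_gt_pm_if_awake(gt, wakeref)
		intel_gt_invalidate_tlbs(gt);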

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1



* [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake()
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:21   ` Tvrtko Ursulin
  2022-07-14 12:06 ` [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Chris Wilson, Daniel Vetter, David Airlie,
	Jani Nikula, John Harrison, Joonas Lahtinen, Matthew Brost,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel

Add a kernel-doc markup to document this new macro.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..4d4caf612fdc 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,13 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+/**
+ * with_intel_gt_pm_if_awake - if the GT is awake, get a reference to keep
+ *	it from sleeping, run some code and then put the reference away.
+ *
+ * @gt: pointer to the gt
+ * @wf: pointer to a temporary wakeref.
+ */
 #define with_intel_gt_pm_if_awake(gt, wf) \
 	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
 
-- 
2.36.1



* [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:24   ` Tvrtko Ursulin
  2022-07-22 11:57   ` Andi Shyti
  2022-07-14 12:06 ` [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
                   ` (17 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable,
	Fei Yang, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c4d43da84d8e..1d84418e8676 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	spin_unlock_irq(&uncore->lock);
 
 	for_each_engine_masked(engine, gt, awake, tmp) {
-- 
2.36.1



* [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:39   ` Tvrtko Ursulin
  2022-07-22 11:58   ` Andi Shyti
  2022-07-14 12:06 ` [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
                   ` (16 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Daniel Vetter, Dave Airlie, David Airlie,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	dri-devel, intel-gfx, linux-kernel, stable, Fei Yang, Andi Shyti,
	Thomas Hellström, Mauro Carvalho Chehab

From: Chris Wilson <chris.p.wilson@intel.com>

Don't flush TLBs when the buffer is only used in the GGTT under full
control of the kernel, as there's no risk of concurrent access
or of stale access from prefetch.

We only need to invalidate TLBs when the object is accessible by
userspace. That helps to reduce the performance regression introduced
by the TLB invalidation logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_vma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..646f419b2035 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,7 +538,8 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
+	if (bind_flags & I915_VMA_LOCAL_BIND)
+		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
 
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
-- 
2.36.1



* [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:45   ` Tvrtko Ursulin
  2022-07-22 12:00   ` Andi Shyti
  2022-07-14 12:06 ` [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
                   ` (15 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable,
	Fei Yang, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
has been reset, as, in such cases, the GPU can no longer process
instructions and the user no longer has access to the TLBs in each engine.

That helps to reduce the performance regression introduced by the
TLB invalidation logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1d84418e8676..5c55a90672f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
-- 
2.36.1



* [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-18 13:52   ` Tvrtko Ursulin
  2022-07-20 10:54   ` Tvrtko Ursulin
  2022-07-14 12:06 ` [PATCH v2 07/21] drm/i915/gt: describe the new tlb parameter at i915_vma_resource Mauro Carvalho Chehab
                   ` (14 subsequent siblings)
  20 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Christian König, Thomas Hellström,
	Andi Shyti, Ashutosh Dixit, Ayaz A Siddiqui, Casey Bowman,
	Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie,
	Jani Nikula, Jason Ekstrand, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Maarten Lankhorst, Matt Roper, Matthew Auld,
	Mauro Carvalho Chehab, Prathap Kumar Valsan, Ramalingam C,
	Rodrigo Vivi, Sumit Semwal, Tomas Winkler, Tvrtko Ursulin,
	dri-devel, intel-gfx, linaro-mm-sig, linux-kernel, linux-media,
	stable, Tvrtko Ursulin, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Invalidate TLBs in batch, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex, blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by the
TLB invalidation logic.
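
For the record, the skip test relies on wrap-safe signed arithmetic:
recorded seqnos are forced odd (seqno | 1) while full invalidations
always publish even values (the seqcount is bumped by two), so a
recorded unbind is only skipped once a full invalidation has provably
started after it was recorded. A standalone illustration of that
property (not part of the patch):

	#include <assert.h>
	#include <stdint.h>

	/* align x up to the power-of-two boundary a, as the kernel macro does */
	#define ALIGN(x, a) (((x) + (a) - 1) & ~((uint32_t)(a) - 1))

	/* the same predicate as tlb_seqno_passed() in this patch */
	static int seqno_passed(uint32_t cur, uint32_t seqno)
	{
		return (int32_t)(cur - ALIGN(seqno, 2)) > 0;
	}

	int main(void)
	{
		/* seqno 3 was recorded while the counter was still 2 */
		assert(!seqno_passed(2, 3)); /* no invalidation happened yet */
		assert(!seqno_passed(4, 3)); /* may have begun before our unbind */
		assert(seqno_passed(6, 3));  /* provably started after: skip */
		assert(seqno_passed(2, 0xfffffffb)); /* safe across wrap-around */
		return 0;
	}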

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
 drivers/gpu/drm/i915/i915_vma.c               | 34 +++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
 10 files changed, 125 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9f6b14ec189a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -335,7 +335,6 @@ struct drm_i915_gem_object {
 #define I915_BO_READONLY          BIT(7)
 #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
 #define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
@@ -616,6 +615,8 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		u32 tlb;
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6835279943df..8357dbdcab5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 		vunmap(ptr);
 }
 
+static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_gt *gt = to_gt(i915);
+
+	if (!obj->mm.tlb)
+		return;
+
+	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	obj->mm.tlb = 0;
+}
+
 struct sg_table *
 __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 {
@@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_object_reset_page_iter(obj);
 	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
 
-	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		struct intel_gt *gt = to_gt(i915);
-		intel_wakeref_t wakeref;
-
-		with_intel_gt_pm_if_awake(gt, wakeref)
-			intel_gt_invalidate_tlbs(gt);
-	}
+	flush_tlb_invalidate(obj);
 
 	return pages;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 5c55a90672f4..f435e06125aa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
-	mutex_init(&gt->tlb_invalidate_lock);
-
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
@@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
+		mutex_destroy(&gt->tlb.invalidate_lock);
 		intel_engines_free(gt);
 	}
 }
@@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
 	return rb;
 }
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+static void mmio_invalidate_full(struct intel_gt *gt)
 {
 	static const i915_reg_t gen8_regs[] = {
 		[RENDER_CLASS]			= GEN8_RTCR,
@@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	const i915_reg_t *regs;
 	unsigned int num = 0;
 
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
@@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 			  "Platform does not implement TLB invalidation!"))
 		return;
 
-	GEM_TRACE("\n");
-
-	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
@@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
 	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
 	if (awake &&
 	    (IS_TIGERLAKE(i915) ||
@@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	 * transitions.
 	 */
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-	mutex_unlock(&gt->tlb_invalidate_lock);
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 82d6f248d876..40b06adf509a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt);
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index df708802889d..3804a583382b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -11,6 +11,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/seqlock.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbinding the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 646f419b2035..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,9 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	if (bind_flags & I915_VMA_LOCAL_BIND)
-		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1311,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1942,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1950,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, &vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer to obj->mm.tlb for async unbind; otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1



* [PATCH v2 07/21] drm/i915/gt: describe the new tlb parameter at i915_vma_resource
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Daniel Vetter, David Airlie, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel

TLB cache invalidation can happen in two different situations:

1. synchronously, at __vma_put_pages();
2. asynchronously.

In the first case, the TLB cache invalidation happens inside
__vma_put_pages(), so there is no need to do it later.

In the second case, however, the pages are kept in memory until
__i915_vma_evict() is called. So, the TLB data needs to be stored
in struct i915_vma_resource, in order to do a TLB cache invalidation
before allowing userspace to re-use the same memory.

For that, i915_vma_resource_unbind() has gained a new parameter,
used to store the TLB data in the asynchronous case.
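
The distinction is visible at the call site in __i915_vma_evict()
(condensed from patch 06): only the async path hands the obj->mm.tlb
slot over to the vma resource; the sync path waits for the unbind
fence and then invalidates immediately:

	if (async)
		unbind_fence = i915_vma_resource_unbind(vma_res,
							&vma->obj->mm.tlb);
	else
		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
	...
	if (!async)
		vma_invalidate_tlb(vma->vm, &vma->obj->mm.tlb);

Document it.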

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 5a67995ea5fe..4fe09ea0a825 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
 /**
  * i915_vma_resource_unbind - Unbind a vma resource
  * @vma_res: The vma resource to unbind.
+ * @tlb: pointer to vma->obj->mm.tlb associated with the resource
+ *	 to be stored at vma_res->tlb. When non-NULL, it will be used
+ *	 to do a TLB cache invalidation before freeing the VMA resource.
+ *	 Used only for async unbind.
  *
  * At this point this function does little more than publish a fence that
  * signals immediately unless signaling is held back.
-- 
2.36.1



* [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 07/21] drm/i915/gt: describe the new tlb parameter at i915_vma_resource Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-22 12:07   ` Andi Shyti
  2022-07-14 12:06 ` [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines Mauro Carvalho Chehab
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Casey Bowman,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	Jason Ekstrand, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld,
	Mauro Carvalho Chehab, Prathap Kumar Valsan, Rodrigo Vivi,
	Tomas Winkler, Tvrtko Ursulin, dri-devel, intel-gfx,
	linux-kernel, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Prepare for supporting more TLB invalidation scenarios by moving
the current MMIO invalidation to its own file.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover letter.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/Makefile             |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   4 +-
 drivers/gpu/drm/i915/gt/intel_gt.c        | 168 +-------------------
 drivers/gpu/drm/i915/gt/intel_gt.h        |  12 --
 drivers/gpu/drm/i915/gt/intel_tlb.c       | 183 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_tlb.h       |  29 ++++
 drivers/gpu/drm/i915/i915_vma.c           |   1 +
 7 files changed, 219 insertions(+), 179 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
 create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 522ef9b4aff3..d3df9832d1f7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -126,6 +126,7 @@ gt-y += \
 	gt/intel_sseu.o \
 	gt/intel_sseu_debugfs.o \
 	gt/intel_timeline.o \
+	gt/intel_tlb.o \
 	gt/intel_workarounds.o \
 	gt/shmem_utils.o \
 	gt/sysfs_engines.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 8357dbdcab5c..1cd76cc5d9f3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -7,7 +7,7 @@
 #include <drm/drm_cache.h>
 
 #include "gt/intel_gt.h"
-#include "gt/intel_gt_pm.h"
+#include "gt/intel_tlb.h"
 
 #include "i915_drv.h"
 #include "i915_gem_object.h"
@@ -199,7 +199,7 @@ static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
 	if (!obj->mm.tlb)
 		return;
 
-	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
 	obj->mm.tlb = 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index f435e06125aa..18d82cd620bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,9 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
-#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
-#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -31,6 +29,7 @@
 #include "intel_renderstate.h"
 #include "intel_rps.h"
 #include "intel_gt_sysfs.h"
+#include "intel_tlb.h"
 #include "intel_uncore.h"
 #include "shmem_utils.h"
 
@@ -48,8 +47,7 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
-	mutex_init(&gt->tlb.invalidate_lock);
-	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+	intel_gt_init_tlb(gt);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,7 +768,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
-		mutex_destroy(&gt->tlb.invalidate_lock);
+		intel_gt_fini_tlb(gt);
 		intel_engines_free(gt);
 	}
 }
@@ -881,163 +879,3 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 	intel_sseu_dump(&info->sseu, p);
 }
-
-struct reg_and_bit {
-	i915_reg_t reg;
-	u32 bit;
-};
-
-static struct reg_and_bit
-get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
-		const i915_reg_t *regs, const unsigned int num)
-{
-	const unsigned int class = engine->class;
-	struct reg_and_bit rb = { };
-
-	if (drm_WARN_ON_ONCE(&engine->i915->drm,
-			     class >= num || !regs[class].reg))
-		return rb;
-
-	rb.reg = regs[class];
-	if (gen8 && class == VIDEO_DECODE_CLASS)
-		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
-	else
-		rb.bit = engine->instance;
-
-	rb.bit = BIT(rb.bit);
-
-	return rb;
-}
-
-static void mmio_invalidate_full(struct intel_gt *gt)
-{
-	static const i915_reg_t gen8_regs[] = {
-		[RENDER_CLASS]			= GEN8_RTCR,
-		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
-		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
-		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
-	};
-	static const i915_reg_t gen12_regs[] = {
-		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
-		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
-		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
-		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
-		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
-	};
-	struct drm_i915_private *i915 = gt->i915;
-	struct intel_uncore *uncore = gt->uncore;
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t awake, tmp;
-	enum intel_engine_id id;
-	const i915_reg_t *regs;
-	unsigned int num = 0;
-
-	if (GRAPHICS_VER(i915) == 12) {
-		regs = gen12_regs;
-		num = ARRAY_SIZE(gen12_regs);
-	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
-		regs = gen8_regs;
-		num = ARRAY_SIZE(gen8_regs);
-	} else if (GRAPHICS_VER(i915) < 8) {
-		return;
-	}
-
-	if (drm_WARN_ONCE(&i915->drm, !num,
-			  "Platform does not implement TLB invalidation!"))
-		return;
-
-	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
-
-	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
-
-	awake = 0;
-	for_each_engine(engine, gt, id) {
-		struct reg_and_bit rb;
-
-		if (!intel_engine_pm_is_awake(engine))
-			continue;
-
-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
-		awake |= engine->mask;
-	}
-
-	GT_TRACE(gt, "invalidated engines %08x\n", awake);
-
-	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
-	if (awake &&
-	    (IS_TIGERLAKE(i915) ||
-	     IS_DG1(i915) ||
-	     IS_ROCKETLAKE(i915) ||
-	     IS_ALDERLAKE_S(i915) ||
-	     IS_ALDERLAKE_P(i915)))
-		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
-
-	spin_unlock_irq(&uncore->lock);
-
-	for_each_engine_masked(engine, gt, awake, tmp) {
-		struct reg_and_bit rb;
-
-		/*
-		 * HW architecture suggest typical invalidation time at 40us,
-		 * with pessimistic cases up to 100us and a recommendation to
-		 * cap at 1ms. We go a bit higher just in case.
-		 */
-		const unsigned int timeout_us = 100;
-		const unsigned int timeout_ms = 4;
-
-		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (__intel_wait_for_register_fw(uncore,
-						 rb.reg, rb.bit, 0,
-						 timeout_us, timeout_ms,
-						 NULL))
-			drm_err_ratelimited(&gt->i915->drm,
-					    "%s TLB invalidation did not complete in %ums!\n",
-					    engine->name, timeout_ms);
-	}
-
-	/*
-	 * Use delayed put since a) we mostly expect a flurry of TLB
-	 * invalidations so it is good to avoid paying the forcewake cost and
-	 * b) it works around a bug in Icelake which cannot cope with too rapid
-	 * transitions.
-	 */
-	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-}
-
-static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
-{
-	u32 cur = intel_gt_tlb_seqno(gt);
-
-	/* Only skip if a *full* TLB invalidate barrier has passed */
-	return (s32)(cur - ALIGN(seqno, 2)) > 0;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
-{
-	intel_wakeref_t wakeref;
-
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
-	if (tlb_seqno_passed(gt, seqno))
-		return;
-
-	with_intel_gt_pm_if_awake(gt, wakeref) {
-		mutex_lock(&gt->tlb.invalidate_lock);
-		if (tlb_seqno_passed(gt, seqno))
-			goto unlock;
-
-		mmio_invalidate_full(gt);
-
-		write_seqcount_invalidate(&gt->tlb.seqno);
-unlock:
-		mutex_unlock(&gt->tlb.invalidate_lock);
-	}
-}
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 40b06adf509a..b4bba16cdb53 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,16 +101,4 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
-{
-	return seqprop_sequence(&gt->tlb.seqno);
-}
-
-static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
-{
-	return intel_gt_tlb_seqno(gt) | 1;
-}
-
-void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
-
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
new file mode 100644
index 000000000000..af8cae979489
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -0,0 +1,183 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
+#include "intel_engine_pm.h"
+#include "intel_gt.h"
+#include "intel_gt_pm.h"
+#include "intel_gt_regs.h"
+#include "intel_tlb.h"
+
+struct reg_and_bit {
+	i915_reg_t reg;
+	u32 bit;
+};
+
+static struct reg_and_bit
+get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
+		const i915_reg_t *regs, const unsigned int num)
+{
+	const unsigned int class = engine->class;
+	struct reg_and_bit rb = { };
+
+	if (drm_WARN_ON_ONCE(&engine->i915->drm,
+			     class >= num || !regs[class].reg))
+		return rb;
+
+	rb.reg = regs[class];
+	if (gen8 && class == VIDEO_DECODE_CLASS)
+		rb.reg.reg += 4 * engine->instance; /* GEN8_M2TCR */
+	else
+		rb.bit = engine->instance;
+
+	rb.bit = BIT(rb.bit);
+
+	return rb;
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+static void mmio_invalidate_full(struct intel_gt *gt)
+{
+	static const i915_reg_t gen8_regs[] = {
+		[RENDER_CLASS]			= GEN8_RTCR,
+		[VIDEO_DECODE_CLASS]		= GEN8_M1TCR, /* , GEN8_M2TCR */
+		[VIDEO_ENHANCEMENT_CLASS]	= GEN8_VTCR,
+		[COPY_ENGINE_CLASS]		= GEN8_BTCR,
+	};
+	static const i915_reg_t gen12_regs[] = {
+		[RENDER_CLASS]			= GEN12_GFX_TLB_INV_CR,
+		[VIDEO_DECODE_CLASS]		= GEN12_VD_TLB_INV_CR,
+		[VIDEO_ENHANCEMENT_CLASS]	= GEN12_VE_TLB_INV_CR,
+		[COPY_ENGINE_CLASS]		= GEN12_BLT_TLB_INV_CR,
+		[COMPUTE_CLASS]			= GEN12_COMPCTX_TLB_INV_CR,
+	};
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_uncore *uncore = gt->uncore;
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
+	enum intel_engine_id id;
+	const i915_reg_t *regs;
+	unsigned int num = 0;
+
+	if (GRAPHICS_VER(i915) == 12) {
+		regs = gen12_regs;
+		num = ARRAY_SIZE(gen12_regs);
+	} else if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) <= 11) {
+		regs = gen8_regs;
+		num = ARRAY_SIZE(gen8_regs);
+	} else if (GRAPHICS_VER(i915) < 8) {
+		return;
+	}
+
+	if (drm_WARN_ONCE(&i915->drm, !num,
+			  "Platform does not implement TLB invalidation!"))
+		return;
+
+	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
+
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+	awake = 0;
+	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (!i915_mmio_reg_offset(rb.reg))
+			continue;
+
+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
+	}
+
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
+	spin_unlock_irq(&uncore->lock);
+
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
+		/*
+		 * HW architecture suggest typical invalidation time at 40us,
+		 * with pessimistic cases up to 100us and a recommendation to
+		 * cap at 1ms. We go a bit higher just in case.
+		 */
+		const unsigned int timeout_us = 100;
+		const unsigned int timeout_ms = 4;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (__intel_wait_for_register_fw(uncore,
+						 rb.reg, rb.bit, 0,
+						 timeout_us, timeout_ms,
+						 NULL))
+			drm_err_ratelimited(&gt->i915->drm,
+					    "%s TLB invalidation did not complete in %ums!\n",
+					    engine->name, timeout_ms);
+	}
+
+	/*
+	 * Use delayed put since a) we mostly expect a flurry of TLB
+	 * invalidations so it is good to avoid paying the forcewake cost and
+	 * b) it works around a bug in Icelake which cannot cope with too rapid
+	 * transitions.
+	 */
+	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
+}
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
+}
+
+void intel_gt_init_tlb(struct intel_gt *gt)
+{
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
+}
+
+void intel_gt_fini_tlb(struct intel_gt *gt)
+{
+	mutex_destroy(&gt->tlb.invalidate_lock);
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
new file mode 100644
index 000000000000..46ce25bf5afe
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef INTEL_TLB_H
+#define INTEL_TLB_H
+
+#include <linux/seqlock.h>
+#include <linux/types.h>
+
+#include "intel_gt_types.h"
+
+void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
+
+void intel_gt_init_tlb(struct intel_gt *gt);
+void intel_gt_fini_tlb(struct intel_gt *gt);
+
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+#endif /* INTEL_TLB_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 84a9ccbc5fc5..fe947d1456d5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -33,6 +33,7 @@
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
+#include "gt/intel_tlb.h"
 
 #include "i915_drv.h"
 #include "i915_gem_evict.h"
-- 
2.36.1



* [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 14:06   ` Michal Wajdeczko
  2022-07-14 12:06 ` [PATCH v2 10/21] drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode Mauro Carvalho Chehab
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Prathap Kumar Valsan, Alan Previn, Borislav Petkov,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Matt Roper,
	Matthew Brost, Mauro Carvalho Chehab, Michal Wajdeczko,
	Rodrigo Vivi, Tvrtko Ursulin, Umesh Nerlige Ramappa,
	Vinay Belgaumkar, dri-devel, intel-gfx, linux-kernel,
	Bruce Chang, Chris Wilson

From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

Add routines to interface with GuC firmware for TLB invalidation.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Bruce Chang <yu.bruce.chang@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 35 +++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 90 ++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 24 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  6 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
 6 files changed, 253 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 4ef9990ed7f8..2e39d8df4c82 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -134,6 +134,10 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
 	INTEL_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
 	INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
+	INTEL_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,
+	INTEL_GUC_ACTION_PAGE_FAULT_NOTIFICATION = 0x6001,
+	INTEL_GUC_ACTION_TLB_INVALIDATION = 0x7000,
+	INTEL_GUC_ACTION_TLB_INVALIDATION_DONE = 0x7001,
 	INTEL_GUC_ACTION_STATE_CAPTURE_NOTIFICATION = 0x8002,
 	INTEL_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE = 0x8003,
 	INTEL_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED = 0x8004,
@@ -177,4 +181,35 @@ enum intel_guc_state_capture_event_status {
 
 #define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x000000FF
 
+#define INTEL_GUC_TLB_INVAL_TYPE_SHIFT 0
+#define INTEL_GUC_TLB_INVAL_MODE_SHIFT 8
+/* Flush PPC or SMRO caches along with TLB invalidation request */
+#define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
+
+enum intel_guc_tlb_invalidation_type {
+	INTEL_GUC_TLB_INVAL_GUC = 0x3,
+};
+
+/*
+ * 0: Heavy mode of Invalidation:
+ * The pipeline of the engine(s) targeted by the invalidation is
+ * blocked, and all the in-flight transactions are guaranteed to be Globally
+ * Observed before completing the TLB invalidation
+ * 1: Lite mode of Invalidation:
+ * TLBs of the targeted engine(s) are immediately invalidated.
+ * In-flight transactions are NOT guaranteed to be Globally Observed before
+ * completing TLB invalidation.
+ * Light Invalidation Mode is to be used only when
+ * it can be guaranteed (by SW) that the address translations remain invariant
+ * for the in-flight transactions across the TLB invalidation. In other words,
+ * this mode can be used when the TLB invalidation is intended to clear out the
+ * stale cached translations that are no longer in use. Light Invalidation Mode
+ * is much faster than the Heavy Invalidation Mode, as it does not wait for the
+ * in-flight transactions to be Globally Observed.
+ */
+enum intel_guc_tlb_inval_mode {
+	INTEL_GUC_TLB_INVAL_MODE_HEAVY = 0x0,
+	INTEL_GUC_TLB_INVAL_MODE_LITE = 0x1,
+};
+
 #endif /* _ABI_GUC_ACTIONS_ABI_H */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 2706a8c65090..5c59f9b144a3 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -855,6 +855,96 @@ int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value)
 	return __guc_self_cfg(guc, key, 2, value);
 }
 
+static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)
+{
+	struct intel_guc_tlb_wait _wq, *wq = &_wq;
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int err = 0;
+	u32 seqno;
+
+	init_waitqueue_head(&_wq.wq);
+
+	if (xa_alloc_cyclic_irq(&guc->tlb_lookup, &seqno, wq,
+				xa_limit_32b, &guc->next_seqno,
+				GFP_ATOMIC | __GFP_NOWARN) < 0) {
+		/* Under severe memory pressure? Serialise TLB allocations */
+		xa_lock_irq(&guc->tlb_lookup);
+		wq = xa_load(&guc->tlb_lookup, guc->serial_slot);
+		wait_event_lock_irq(wq->wq,
+				    !READ_ONCE(wq->status),
+				    guc->tlb_lookup.xa_lock);
+		/*
+		 * Update wq->status under lock to ensure only one waiter can
+		 * issue the tlb invalidation command using the serial slot at a
+		 * time. The condition is set to false before releasing the lock
+		 * so that other callers continue to wait until woken up again.
+		 */
+		wq->status = 1;
+		xa_unlock_irq(&guc->tlb_lookup);
+
+		seqno = guc->serial_slot;
+	}
+
+	action[1] = seqno;
+
+	add_wait_queue(&wq->wq, &wait);
+
+	err = intel_guc_send_busy_loop(guc, action, size, G2H_LEN_DW_INVALIDATE_TLB, true);
+	if (err) {
+		/*
+		 * XXX: Failure of tlb invalidation is critical and would
+		 * warrant a gt reset.
+		 */
+		goto out;
+	}
+/*
+ * GuC has a timeout of 1ms for a TLB invalidation response from GAM. On a
+ * timeout, GuC drops the request and has no mechanism to notify the host
+ * about it. So keep a larger timeout that accounts for this individual
+ * timeout and the maximum number of outstanding invalidation requests that
+ * can be queued in the CT buffer.
+ */
+#define OUTSTANDING_GUC_TIMEOUT_PERIOD  (HZ)
+	if (!wait_woken(&wait, TASK_UNINTERRUPTIBLE,
+			OUTSTANDING_GUC_TIMEOUT_PERIOD)) {
+		/*
+		 * XXX: Failure of tlb invalidation is critical and would
+		 * warrant a gt reset.
+		 */
+		drm_err(&guc_to_gt(guc)->i915->drm,
+			 "tlb invalidation response timed out for seqno %u\n", seqno);
+		err = -ETIME;
+	}
+out:
+	remove_wait_queue(&wq->wq, &wait);
+	if (seqno != guc->serial_slot)
+		xa_erase_irq(&guc->tlb_lookup, seqno);
+
+	return err;
+}
+
+/*
+ * GuC TLB Invalidation: Invalidate the TLBs of GuC itself.
+ */
+int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
+				 enum intel_guc_tlb_inval_mode mode)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_TLB_INVALIDATION,
+		0,
+		INTEL_GUC_TLB_INVAL_GUC << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
+			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
+			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
+	};
+
+	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
+		DRM_ERROR("TLB invalidation: operation not supported on this platform!\n");
+		return 0;
+	}
+
+	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
+}
+
 /**
  * intel_guc_load_status - dump information about GuC load status
  * @guc: the GuC
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d0d99f178f2d..f82a121b0838 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -77,6 +77,10 @@ struct intel_guc {
 	atomic_t outstanding_submission_g2h;
 
 	/** @interrupts: pointers to GuC interrupt-managing functions. */
+	struct xarray tlb_lookup;
+	u32 serial_slot;
+	u32 next_seqno;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -248,6 +252,11 @@ struct intel_guc {
 #endif
 };
 
+struct intel_guc_tlb_wait {
+	struct wait_queue_head wq;
+	u8 status;
+} __aligned(4);
+
 static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
 {
 	return container_of(log, struct intel_guc, log);
@@ -363,6 +372,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size,
 int intel_guc_self_cfg32(struct intel_guc *guc, u16 key, u32 value);
 int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
 
+int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
+				 enum intel_guc_tlb_inval_mode mode);
+
 static inline bool intel_guc_is_supported(struct intel_guc *guc)
 {
 	return intel_uc_fw_is_supported(&guc->fw);
@@ -440,6 +452,7 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len);
 int intel_guc_error_capture_process_msg(struct intel_guc *guc,
 					const u32 *msg, u32 len);
+void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno);
 
 struct intel_engine_cs *
 intel_guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index f01325cd1b62..c1ce542b7855 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1023,7 +1023,7 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	return 0;
 }
 
-static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
+static bool ct_process_incoming_requests(struct intel_guc_ct *ct, struct list_head *incoming)
 {
 	unsigned long flags;
 	struct ct_incoming_msg *request;
@@ -1031,11 +1031,11 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
 	int err;
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
-	request = list_first_entry_or_null(&ct->requests.incoming,
+	request = list_first_entry_or_null(incoming,
 					   struct ct_incoming_msg, link);
 	if (request)
 		list_del(&request->link);
-	done = !!list_empty(&ct->requests.incoming);
+	done = !!list_empty(incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	if (!request)
@@ -1058,7 +1058,7 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 	bool done;
 
 	do {
-		done = ct_process_incoming_requests(ct);
+		done = ct_process_incoming_requests(ct, &ct->requests.incoming);
 	} while (!done);
 }
 
@@ -1078,14 +1078,30 @@ static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *requ
 	switch (action) {
 	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
 	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+	case INTEL_GUC_ACTION_TLB_INVALIDATION_DONE:
 		g2h_release_space(ct, request->size);
 	}
+	/* Handle tlb invalidation response in interrupt context */
+	if (action == INTEL_GUC_ACTION_TLB_INVALIDATION_DONE) {
+		const u32 *payload;
+		u32 hxg_len, len;
+
+		hxg_len = request->size - GUC_CTB_MSG_MIN_LEN;
+		len = hxg_len - GUC_HXG_MSG_MIN_LEN;
+		if (unlikely(len < 1))
+			return -EPROTO;
+		payload = &hxg[GUC_HXG_MSG_MIN_LEN];
+		intel_guc_tlb_invalidation_done(ct_to_guc(ct), payload[0]);
+		ct_free_msg(request);
+		return 0;
+	}
 
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
 
 	queue_work(system_unbound_wq, &ct->requests.worker);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index b3c9a9327f76..3edf567b3f65 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -22,6 +22,7 @@
 /* Payload length only i.e. don't include G2H header length */
 #define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
 #define G2H_LEN_DW_DEREGISTER_CONTEXT		1
+#define G2H_LEN_DW_INVALIDATE_TLB		1
 
 #define GUC_CONTEXT_DISABLE		0
 #define GUC_CONTEXT_ENABLE		1
@@ -431,4 +432,9 @@ enum intel_guc_recv_message {
 	INTEL_GUC_RECV_MSG_EXCEPTION = BIT(30),
 };
 
+#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) \
+	((intel_guc_ct_enabled(&(guc)->ct)) && \
+	 (intel_guc_submission_is_used(guc)) && \
+	 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))
+
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 40f726c61e95..6888ea1bc7c1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1653,11 +1653,20 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
 	intel_context_put(parent);
 }
 
+static void wake_up_tlb_invalidate(struct intel_guc_tlb_wait *wait)
+{
+	/* Barrier to ensure the store is observed by the woken thread */
+	smp_store_mb(wait->status, 0);
+	wake_up(&wait->wq);
+}
+
 void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
 {
+	struct intel_guc_tlb_wait *wait;
 	struct intel_context *ce;
 	unsigned long index;
 	unsigned long flags;
+	unsigned long i;
 
 	if (unlikely(!guc_submission_initialized(guc))) {
 		/* Reset called during driver load? GuC not yet initialised! */
@@ -1683,6 +1692,13 @@ void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stall
 
 	/* GuC is blown away, drop all references to contexts */
 	xa_destroy(&guc->context_lookup);
+
+	/*
+	 * The full GT reset will have cleared the TLB caches and flushed the
+	 * G2H message queue; we can release all the blocked waiters.
+	 */
+	xa_for_each(&guc->tlb_lookup, i, wait)
+		wake_up_tlb_invalidate(wait);
 }
 
 static void guc_cancel_context_requests(struct intel_context *ce)
@@ -1805,6 +1821,41 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 static void destroyed_worker_func(struct work_struct *w);
 static void reset_fail_worker_func(struct work_struct *w);
 
+static int init_tlb_lookup(struct intel_guc *guc)
+{
+	struct intel_guc_tlb_wait *wait;
+	int err;
+
+	xa_init_flags(&guc->tlb_lookup, XA_FLAGS_ALLOC);
+
+	wait = kzalloc(sizeof(*wait), GFP_KERNEL);
+	if (!wait)
+		return -ENOMEM;
+
+	init_waitqueue_head(&wait->wq);
+	err = xa_alloc_cyclic_irq(&guc->tlb_lookup, &guc->serial_slot, wait,
+				  xa_limit_32b, &guc->next_seqno, GFP_KERNEL);
+	if (err == -ENOMEM) {
+		kfree(wait);
+		return err;
+	}
+
+	return 0;
+}
+
+static void fini_tlb_lookup(struct intel_guc *guc)
+{
+	struct intel_guc_tlb_wait *wait;
+
+	wait = xa_load(&guc->tlb_lookup, guc->serial_slot);
+	if (wait) {
+		GEM_BUG_ON(wait->status);
+		kfree(wait);
+	}
+
+	xa_destroy(&guc->tlb_lookup);
+}
+
 /*
  * Set up the memory resources to be shared with the GuC (via the GGTT)
  * at firmware loading time.
@@ -1812,20 +1863,31 @@ static void reset_fail_worker_func(struct work_struct *w);
 int intel_guc_submission_init(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
+	int ret;
 
 	if (guc->submission_initialized)
 		return 0;
 
+	ret = init_tlb_lookup(guc);
+	if (ret)
+		return ret;
+
 	guc->submission_state.guc_ids_bitmap =
 		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
-	if (!guc->submission_state.guc_ids_bitmap)
-		return -ENOMEM;
+	if (!guc->submission_state.guc_ids_bitmap) {
+		ret = -ENOMEM;
+		goto err;
+	}
 
 	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
 	guc->timestamp.shift = gpm_timestamp_shift(gt);
 	guc->submission_initialized = true;
 
 	return 0;
+
+err:
+	fini_tlb_lookup(guc);
+	return ret;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
@@ -1836,6 +1898,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	guc_flush_destroyed_contexts(guc);
 	i915_sched_engine_put(guc->sched_engine);
 	bitmap_free(guc->submission_state.guc_ids_bitmap);
+	fini_tlb_lookup(guc);
 	guc->submission_initialized = false;
 }
 
@@ -4027,6 +4090,30 @@ g2h_context_lookup(struct intel_guc *guc, u32 ctx_id)
 	return ce;
 }
 
+static void wait_wake_outstanding_tlb_g2h(struct intel_guc *guc, u32 seqno)
+{
+	struct intel_guc_tlb_wait *wait;
+	unsigned long flags;
+
+	xa_lock_irqsave(&guc->tlb_lookup, flags);
+	wait = xa_load(&guc->tlb_lookup, seqno);
+
+	/* We received a response after the waiting task timed out and exited */
+	if (unlikely(!wait))
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Stale tlb invalidation response with seqno %d\n", seqno);
+
+	if (wait)
+		wake_up_tlb_invalidate(wait);
+
+	xa_unlock_irqrestore(&guc->tlb_lookup, flags);
+}
+
+void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno)
+{
+	wait_wake_outstanding_tlb_g2h(guc, seqno);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
-- 
2.36.1
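
A note on how this entry point is meant to be consumed:
guc_send_invalidate_tlb() is synchronous. It allocates a seqno in the
tlb_lookup xarray (falling back to the pre-allocated serial slot under
memory pressure), sends the H2G action, and then sleeps until the
INTEL_GUC_ACTION_TLB_INVALIDATION_DONE handler wakes it or the 1 second
outstanding-request timeout fires. A minimal caller would look like the
hypothetical sketch below; example_flush_guc_tlbs() is illustrative and
not part of the patch.

/* Hypothetical caller sketch, assuming the routines added above */
static void example_flush_guc_tlbs(struct intel_gt *gt)
{
	struct intel_guc *guc = &gt->uc.guc;
	int err;

	/* Returns 0 on unsupported platforms, -ETIME on a lost response */
	err = intel_guc_invalidate_tlb_guc(guc, INTEL_GUC_TLB_INVAL_MODE_HEAVY);
	if (err)
		drm_err(&gt->i915->drm,
			"GuC TLB invalidation failed (%d)\n", err);
}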


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 10/21] drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 11/21] drm/i915/guc: document the TLB invalidation struct members Mauro Carvalho Chehab
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Alan Previn, Daniel Vetter, David Airlie,
	Jani Nikula, John Harrison, Joonas Lahtinen, Matthew Brost,
	Michal Wajdeczko, Prathap Kumar Valsan, Rodrigo Vivi,
	Tvrtko Ursulin, Vinay Belgaumkar, dri-devel, intel-gfx,
	linux-kernel

Transform the comments for intel_guc_tlb_inval_mode into a
kernel-doc markup.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 2e39d8df4c82..14e35a2f8306 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -190,15 +190,18 @@ enum intel_guc_tlb_invalidation_type {
 	INTEL_GUC_TLB_INVAL_GUC = 0x3,
 };
 
-/*
- * 0: Heavy mode of Invalidation:
+/**
+ * enum intel_guc_tlb_inval_mode - define the mode for TLB cache invalidation
+ *
+ * @INTEL_GUC_TLB_INVAL_MODE_HEAVY: Heavy Invalidation Mode.
  * The pipeline of the engine(s) targeted by the invalidation is
  * blocked, and all the in-flight transactions are guaranteed to be Globally
- * Observed before completing the TLB invalidation
- * 1: Lite mode of Invalidation:
+ * Observed before completing the TLB invalidation.
+ * @INTEL_GUC_TLB_INVAL_MODE_LITE: Light Invalidation Mode.
  * TLBs of the targeted engine(s) are immediately invalidated.
  * In-flight transactions are NOT guaranteed to be Globally Observed before
  * completing TLB invalidation.
+ *
  * Light Invalidation Mode is to be used only when
  * it can be guaranteed (by SW) that the address translations remain invariant
  * for the in-flight transactions across the TLB invalidation. In other words,
-- 
2.36.1
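
As a worked example of the rule documented above: lite mode is only safe
when the invalidation merely drops stale cached translations, e.g. after
unbinding a range that no in-flight transaction can still address. A
hypothetical selection might look as follows; translations_may_be_in_use
is an illustrative flag, not a driver symbol.

/* Sketch: choosing an invalidation mode per the documented contract */
enum intel_guc_tlb_inval_mode mode;

mode = translations_may_be_in_use ? INTEL_GUC_TLB_INVAL_MODE_HEAVY
				  : INTEL_GUC_TLB_INVAL_MODE_LITE;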


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 11/21] drm/i915/guc: document the TLB invalidation struct members
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 10/21] drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 12/21] drm/i915/guc: Introduce TLB_INVALIDATION_ALL action Mauro Carvalho Chehab
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Alan Previn, Daniel Vetter,
	Daniele Ceraolo Spurio, David Airlie, Jani Nikula, John Harrison,
	Joonas Lahtinen, Lucas De Marchi, Matthew Brost,
	Prathap Kumar Valsan, Rodrigo Vivi, Tvrtko Ursulin,
	Umesh Nerlige Ramappa, dri-devel, intel-gfx, linux-kernel

Add documentation for the 3 new members of struct intel_guc
that are used to handle TLB cache invalidation logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/uc/intel_guc.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index f82a121b0838..73c46d405dc4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -76,11 +76,23 @@ struct intel_guc {
 	 */
 	atomic_t outstanding_submission_g2h;
 
-	/** @interrupts: pointers to GuC interrupt-managing functions. */
+	/**
+	 * @tlb_lookup: TLB cache invalidation lookup table.
+	 */
 	struct xarray tlb_lookup;
+
+	/**
+	 * @serial_slot: pre-allocated fallback slot in the @tlb_lookup
+	 * xarray, used to serialise requests under memory pressure.
+	 */
 	u32 serial_slot;
+
+	/**
+	 * @next_seqno: next seqno to be allocated in the @tlb_lookup xarray.
+	 */
 	u32 next_seqno;
 
+	/** @interrupts: pointers to GuC interrupt-managing functions. */
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 12/21] drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 11/21] drm/i915/guc: document the TLB invalidation struct members Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 13/21] drm/i915: Invalidate the TLBs on each GT Mauro Carvalho Chehab
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Piotr Piórkowski, Alan Previn, Borislav Petkov,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Matt Roper,
	Matthew Brost, Mauro Carvalho Chehab, Michal Wajdeczko,
	Prathap Kumar Valsan, Rodrigo Vivi, Tvrtko Ursulin,
	Umesh Nerlige Ramappa, Vinay Belgaumkar, dri-devel, intel-gfx,
	linux-kernel

From: Piotr Piórkowski <piotr.piorkowski@intel.com>

Add a new way to invalidate TLBs via GuC, using action 0x7002
(TLB_INVALIDATION_ALL).

This action will be used in upcoming patches.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c           | 14 ++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h           |  1 +
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 14e35a2f8306..fb0af33e43cc 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -138,6 +138,7 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_PAGE_FAULT_NOTIFICATION = 0x6001,
 	INTEL_GUC_ACTION_TLB_INVALIDATION = 0x7000,
 	INTEL_GUC_ACTION_TLB_INVALIDATION_DONE = 0x7001,
+	INTEL_GUC_ACTION_TLB_INVALIDATION_ALL = 0x7002,
 	INTEL_GUC_ACTION_STATE_CAPTURE_NOTIFICATION = 0x8002,
 	INTEL_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE = 0x8003,
 	INTEL_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED = 0x8004,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 5c59f9b144a3..8a104a292598 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -945,6 +945,20 @@ int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
 	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
 }
 
+int intel_guc_invalidate_tlb_all(struct intel_guc *guc)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_TLB_INVALIDATION_ALL,
+		0,
+		INTEL_GUC_TLB_INVAL_MODE_HEAVY << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
+		INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
+	};
+
+	GEM_BUG_ON(!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc));
+
+	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
+}
+
 /**
  * intel_guc_load_status - dump information about GuC load status
  * @guc: the GuC
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 73c46d405dc4..01c6478451cc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -386,6 +386,7 @@ int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
 
 int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
 				 enum intel_guc_tlb_inval_mode mode);
+int intel_guc_invalidate_tlb_all(struct intel_guc *guc);
 
 static inline bool intel_guc_is_supported(struct intel_guc *guc)
 {
-- 
2.36.1
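
Note one behavioural difference between the two entry points:
intel_guc_invalidate_tlb_guc() logs an error and returns 0 on unsupported
platforms, while intel_guc_invalidate_tlb_all() treats an unsupported
platform as a programming error via GEM_BUG_ON(). Callers are therefore
expected to check support first, along the lines of this hypothetical
sketch:

/* Sketch: guard the call, since unsupported platforms would BUG here */
if (INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc))
	err = intel_guc_invalidate_tlb_all(guc);
else
	err = 0;	/* nothing to invalidate via GuC on older platforms */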


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 13/21] drm/i915: Invalidate the TLBs on each GT
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (11 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 12/21] drm/i915/guc: Introduce TLB_INVALIDATION_ALL action Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 14/21] drm/i915: document tlb field at struct drm_i915_gem_object Mauro Carvalho Chehab
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Chris Wilson, Thomas Hellström, Andrzej Hajda,
	Ayaz A Siddiqui, Daniel Vetter, Daniele Ceraolo Spurio,
	David Airlie, Jani Nikula, Jason Ekstrand, John Harrison,
	Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst, Matt Roper,
	Matthew Auld, Matthew Brost, Mauro Carvalho Chehab,
	Michael Cheng, Prathap Kumar Valsan, Ramalingam C, Rodrigo Vivi,
	Tvrtko Ursulin, Umesh Nerlige Ramappa, dri-devel, intel-gfx,
	linux-kernel, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

With multi-GT devices, the object may have been bound on each GT.
Invalidate the TLBs across all GTs before releasing the pages
back to the system.

Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_object_types.h |  4 +++-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c        | 13 ++++++++-----
 drivers/gpu/drm/i915/gt/intel_engine.h           |  1 +
 drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h   |  3 ++-
 drivers/gpu/drm/i915/gt/intel_gt_defines.h       | 11 +++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_types.h         |  4 +++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c            |  4 ++--
 drivers/gpu/drm/i915/i915_drv.h                  |  1 -
 drivers/gpu/drm/i915/i915_vma.c                  | 14 +++++++++++---
 drivers/gpu/drm/i915/i915_vma.h                  |  2 +-
 10 files changed, 42 insertions(+), 15 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_defines.h

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 9f6b14ec189a..3c1d0b750a67 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -17,6 +17,8 @@
 #include "i915_selftest.h"
 #include "i915_vma_resource.h"
 
+#include "gt/intel_gt_defines.h"
+
 struct drm_i915_gem_object;
 struct intel_fronbuffer;
 struct intel_memory_region;
@@ -616,7 +618,7 @@ struct drm_i915_gem_object {
 		 */
 		bool dirty:1;
 
-		u32 tlb;
+		u32 tlb[I915_MAX_GT];
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 1cd76cc5d9f3..4a6a2f2e8148 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -194,13 +194,16 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
 {
 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
-	struct intel_gt *gt = to_gt(i915);
+	struct intel_gt *gt;
+	int id;
 
-	if (!obj->mm.tlb)
-		return;
+	for_each_gt(gt, i915, id) {
+		if (!obj->mm.tlb[id])
+			continue;
 
-	intel_gt_invalidate_tlb_full(gt, obj->mm.tlb);
-	obj->mm.tlb = 0;
+		intel_gt_invalidate_tlb_full(gt, obj->mm.tlb[id]);
+		obj->mm.tlb[id] = 0;
+	}
 }
 
 struct sg_table *
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 04e435bce79b..fe1dc55bf8f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -18,6 +18,7 @@
 #include "intel_gt_types.h"
 #include "intel_timeline.h"
 #include "intel_workarounds.h"
+#include "uc/intel_guc_submission.h"
 
 struct drm_printer;
 struct intel_context;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h
index 487b8a5520f1..8d41cf0c937a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.h
@@ -11,8 +11,9 @@
 #include "i915_active.h"
 #include "intel_gt_buffer_pool_types.h"
 
-struct intel_gt;
+enum i915_map_type;
 struct i915_request;
+struct intel_gt;
 
 struct intel_gt_buffer_pool_node *
 intel_gt_get_buffer_pool(struct intel_gt *gt, size_t size,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_defines.h b/drivers/gpu/drm/i915/gt/intel_gt_defines.h
new file mode 100644
index 000000000000..7c711726d663
--- /dev/null
+++ b/drivers/gpu/drm/i915/gt/intel_gt_defines.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_GT_DEFINES__
+#define __INTEL_GT_DEFINES__
+
+#define I915_MAX_GT 4
+
+#endif
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 3804a583382b..b857c3972251 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -19,7 +19,6 @@
 #include "uc/intel_uc.h"
 #include "intel_gsc.h"
 
-#include "i915_vma.h"
 #include "intel_engine_types.h"
 #include "intel_gt_buffer_pool_types.h"
 #include "intel_hwconfig.h"
@@ -31,8 +30,11 @@
 #include "intel_wakeref.h"
 #include "pxp/intel_pxp_types.h"
 
+#include "intel_gt_defines.h"
+
 struct drm_i915_private;
 struct i915_ggtt;
+struct i915_vma;
 struct intel_engine_cs;
 struct intel_uncore;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 2da6c82a8bd2..f764d250e929 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -8,6 +8,7 @@
 #include "gem/i915_gem_lmem.h"
 
 #include "i915_trace.h"
+#include "intel_gt.h"
 #include "intel_gtt.h"
 #include "gen6_ppgtt.h"
 #include "gen8_ppgtt.h"
@@ -210,8 +211,7 @@ void ppgtt_unbind_vma(struct i915_address_space *vm,
 		return;
 
 	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
-	if (vma_res->tlb)
-		vma_invalidate_tlb(vm, *vma_res->tlb);
+	vma_invalidate_tlb(vm, vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d25647be25d1..f1f70257dbe0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -711,7 +711,6 @@ struct drm_i915_private {
 	/*
 	 * i915->gt[0] == &i915->gt0
 	 */
-#define I915_MAX_GT 4
 	struct intel_gt *gt[I915_MAX_GT];
 
 	struct kobject *sysfs_gt;
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index fe947d1456d5..5edc745dcc51 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1309,8 +1309,14 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
-void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
 {
+	struct intel_gt *gt;
+	int id;
+
+	if (!tlb)
+		return;
+
 	/*
 	 * Before we release the pages that were bound by this vma, we
 	 * must invalidate all the TLBs that may still have a reference
@@ -1319,7 +1325,9 @@ void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
 	 * the most recent TLB invalidation seqno, and if we have not yet
 	 * flushed the TLBs upon release, perform a full invalidation.
 	 */
-	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+	for_each_gt(gt, vm->i915, id)
+		WRITE_ONCE(tlb[id],
+			   intel_gt_next_invalidate_tlb_full(gt));
 }
 
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
@@ -1955,7 +1963,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	if (async)
 		unbind_fence = i915_vma_resource_unbind(vma_res,
-							&vma->obj->mm.tlb);
+							vma->obj->mm.tlb);
 	else
 		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 5048eed536da..33a58f605d75 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,7 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
-void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
-- 
2.36.1
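
The per-GT bookkeeping introduced here has two halves: at unbind time a
pending-invalidation seqno is recorded for every GT, and at page-release
time each GT that still carries a pending mark is flushed. A condensed
sketch of that flow, simplified from the hunks above (the helper names
are illustrative):

/* Unbind: record a pending full invalidation on every GT */
static void mark_tlbs_stale(struct drm_i915_gem_object *obj,
			    struct drm_i915_private *i915)
{
	struct intel_gt *gt;
	int id;

	for_each_gt(gt, i915, id)
		WRITE_ONCE(obj->mm.tlb[id],
			   intel_gt_next_invalidate_tlb_full(gt));
}

/* Page release: flush whichever GTs still carry a pending mark */
static void flush_stale_tlbs(struct drm_i915_gem_object *obj,
			     struct drm_i915_private *i915)
{
	struct intel_gt *gt;
	int id;

	for_each_gt(gt, i915, id) {
		if (!obj->mm.tlb[id])
			continue;
		intel_gt_invalidate_tlb_full(gt, obj->mm.tlb[id]);
		obj->mm.tlb[id] = 0;
	}
}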


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 14/21] drm/i915: document tlb field at struct drm_i915_gem_object
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (12 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 13/21] drm/i915: Invalidate the TLBs on each GT Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 15/21] drm/i915: Add platform macro for selective tlb flush Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Thomas Hellström, Chris Wilson,
	Daniel Vetter, David Airlie, Jani Nikula, Joonas Lahtinen,
	Matthew Auld, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx,
	linux-kernel

Add documentation to the TLB field inside
struct drm_i915_gem_object.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 3c1d0b750a67..6f5b9e34a4d7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -618,6 +618,7 @@ struct drm_i915_gem_object {
 		 */
 		bool dirty:1;
 
+		/** @mm.tlb: array with TLB invalidate IDs */
 		u32 tlb[I915_MAX_GT];
 	} mm;
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 15/21] drm/i915: Add platform macro for selective tlb flush
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (13 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 14/21] drm/i915: document tlb field at struct drm_i915_gem_object Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Prathap Kumar Valsan, Daniel Vetter, David Airlie, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel, Niranjana Vishwanathapura,
	Mauro Carvalho Chehab

From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

Add support for selective TLB invalidation, which is a platform
feature supported on XeHP.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_drv.h          | 3 +++
 drivers/gpu/drm/i915/i915_pci.c          | 1 +
 drivers/gpu/drm/i915/intel_device_info.h | 1 +
 3 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f1f70257dbe0..73494960a3a8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1312,6 +1312,9 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 #define HAS_GT_UC(dev_priv)	(INTEL_INFO(dev_priv)->has_gt_uc)
 
+#define HAS_SELECTIVE_TLB_INVALIDATION(dev_priv) \
+	(INTEL_INFO(dev_priv)->has_selective_tlb_invalidation)
+
 #define HAS_POOLED_EU(dev_priv)	(INTEL_INFO(dev_priv)->has_pooled_eu)
 
 #define HAS_GLOBAL_MOCS_REGISTERS(dev_priv)	(INTEL_INFO(dev_priv)->has_global_mocs)
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index aacc10f2e73f..30d945fe384b 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -1022,6 +1022,7 @@ static const struct intel_device_info adl_p_info = {
 	.has_reset_engine = 1, \
 	.has_rps = 1, \
 	.has_runtime_pm = 1, \
+	.has_selective_tlb_invalidation = 1, \
 	.ppgtt_size = 48, \
 	.ppgtt_type = INTEL_PPGTT_FULL
 
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 23bf230aa104..92a38b8f7c47 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -170,6 +170,7 @@ enum intel_ppgtt_type {
 	func(has_rc6p); \
 	func(has_rps); \
 	func(has_runtime_pm); \
+	func(has_selective_tlb_invalidation); \
 	func(has_snoop); \
 	func(has_coherent_ggtt); \
 	func(unfenced_needs_alignment); \
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (14 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 15/21] drm/i915: Add platform macro for selective tlb flush Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 15:20   ` Michal Wajdeczko
  2022-07-14 12:06 ` [PATCH v2 17/21] drm/i915: Add generic interface for tlb invalidation for XeHP Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Prathap Kumar Valsan, Alan Previn, Borislav Petkov,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Matt Roper,
	Matthew Brost, Mauro Carvalho Chehab, Michal Wajdeczko,
	Rodrigo Vivi, Tvrtko Ursulin, Umesh Nerlige Ramappa,
	Vinay Belgaumkar, dri-devel, intel-gfx, linux-kernel

From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

Add routines to interface with GuC firmware for selective TLB invalidation
supported on XeHP.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  3 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 90 +++++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 10 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  3 +
 4 files changed, 106 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index fb0af33e43cc..5c019856a269 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -188,6 +188,9 @@ enum intel_guc_state_capture_event_status {
 #define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
 
 enum intel_guc_tlb_invalidation_type {
+	INTEL_GUC_TLB_INVAL_FULL = 0x0,
+	INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE = 0x1,
+	INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX = 0x2,
 	INTEL_GUC_TLB_INVAL_GUC = 0x3,
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 8a104a292598..98260a7bc90b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -923,6 +923,96 @@ static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)
 	return err;
 }
 
+/* Full TLB invalidation */
+int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
+				  enum intel_guc_tlb_inval_mode mode)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_TLB_INVALIDATION,
+		0,
+		INTEL_GUC_TLB_INVAL_FULL << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
+			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
+			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
+	};
+
+	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
+		DRM_ERROR("TLB invalidation: operation not supported on this platform!\n");
+		return 0;
+	}
+
+	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
+}
+
+/*
+ * Selective TLB Invalidation for Address Range:
+ * TLBs in the address range are invalidated across all engines.
+ */
+int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
+					    enum intel_guc_tlb_inval_mode mode,
+					    u64 start, u64 length)
+{
+	u64 vm_total = BIT_ULL(INTEL_INFO(guc_to_gt(guc)->i915)->ppgtt_size);
+	u32 address_mask = (ilog2(length) - ilog2(I915_GTT_PAGE_SIZE_4K));
+	u32 full_range = vm_total == length;
+	u32 action[] = {
+		INTEL_GUC_ACTION_TLB_INVALIDATION,
+		0,
+		INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
+			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
+			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
+		0,
+		full_range ? full_range : lower_32_bits(start),
+		full_range ? 0 : upper_32_bits(start),
+		full_range ? 0 : address_mask,
+	};
+
+	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc)) {
+		DRM_ERROR("TLB invalidation: operation not supported on this platform!\n");
+		return 0;
+	}
+
+	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(!IS_ALIGNED(length, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(range_overflows(start, length, vm_total));
+
+	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
+}
+
+/*
+ * Selective TLB Invalidation for Context:
+ * Invalidates all TLBs for a specific context across all engines.
+ */
+int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
+						enum intel_guc_tlb_inval_mode mode,
+						u64 start, u64 length, u32 ctxid)
+{
+	u64 vm_total = BIT_ULL(INTEL_INFO(guc_to_gt(guc)->i915)->ppgtt_size);
+	u32 address_mask = (ilog2(length) - ilog2(I915_GTT_PAGE_SIZE_4K));
+	u32 full_range = vm_total == length;
+	u32 action[] = {
+		INTEL_GUC_ACTION_TLB_INVALIDATION,
+		0,
+		INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
+			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
+			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
+		ctxid,
+		full_range ? full_range : lower_32_bits(start),
+		full_range ? 0 : upper_32_bits(start),
+		full_range ? 0 : address_mask,
+	};
+
+	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc)) {
+		DRM_ERROR("TLB invalidation: operation not supported on this platform!\n");
+		return 0;
+	}
+
+	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(!IS_ALIGNED(length, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(range_overflows(start, length, vm_total));
+
+	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
+}
+
 /*
  * GuC TLB Invalidation: Invalidate the TLBs of GuC itself.
  */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 01c6478451cc..df6ba1c32808 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -384,6 +384,16 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size,
 int intel_guc_self_cfg32(struct intel_guc *guc, u16 key, u32 value);
 int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
 
+int intel_guc_g2g_register(struct intel_guc *guc);
+
+int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
+				  enum intel_guc_tlb_inval_mode mode);
+int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
+					    enum intel_guc_tlb_inval_mode mode,
+					    u64 start, u64 length);
+int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
+						  enum intel_guc_tlb_inval_mode mode,
+						  u64 start, u64 length, u32 ctxid);
 int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
 				 enum intel_guc_tlb_inval_mode mode);
 int intel_guc_invalidate_tlb_all(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 3edf567b3f65..29e402f70a94 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -436,5 +436,8 @@ enum intel_guc_recv_message {
 	((intel_guc_ct_enabled(&(guc)->ct)) && \
 	 (intel_guc_submission_is_used(guc)) && \
 	 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))
+#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc) \
+	(INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) && \
+	HAS_SELECTIVE_TLB_INVALIDATION(guc_to_gt(guc)->i915))
 
 #endif
-- 
2.36.1
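
The address_mask arithmetic above encodes the length of the range as a
power-of-two number of 4 KiB pages: for a 64 KiB range,
ilog2(64K) - ilog2(4K) = 16 - 12 = 4, i.e. 2^4 pages. A standalone model
of that computation (illustration only; ilog2_pow2() stands in for the
kernel's ilog2() and assumes a power-of-two input):

#include <stdint.h>
#include <stdio.h>

static unsigned int ilog2_pow2(uint64_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

int main(void)
{
	uint64_t page = 4096;		/* I915_GTT_PAGE_SIZE_4K */
	uint64_t length = 64 * 1024;	/* 64 KiB range to invalidate */

	/* cf. address_mask in intel_guc_invalidate_tlb_page_selective() */
	printf("address_mask = %u\n", ilog2_pow2(length) - ilog2_pow2(page));
	return 0;
}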


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 17/21] drm/i915: Add generic interface for tlb invalidation for XeHP
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (15 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 18/21] drm/i915: Use selective tlb invalidations where supported Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Prathap Kumar Valsan, Ashutosh Dixit, Chris Wilson,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Matt Atwood, Matt Roper,
	Mauro Carvalho Chehab, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel, Niranjana Vishwanathapura, Fei Yang

From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

Add an interface for GuC TLB actions, supporting both selective and
full TLB invalidations. After this change, when GuC is enabled,
TLB invalidations use the GuC CT interface. Otherwise, the MMIO
interface is used.

Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_regs.h |  8 +++
 drivers/gpu/drm/i915/gt/intel_tlb.c     | 78 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_tlb.h     |  1 +
 3 files changed, 86 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_regs.h b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
index 60d6eb5f245b..52508a9c23e5 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_regs.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_regs.h
@@ -1054,6 +1054,14 @@
 
 #define GEN12_GAM_DONE				_MMIO(0xcf68)
 
+#define XEHP_TLB_INV_DESC0			_MMIO(0xcf7c)
+#define   XEHP_TLB_INV_DESC0_ADDR_LO		REG_GENMASK(31, 12)
+#define   XEHP_TLB_INV_DESC0_ADDR_MASK		REG_GENMASK(8, 3)
+#define   XEHP_TLB_INV_DESC0_G			REG_GENMASK(2, 1)
+#define   XEHP_TLB_INV_DESC0_VALID		REG_BIT(0)
+#define XEHP_TLB_INV_DESC1			_MMIO(0xcf80)
+#define   XEHP_TLB_INV_DESC0_ADDR_HI		REG_GENMASK(31, 0)
+
 #define GEN7_HALF_SLICE_CHICKEN1		_MMIO(0xe100) /* IVB GT1 + VLV */
 #define   GEN7_MAX_PS_THREAD_DEP		(8 << 12)
 #define   GEN7_SINGLE_SUBSCAN_DISPATCH_ENABLE	(1 << 10)
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..15ed83226676 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -10,6 +10,7 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_regs.h"
 #include "intel_tlb.h"
+#include "uc/intel_guc.h"
 
 struct reg_and_bit {
 	i915_reg_t reg;
@@ -159,11 +160,16 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 		return;
 
 	with_intel_gt_pm_if_awake(gt, wakeref) {
+		struct intel_guc *guc = &gt->uc.guc;
+
 		mutex_lock(&gt->tlb.invalidate_lock);
 		if (tlb_seqno_passed(gt, seqno))
 			goto unlock;
 
-		mmio_invalidate_full(gt);
+		if (INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc))
+			intel_guc_invalidate_tlb_full(guc, INTEL_GUC_TLB_INVAL_MODE_HEAVY);
+		else
+			mmio_invalidate_full(gt);
 
 		write_seqcount_invalidate(&gt->tlb.seqno);
 unlock:
@@ -171,6 +177,76 @@ void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 	}
 }
 
+static bool mmio_invalidate_range(struct intel_gt *gt, u64 start, u64 length)
+{
+	u32 address_mask = (ilog2(length) - ilog2(I915_GTT_PAGE_SIZE_4K));
+	u64 vm_total = BIT_ULL(INTEL_INFO(gt->i915)->ppgtt_size);
+	intel_wakeref_t wakeref;
+	u32 dw0, dw1;
+	int err;
+
+	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(!IS_ALIGNED(length, I915_GTT_PAGE_SIZE_4K));
+	GEM_BUG_ON(range_overflows(start, length, vm_total));
+
+	dw0 = FIELD_PREP(XEHP_TLB_INV_DESC0_ADDR_LO, (lower_32_bits(start) >> 12)) |
+		FIELD_PREP(XEHP_TLB_INV_DESC0_ADDR_MASK, address_mask) |
+		FIELD_PREP(XEHP_TLB_INV_DESC0_G, 0x3) |
+		FIELD_PREP(XEHP_TLB_INV_DESC0_VALID, 0x1);
+	dw1 = upper_32_bits(start);
+
+	err = 0;
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		struct intel_uncore *uncore = gt->uncore;
+
+		intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
+
+		mutex_lock(&gt->tlb.invalidate_lock);
+		intel_uncore_write_fw(uncore, XEHP_TLB_INV_DESC1, dw1);
+		intel_uncore_write_fw(uncore, XEHP_TLB_INV_DESC0, dw0);
+		err = __intel_wait_for_register_fw(uncore,
+						   XEHP_TLB_INV_DESC0,
+						   XEHP_TLB_INV_DESC0_VALID,
+						   0, 100, 10, NULL);
+		mutex_unlock(&gt->tlb.invalidate_lock);
+
+		intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
+	}
+
+	if (err)
+		drm_err_ratelimited(&gt->i915->drm,
+				    "TLB invalidation response timed out\n");
+
+	return err == 0;
+}
+
+bool intel_gt_invalidate_tlb_range(struct intel_gt *gt,
+				   u64 start, u64 length)
+{
+	struct intel_guc *guc = &gt->uc.guc;
+	intel_wakeref_t wakeref;
+
+	if (intel_gt_is_wedged(gt))
+		return true;
+
+	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc))
+		return false;
+
+	/* XXX: We are seeing timeouts on GuC-based TLB invalidations on
+	 * XEHPSDV. Until we have a fix, use MMIO.
+	 */
+	if (IS_XEHPSDV(gt->i915))
+		return mmio_invalidate_range(gt, start, length);
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		intel_guc_invalidate_tlb_page_selective(guc,
+							INTEL_GUC_TLB_INVAL_MODE_HEAVY,
+							start, length);
+	}
+
+	return true;
+}
+
 void intel_gt_init_tlb(struct intel_gt *gt)
 {
 	mutex_init(&gt->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..32cc79b1d8a4 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -12,6 +12,7 @@
 #include "intel_gt_types.h"
 
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
+bool intel_gt_invalidate_tlb_range(struct intel_gt *gt, u64 start, u64 length);
 
 void intel_gt_init_tlb(struct intel_gt *gt);
 void intel_gt_fini_tlb(struct intel_gt *gt);
-- 
2.36.1
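
For reference, the MMIO path above packs a 64-bit invalidation descriptor
into two dwords and then polls for the hardware to clear
XEHP_TLB_INV_DESC0_VALID, which is what the __intel_wait_for_register_fw()
call checks. A standalone model of the dword packing (illustration only;
the field layout follows the XEHP_TLB_INV_DESC0/1 defines added above):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t start = 0x100010000ULL;	/* 4 KiB aligned GTT address */
	uint32_t address_mask = 4;		/* 2^4 pages = 64 KiB */
	uint32_t dw0, dw1;

	dw0 = ((uint32_t)start & 0xfffff000) |	/* ADDR_LO: bits 31:12 */
	      (address_mask << 3) |		/* ADDR_MASK: bits 8:3 */
	      (0x3 << 1) |			/* G: bits 2:1 */
	      0x1;				/* VALID: bit 0 */
	dw1 = (uint32_t)(start >> 32);		/* ADDR_HI: upper 32 bits */

	/* prints dw0=0x00010027 dw1=0x00000001 */
	printf("dw0=0x%08x dw1=0x%08x\n", dw0, dw1);
	return 0;
}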


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 18/21] drm/i915: Use selective tlb invalidations where supported
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (16 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 17/21] drm/i915: Add generic interface for tlb invalidation for XeHP Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 19/21] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Prathap Kumar Valsan, Thomas Hellström, Chris Wilson,
	Daniel Vetter, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Maarten Lankhorst, Matt Roper, Matthew Auld,
	Mauro Carvalho Chehab, Michael Cheng, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	Niranjana Vishwanathapura, Fei Yang

From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>

For platforms supporting selective TLB invalidations, we don't need to
do a full TLB invalidation. Rather, do a range-based TLB invalidation
for every unbind of a purged VMA that belongs to an active VM.

[mchehab: change moved from intel_ppgtt.c to i915_vma.c]
Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only the mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  2 +-
 drivers/gpu/drm/i915/i915_vma.c       | 14 +++++++++-----
 drivers/gpu/drm/i915/i915_vma.h       |  3 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index f764d250e929..74782fb2ccbd 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -211,7 +211,7 @@ void ppgtt_unbind_vma(struct i915_address_space *vm,
 		return;
 
 	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
-	vma_invalidate_tlb(vm, vma_res->tlb);
+	vma_invalidate_tlb(vm, vma_res->tlb, vma_res->start, vma_res->vma_size);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5edc745dcc51..6d881a6b403a 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1309,7 +1309,8 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
-void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb,
+			u64 start, u64 size)
 {
 	struct intel_gt *gt;
 	int id;
@@ -1325,9 +1326,11 @@ void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
 	 * the most recent TLB invalidation seqno, and if we have not yet
 	 * flushed the TLBs upon release, perform a full invalidation.
 	 */
-	for_each_gt(gt, vm->i915, id)
-		WRITE_ONCE(tlb[id],
-			   intel_gt_next_invalidate_tlb_full(gt));
+	for_each_gt(gt, vm->i915, id) {
+		if (!intel_gt_invalidate_tlb_range(gt, start, size))
+			WRITE_ONCE(tlb[id],
+				   intel_gt_next_invalidate_tlb_full(gt));
+	}
 }
 
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
@@ -1980,7 +1983,8 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 			dma_fence_put(unbind_fence);
 			unbind_fence = NULL;
 		}
-		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb,
+				   vma->node.start, vma->size);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 33a58f605d75..3f0af9595e59 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,7 +213,8 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
-void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb,
+			u64 start, u64 size);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 19/21] drm/i915/gt: document TLB cache invalidation functions
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (17 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 18/21] drm/i915: Use selective tlb invalidations where supported Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 20/21] drm/i915/guc: describe enum intel_guc_tlb_invalidation_type Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 21/21] drm/i915/guc: document TLB cache invalidation functions Mauro Carvalho Chehab
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Chris Wilson, Daniel Vetter, David Airlie,
	Jani Nikula, Joonas Lahtinen, Prathap Kumar Valsan, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel

Add a description for the kAPI functions inside intel_tlb.c.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_tlb.c | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index 15ed83226676..aa2e0086ae88 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -146,6 +146,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
 }
 
+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number
+ *
+ * Do a full TLB cache invalidation if @seqno is newer than the seqno of
+ * the last full TLB cache invalidation.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports GEN8 to GEN12 and GuC-based TLB cache invalidation.
+ */
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 {
 	intel_wakeref_t wakeref;
@@ -220,6 +232,17 @@ static bool mmio_invalidate_range(struct intel_gt *gt, u64 start, u64 length)
 	return err == 0;
 }
 
+/**
+ * intel_gt_invalidate_tlb_range - do a selective TLB cache invalidation
+ * @gt: GT structure
+ * @start: range start
+ * @length: range length
+ *
+ * Do a selective TLB cache invalidation on the range starting at @start
+ * and spanning @length bytes.
+ *
+ * Only some GuC-based GPUs can do a selective cache invalidation.
+ */
 bool intel_gt_invalidate_tlb_range(struct intel_gt *gt,
 				   u64 start, u64 length)
 {
@@ -247,12 +270,25 @@ bool intel_gt_invalidate_tlb_range(struct intel_gt *gt,
 	return true;
 }
 
+/**
+ * intel_gt_init_tlb - initialize TLB-specific vars
+ * @gt: GT structure
+ *
+ * TLB cache invalidation logic internally uses some resources that require
+ * initialization. Should be called before doing any TLB cache invalidation.
+ */
 void intel_gt_init_tlb(struct intel_gt *gt)
 {
 	mutex_init(&gt->tlb.invalidate_lock);
 	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 }
 
+/**
+ * intel_gt_fini_tlb - free TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources needed by TLB cache invalidation logic.
+ */
 void intel_gt_fini_tlb(struct intel_gt *gt)
 {
 	mutex_destroy(&gt->tlb.invalidate_lock);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 20/21] drm/i915/guc: describe enum intel_guc_tlb_invalidation_type
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (18 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 19/21] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  2022-07-14 12:06 ` [PATCH v2 21/21] drm/i915/guc: document TLB cache invalidation functions Mauro Carvalho Chehab
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Alan Previn, Borislav Petkov,
	Daniel Vetter, David Airlie, Jani Nikula, John Harrison,
	Joonas Lahtinen, Matthew Brost, Michal Wajdeczko,
	Prathap Kumar Valsan, Rodrigo Vivi, Tvrtko Ursulin,
	Vinay Belgaumkar, dri-devel, intel-gfx, linux-kernel

Add a description for intel_guc_tlb_invalidation_type enum.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 5c019856a269..e97065c62d28 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -187,6 +187,18 @@ enum intel_guc_state_capture_event_status {
 /* Flush PPC or SMRO caches along with TLB invalidation request */
 #define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
 
+/**
+ * enum intel_guc_tlb_invalidation_type - type of TLB cache invalidation
+ *
+ * @INTEL_GUC_TLB_INVAL_FULL:
+ *	Global TLB invalidation
+ * @INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE:
+ *	Page-selective TLB cache invalidation
+ * @INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX:
+ *	Context-selective TLB cache invalidation
+ * @INTEL_GUC_TLB_INVAL_GUC:
+ *	Invalidate TLB on GuC itself
+ */
 enum intel_guc_tlb_invalidation_type {
 	INTEL_GUC_TLB_INVAL_FULL = 0x0,
 	INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE = 0x1,
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH v2 21/21] drm/i915/guc: document TLB cache invalidation functions
  2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
                   ` (19 preceding siblings ...)
  2022-07-14 12:06 ` [PATCH v2 20/21] drm/i915/guc: describe enum intel_guc_tlb_invalidation_type Mauro Carvalho Chehab
@ 2022-07-14 12:06 ` Mauro Carvalho Chehab
  20 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-14 12:06 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Daniel Vetter, Daniele Ceraolo Spurio,
	David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen,
	Matthew Brost, Michal Wajdeczko, Prathap Kumar Valsan,
	Rodrigo Vivi, Tvrtko Ursulin, Umesh Nerlige Ramappa,
	Vinay Belgaumkar, dri-devel, intel-gfx, linux-kernel

Add documentation for the kAPI functions that do TLB cache
invalidation via GuC.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were Cc'd on the cover.
See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/uc/intel_guc.c | 52 ++++++++++++++++++++++----
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 98260a7bc90b..173833bc3a62 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -923,7 +923,14 @@ static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)
 	return err;
 }
 
- /* Full TLB invalidation */
+/**
+ * intel_guc_invalidate_tlb_full - GuC full TLB invalidation
+ *
+ * @guc: the GuC
+ * @mode: mode of TLB cache invalidation (heavy or lite)
+ *
+ * Use GuC to do a full TLB cache invalidation if supported.
+ */
 int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
 				  enum intel_guc_tlb_inval_mode mode)
 {
@@ -943,8 +950,17 @@ int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
 	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
 }
 
-/*
- * Selective TLB Invalidation for Address Range:
+/**
+ * intel_guc_invalidate_tlb_page_selective - GuC selective TLB invalidation
+ *	for an address range
+ *
+ * @guc: the GuC
+ * @mode: mode of TLB cache invalidation (heavy or lite)
+ * @start: range start
+ * @length: range length
+ *
+ * Use GuC to do a selective TLB invalidation if supported.
+ *
  * TLB's in the Address Range is Invalidated across all engines.
  */
 int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
@@ -978,8 +994,18 @@ int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
 	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
 }
 
-/*
- * Selective TLB Invalidation for Context:
+/**
+ * intel_guc_invalidate_tlb_page_selective_ctx - GuC selective TLB
+ *	invalidation for a context
+ *
+ * @guc: the GuC
+ * @mode: mode of TLB cache invalidation (heavy or lite)
+ * @start: range start
+ * @length: range length
+ * @ctxid: context ID
+ *
+ * Use GuC to do a selective TLB invalidation on a context if supported.
+ *
  * Invalidates all TLB's for a specific context across all engines.
  */
 int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
@@ -1013,8 +1039,13 @@ int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
 	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
 }
 
-/*
- * Guc TLB Invalidation: Invalidate the TLB's of GuC itself.
+/**
+ * intel_guc_invalidate_tlb_guc - GuC self TLB invalidation
+ *
+ * @guc: the GuC
+ * @mode: mode of TLB cache invalidation (heavy or lite)
+ *
+ * Use GuC to invalidate the TLBs of GuC itself.
  */
 int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
 				 enum intel_guc_tlb_inval_mode mode)
@@ -1035,6 +1066,13 @@ int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
 	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
 }
 
+/**
+ * intel_guc_invalidate_tlb_all - GuC global TLB invalidation
+ *
+ * @guc: the GuC
+ *
+ * Use GuC to do a complete TLB invalidation on all tables.
+ */
 int intel_guc_invalidate_tlb_all(struct intel_guc *guc)
 {
 	u32 action[] = {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines
  2022-07-14 12:06 ` [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines Mauro Carvalho Chehab
@ 2022-07-14 14:06   ` Michal Wajdeczko
  2022-08-02  7:48     ` [Intel-gfx] " Mauro Carvalho Chehab
  0 siblings, 1 reply; 52+ messages in thread
From: Michal Wajdeczko @ 2022-07-14 14:06 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Prathap Kumar Valsan, Alan Previn, Borislav Petkov,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Matt Roper,
	Matthew Brost, Rodrigo Vivi, Tvrtko Ursulin,
	Umesh Nerlige Ramappa, Vinay Belgaumkar, dri-devel, intel-gfx,
	linux-kernel, Bruce Chang, Chris Wilson



On 14.07.2022 14:06, Mauro Carvalho Chehab wrote:
> From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> 
> Add routines to interface with GuC firmware for TLB invalidation.
> 
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Cc: Bruce Chang <yu.bruce.chang@intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Chris Wilson <chris.p.wilson@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 35 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 90 ++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 24 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  6 ++
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
>  6 files changed, 253 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index 4ef9990ed7f8..2e39d8df4c82 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -134,6 +134,10 @@ enum intel_guc_action {
>  	INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
>  	INTEL_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
>  	INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> +	INTEL_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,

should this be part of this patch ?

> +	INTEL_GUC_ACTION_PAGE_FAULT_NOTIFICATION = 0x6001,
> +	INTEL_GUC_ACTION_TLB_INVALIDATION = 0x7000,
> +	INTEL_GUC_ACTION_TLB_INVALIDATION_DONE = 0x7001,

can we document the layout of these actions ?

>  	INTEL_GUC_ACTION_STATE_CAPTURE_NOTIFICATION = 0x8002,
>  	INTEL_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE = 0x8003,
>  	INTEL_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED = 0x8004,
> @@ -177,4 +181,35 @@ enum intel_guc_state_capture_event_status {
>  
>  #define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x000000FF
>  
> +#define INTEL_GUC_TLB_INVAL_TYPE_SHIFT 0
> +#define INTEL_GUC_TLB_INVAL_MODE_SHIFT 8

can we stop using SHIFT-based definitions and start using MASK-based
instead ? then we will be able to use FIELD_PREP/GET like we do for i915_reg
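
something along these lines perhaps (an untested sketch; the 8-bit field
widths below are just my guess from the SHIFT values):

	#include <linux/bitfield.h>

	#define INTEL_GUC_TLB_INVAL_TYPE_MASK	GENMASK(7, 0)
	#define INTEL_GUC_TLB_INVAL_MODE_MASK	GENMASK(15, 8)

	/* callers would then build the dword like: */
	u32 dw = FIELD_PREP(INTEL_GUC_TLB_INVAL_TYPE_MASK,
			    INTEL_GUC_TLB_INVAL_GUC) |
		 FIELD_PREP(INTEL_GUC_TLB_INVAL_MODE_MASK, mode) |
		 INTEL_GUC_TLB_INVAL_FLUSH_CACHE;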

> +/* Flush PPC or SMRO caches along with TLB invalidation request */
> +#define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
> +
> +enum intel_guc_tlb_invalidation_type {
> +	INTEL_GUC_TLB_INVAL_GUC = 0x3,
> +};
> +
> +/*
> + * 0: Heavy mode of Invalidation:
> + * The pipeline of the engine(s) for which the invalidation is targeted to is
> + * blocked, and all the in-flight transactions are guaranteed to be Globally
> + * Observed before completing the TLB invalidation
> + * 1: Lite mode of Invalidation:
> + * TLBs of the targeted engine(s) are immediately invalidated.
> + * In-flight transactions are NOT guaranteed to be Globally Observed before
> + * completing TLB invalidation.
> + * Light Invalidation Mode is to be used only when
> + * it can be guaranteed (by SW) that the address translations remain invariant
> + * for the in-flight transactions across the TLB invalidation. In other words,
> + * this mode can be used when the TLB invalidation is intended to clear out the
> + * stale cached translations that are no longer in use. Light Invalidation Mode
> + * is much faster than the Heavy Invalidation Mode, as it does not wait for the
> + * in-flight transactions to be GOd.
> + */

either drop this comment or squash with patch 10/21 to fix it

> +enum intel_guc_tlb_inval_mode {
> +	INTEL_GUC_TLB_INVAL_MODE_HEAVY = 0x0,
> +	INTEL_GUC_TLB_INVAL_MODE_LITE = 0x1,
> +};
> +
>  #endif /* _ABI_GUC_ACTIONS_ABI_H */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 2706a8c65090..5c59f9b144a3 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -855,6 +855,96 @@ int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value)
>  	return __guc_self_cfg(guc, key, 2, value);
>  }
>  
> +static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)

nit: maybe since the MMIO TLB code has moved to a dedicated file, we can
do the same with the GuC TLB code, like "intel_guc_tlb.c" ?

> +{
> +	struct intel_guc_tlb_wait _wq, *wq = &_wq;
> +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> +	int err = 0;
> +	u32 seqno;
> +
> +	init_waitqueue_head(&_wq.wq);
> +
> +	if (xa_alloc_cyclic_irq(&guc->tlb_lookup, &seqno, wq,
> +				xa_limit_32b, &guc->next_seqno,
> +				GFP_ATOMIC | __GFP_NOWARN) < 0) {
> +		/* Under severe memory pressure? Serialise TLB allocations */
> +		xa_lock_irq(&guc->tlb_lookup);
> +		wq = xa_load(&guc->tlb_lookup, guc->serial_slot);
> +		wait_event_lock_irq(wq->wq,
> +				    !READ_ONCE(wq->status),
> +				    guc->tlb_lookup.xa_lock);
> +		/*
> +		 * Update wq->status under lock to ensure only one waiter can
> +		 * issue the tlb invalidation command using the serial slot at a
> +		 * time. The condition is set to false before releasing the lock
> +		 * so that other caller continue to wait until woken up again.
> +		 */
> +		wq->status = 1;
> +		xa_unlock_irq(&guc->tlb_lookup);
> +
> +		seqno = guc->serial_slot;
> +	}
> +
> +	action[1] = seqno;

it's sad that we need to update this action message blindly

if you don't want to expose seqno allocation in a helper function that
each caller would use, then maybe assert that this action message is
the expected one
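
e.g. a minimal sketch of what I mean:

	/* make sure we only patch the seqno of a TLB invalidation action */
	GEM_BUG_ON(action[0] != INTEL_GUC_ACTION_TLB_INVALIDATION);
	action[1] = seqno;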


> +
> +	add_wait_queue(&wq->wq, &wait);
> +
> +	err = intel_guc_send_busy_loop(guc, action, size, G2H_LEN_DW_INVALIDATE_TLB, true);
> +	if (err) {
> +		/*
> +		 * XXX: Failure of tlb invalidation is critical and would

s/tlb/TLB

> +		 * warrant a gt reset.
> +		 */
> +		goto out;
> +	}
> +/*
> + * GuC has a timeout of 1ms for a tlb invalidation response from GAM. On a

ditto

> + * timeout GuC drops the request and has no mechanism to notify the host about
> + * the timeout. So keep a larger timeout that accounts for this individual
> + * timeout and max number of outstanding invalidation requests that can be
> + * queued in CT buffer.
> + */
> +#define OUTSTANDING_GUC_TIMEOUT_PERIOD  (HZ)
> +	if (!wait_woken(&wait, TASK_UNINTERRUPTIBLE,

IIRC there was some discussion about whether we can rely on this in our
scenario. Can you sync with Chris on that?

> +			OUTSTANDING_GUC_TIMEOUT_PERIOD)) {
> +		/*
> +		 * XXX: Failure of tlb invalidation is critical and would

s/tlb/TLB

> +		 * warrant a gt reset.
> +		 */
> +		drm_err(&guc_to_gt(guc)->i915->drm,
> +			 "tlb invalidation response timed out for seqno %u\n", seqno);

s/tlb/TLB

btw, should we care here about G2H_LEN_DW_INVALIDATE_TLB space that we
reserved in send_busy_loop() ?

> +		err = -ETIME;
> +	}
> +out:
> +	remove_wait_queue(&wq->wq, &wait);
> +	if (seqno != guc->serial_slot)
> +		xa_erase_irq(&guc->tlb_lookup, seqno);
> +
> +	return err;
> +}
> +
> +/*
> + * Guc TLB Invalidation: Invalidate the TLB's of GuC itself.
> + */
> +int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
> +				 enum intel_guc_tlb_inval_mode mode)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_TLB_INVALIDATION,
> +		0,
> +		INTEL_GUC_TLB_INVAL_GUC << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
> +			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
> +			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
> +	};
> +
> +	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
> +		DRM_ERROR("Tlb invalidation: Operation not supported in this platform!\n");

you should use drm_err() instead

but I wonder if this should be treated as a coding error (and then use
GEM_BUG/WARN_ON instead); then again, I'm not sure how to interpret the
check for intel_guc_ct_enabled() embedded in the above macro ...
note that intel_guc_ct_send() will return -ENODEV if CTB is down
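
e.g. an untested sketch of the two options I can think of:

	/* treat it as a coding error ... */
	if (GEM_WARN_ON(!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)))
		return 0;

	/* ... or report it properly via drm_err() */
	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"TLB invalidation: operation not supported on this platform!\n");
		return 0;
	}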

> +		return 0;
> +	}
> +
> +	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
> +}
> +
>  /**
>   * intel_guc_load_status - dump information about GuC load status
>   * @guc: the GuC
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index d0d99f178f2d..f82a121b0838 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -77,6 +77,10 @@ struct intel_guc {
>  	atomic_t outstanding_submission_g2h;
>  
>  	/** @interrupts: pointers to GuC interrupt-managing functions. */
> +	struct xarray tlb_lookup;
> +	u32 serial_slot;
> +	u32 next_seqno;

wrong place - the kernel-doc above is for the struct below

> +
>  	struct {
>  		void (*reset)(struct intel_guc *guc);
>  		void (*enable)(struct intel_guc *guc);
> @@ -248,6 +252,11 @@ struct intel_guc {
>  #endif
>  };
>  
> +struct intel_guc_tlb_wait {
> +	struct wait_queue_head wq;
> +	u8 status;
> +} __aligned(4);
> +
>  static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
>  {
>  	return container_of(log, struct intel_guc, log);
> @@ -363,6 +372,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size,
>  int intel_guc_self_cfg32(struct intel_guc *guc, u16 key, u32 value);
>  int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
>  
> +int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
> +				 enum intel_guc_tlb_inval_mode mode);
> +
>  static inline bool intel_guc_is_supported(struct intel_guc *guc)
>  {
>  	return intel_uc_fw_is_supported(&guc->fw);
> @@ -440,6 +452,7 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
>  					 const u32 *msg, u32 len);
>  int intel_guc_error_capture_process_msg(struct intel_guc *guc,
>  					const u32 *msg, u32 len);
> +void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno);
>  
>  struct intel_engine_cs *
>  intel_guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index f01325cd1b62..c1ce542b7855 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -1023,7 +1023,7 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>  	return 0;
>  }
>  
> -static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
> +static bool ct_process_incoming_requests(struct intel_guc_ct *ct, struct list_head *incoming)
>  {
>  	unsigned long flags;
>  	struct ct_incoming_msg *request;
> @@ -1031,11 +1031,11 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
>  	int err;
>  
>  	spin_lock_irqsave(&ct->requests.lock, flags);
> -	request = list_first_entry_or_null(&ct->requests.incoming,
> +	request = list_first_entry_or_null(incoming,
>  					   struct ct_incoming_msg, link);
>  	if (request)
>  		list_del(&request->link);
> -	done = !!list_empty(&ct->requests.incoming);
> +	done = !!list_empty(incoming);
>  	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
>  	if (!request)
> @@ -1058,7 +1058,7 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
>  	bool done;
>  
>  	do {
> -		done = ct_process_incoming_requests(ct);
> +		done = ct_process_incoming_requests(ct, &ct->requests.incoming);
>  	} while (!done);
>  }
>  
> @@ -1078,14 +1078,30 @@ static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *requ
>  	switch (action) {
>  	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>  	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +	case INTEL_GUC_ACTION_TLB_INVALIDATION_DONE:
>  		g2h_release_space(ct, request->size);
>  	}
> +	/* Handle tlb invalidation response in interrupt context */

since it breaks layering, can you add more comments explaining why it
is done this way ?

> +	if (action == INTEL_GUC_ACTION_TLB_INVALIDATION_DONE) {
> +		const u32 *payload;
> +		u32 hxg_len, len;
> +
> +		hxg_len = request->size - GUC_CTB_MSG_MIN_LEN;
> +		len = hxg_len - GUC_HXG_MSG_MIN_LEN;
> +		if (unlikely(len < 1))
> +			return -EPROTO;
> +		payload = &hxg[GUC_HXG_MSG_MIN_LEN];

if we still need to handle this at this level, can we at least move this
message decomposition to the handler (in other words: just pass the hxg
pointer instead of a single dword payload)

> +		intel_guc_tlb_invalidation_done(ct_to_guc(ct),  payload[0]);
> +		ct_free_msg(request);
> +		return 0;
> +	}
>  
>  	spin_lock_irqsave(&ct->requests.lock, flags);
>  	list_add_tail(&request->link, &ct->requests.incoming);
>  	spin_unlock_irqrestore(&ct->requests.lock, flags);
>  
>  	queue_work(system_unbound_wq, &ct->requests.worker);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index b3c9a9327f76..3edf567b3f65 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -22,6 +22,7 @@
>  /* Payload length only i.e. don't include G2H header length */
>  #define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
>  #define G2H_LEN_DW_DEREGISTER_CONTEXT		1
> +#define G2H_LEN_DW_INVALIDATE_TLB		1
>  
>  #define GUC_CONTEXT_DISABLE		0
>  #define GUC_CONTEXT_ENABLE		1
> @@ -431,4 +432,9 @@ enum intel_guc_recv_message {
>  	INTEL_GUC_RECV_MSG_EXCEPTION = BIT(30),
>  };
>  
> +#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) \
> +	((intel_guc_ct_enabled(&(guc)->ct)) && \

do we need this check ?
CTB is a prerequisite for submission, which is required below
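
i.e. maybe just (a sketch, assuming CTB being enabled is already implied
by submission being used):

	#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) \
		(intel_guc_submission_is_used(guc) && \
		 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))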

> +	 (intel_guc_submission_is_used(guc)) && \
> +	 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))
> +
>  #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 40f726c61e95..6888ea1bc7c1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1653,11 +1653,20 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
>  	intel_context_put(parent);
>  }
>  
> +static void wake_up_tlb_invalidate(struct intel_guc_tlb_wait *wait)
> +{
> +	/* Barrier to ensure the store is observed by the woken thread */
> +	smp_store_mb(wait->status, 0);
> +	wake_up(&wait->wq);
> +}
> +
>  void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
>  {
> +	struct intel_guc_tlb_wait *wait;
>  	struct intel_context *ce;
>  	unsigned long index;
>  	unsigned long flags;
> +	unsigned long i;
>  
>  	if (unlikely(!guc_submission_initialized(guc))) {
>  		/* Reset called during driver load? GuC not yet initialised! */
> @@ -1683,6 +1692,13 @@ void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stall
>  
>  	/* GuC is blown away, drop all references to contexts */
>  	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * The full GT reset will have cleared the TLB caches and flushed the
> +	 * G2H message queue; we can release all the blocked waiters.
> +	 */
> +	xa_for_each(&guc->tlb_lookup, i, wait)
> +		wake_up_tlb_invalidate(wait);

shouldn't this be closer to intel_guc_invalidate_tlb_guc() ? then we
could avoid spreading code across many files

same for the init/fini_tlb_lookup() functions below

>  }
>  
>  static void guc_cancel_context_requests(struct intel_context *ce)
> @@ -1805,6 +1821,41 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
>  static void destroyed_worker_func(struct work_struct *w);
>  static void reset_fail_worker_func(struct work_struct *w);
>  
> +static int init_tlb_lookup(struct intel_guc *guc)
> +{
> +	struct intel_guc_tlb_wait *wait;
> +	int err;
> +
> +	xa_init_flags(&guc->tlb_lookup, XA_FLAGS_ALLOC);
> +
> +	wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> +	if (!wait)
> +		return -ENOMEM;
> +
> +	init_waitqueue_head(&wait->wq);
> +	err = xa_alloc_cyclic_irq(&guc->tlb_lookup, &guc->serial_slot, wait,
> +				  xa_limit_32b, &guc->next_seqno, GFP_KERNEL);
> +	if (err == -ENOMEM) {
> +		kfree(wait);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static void fini_tlb_lookup(struct intel_guc *guc)
> +{
> +	struct intel_guc_tlb_wait *wait;
> +
> +	wait = xa_load(&guc->tlb_lookup, guc->serial_slot);
> +	if (wait) {
> +		GEM_BUG_ON(wait->status);
> +		kfree(wait);
> +	}
> +
> +	xa_destroy(&guc->tlb_lookup);
> +}
> +
>  /*
>   * Set up the memory resources to be shared with the GuC (via the GGTT)
>   * at firmware loading time.
> @@ -1812,20 +1863,31 @@ static void reset_fail_worker_func(struct work_struct *w);
>  int intel_guc_submission_init(struct intel_guc *guc)
>  {
>  	struct intel_gt *gt = guc_to_gt(guc);
> +	int ret;
>  
>  	if (guc->submission_initialized)
>  		return 0;
>  
> +	ret = init_tlb_lookup(guc);

if we promote guc_tlb to its own file/functions, then maybe it could be
init/fini'ed directly from __uc_init_hw() ?

> +	if (ret)
> +		return ret;
> +
>  	guc->submission_state.guc_ids_bitmap =
>  		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
> -	if (!guc->submission_state.guc_ids_bitmap)
> -		return -ENOMEM;
> +	if (!guc->submission_state.guc_ids_bitmap) {
> +		ret = -ENOMEM;
> +		goto err;
> +	}
>  
>  	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
>  	guc->timestamp.shift = gpm_timestamp_shift(gt);
>  	guc->submission_initialized = true;
>  
>  	return 0;
> +
> +err:
> +	fini_tlb_lookup(guc);
> +	return ret;
>  }
>  
>  void intel_guc_submission_fini(struct intel_guc *guc)
> @@ -1836,6 +1898,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>  	guc_flush_destroyed_contexts(guc);
>  	i915_sched_engine_put(guc->sched_engine);
>  	bitmap_free(guc->submission_state.guc_ids_bitmap);
> +	fini_tlb_lookup(guc);
>  	guc->submission_initialized = false;
>  }
>  
> @@ -4027,6 +4090,30 @@ g2h_context_lookup(struct intel_guc *guc, u32 ctx_id)
>  	return ce;
>  }
>  
> +static void wait_wake_outstanding_tlb_g2h(struct intel_guc *guc, u32 seqno)
> +{
> +	struct intel_guc_tlb_wait *wait;
> +	unsigned long flags;
> +
> +	xa_lock_irqsave(&guc->tlb_lookup, flags);
> +	wait = xa_load(&guc->tlb_lookup, seqno);
> +
> +	/* We received a response after the waiting task did exit with a timeout */
> +	if (unlikely(!wait))
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Stale tlb invalidation response with seqno %d\n", seqno);

hmm, this sounds like a problem as we shouldn't get any late
notifications - do we really want to hide it under drm_dbg ?
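
e.g. if a late response really shouldn't happen, maybe (just a sketch):

	/* a late response is unexpected, so make it loud */
	if (unlikely(!wait))
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Stale TLB invalidation response with seqno %d\n", seqno);
	else
		wake_up_tlb_invalidate(wait);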

> +
> +	if (wait)
> +		wake_up_tlb_invalidate(wait);
> +
> +	xa_unlock_irqrestore(&guc->tlb_lookup, flags);
> +}
> +
> +void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno)
> +{
> +	wait_wake_outstanding_tlb_g2h(guc, seqno);
> +}
> +
>  int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>  					  const u32 *msg,
>  					  u32 len)

,Michal

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines
  2022-07-14 12:06 ` [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines Mauro Carvalho Chehab
@ 2022-07-14 15:20   ` Michal Wajdeczko
  0 siblings, 0 replies; 52+ messages in thread
From: Michal Wajdeczko @ 2022-07-14 15:20 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Prathap Kumar Valsan, Alan Previn, Borislav Petkov,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Matt Roper,
	Matthew Brost, Rodrigo Vivi, Tvrtko Ursulin,
	Umesh Nerlige Ramappa, Vinay Belgaumkar, dri-devel, intel-gfx,
	linux-kernel



On 14.07.2022 14:06, Mauro Carvalho Chehab wrote:
> From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> 
> Add routines to interface with GuC firmware for selective TLB invalidation
> supported on XeHP.
> 
> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  3 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 90 +++++++++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 10 +++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  3 +
>  4 files changed, 106 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> index fb0af33e43cc..5c019856a269 100644
> --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> @@ -188,6 +188,9 @@ enum intel_guc_state_capture_event_status {
>  #define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
>  
>  enum intel_guc_tlb_invalidation_type {
> +	INTEL_GUC_TLB_INVAL_FULL = 0x0,
> +	INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE = 0x1,
> +	INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX = 0x2,
>  	INTEL_GUC_TLB_INVAL_GUC = 0x3,
>  };
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 8a104a292598..98260a7bc90b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -923,6 +923,96 @@ static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)
>  	return err;
>  }
>  
> + /* Full TLB invalidation */
> +int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
> +				  enum intel_guc_tlb_inval_mode mode)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_TLB_INVALIDATION,
> +		0,
> +		INTEL_GUC_TLB_INVAL_FULL << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
> +			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
> +			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
> +	};
> +
> +	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
> +		DRM_ERROR("Tlb invalidation: Operation not supported in this platform!\n");

s/Tlb/TLB

and use drm_err() or even consider GEM_BUG_ON(), as this looks more like
a coding mistake if we ever get here, no ?

> +		return 0;
> +	}
> +
> +	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
> +}
> +
> +/*
> + * Selective TLB Invalidation for Address Range:
> + * TLB's in the Address Range is Invalidated across all engines.
> + */
> +int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
> +					    enum intel_guc_tlb_inval_mode mode,
> +					    u64 start, u64 length)
> +{
> +	u64 vm_total = BIT_ULL(INTEL_INFO(guc_to_gt(guc)->i915)->ppgtt_size);
> +	u32 address_mask = (ilog2(length) - ilog2(I915_GTT_PAGE_SIZE_4K));

drop extra ( )

> +	u32 full_range = vm_total == length;

bool ?

> +	u32 action[] = {
> +		INTEL_GUC_ACTION_TLB_INVALIDATION,
> +		0,
> +		INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
> +			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
> +			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
> +		0,
> +		full_range ? full_range : lower_32_bits(start),
> +		full_range ? 0 : upper_32_bits(start),
> +		full_range ? 0 : address_mask,
> +	};
> +
> +	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc)) {
> +		DRM_ERROR("Tlb invalidation: Operation not supported in this platform!\n");

as above

> +		return 0;
> +	}
> +
> +	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE_4K));
> +	GEM_BUG_ON(!IS_ALIGNED(length, I915_GTT_PAGE_SIZE_4K));
> +	GEM_BUG_ON(range_overflows(start, length, vm_total));
> +
> +	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
> +}
> +
> +/*
> + * Selective TLB Invalidation for Context:
> + * Invalidates all TLB's for a specific context across all engines.
> + */
> +int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
> +						enum intel_guc_tlb_inval_mode mode,
> +						u64 start, u64 length, u32 ctxid)
> +{
> +	u64 vm_total = BIT_ULL(INTEL_INFO(guc_to_gt(guc)->i915)->ppgtt_size);
> +	u32 address_mask = (ilog2(length) - ilog2(I915_GTT_PAGE_SIZE_4K));

drop ( )

> +	u32 full_range = vm_total == length;

bool

> +	u32 action[] = {
> +		INTEL_GUC_ACTION_TLB_INVALIDATION,
> +		0,
> +		INTEL_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
> +			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
> +			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
> +		ctxid,
> +		full_range ? full_range : lower_32_bits(start),
> +		full_range ? 0 : upper_32_bits(start),
> +		full_range ? 0 : address_mask,
> +	};
> +
> +	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc)) {
> +		DRM_ERROR("Tlb invalidation: Operation not supported in this platform!\n");

as above

> +		return 0;
> +	}
> +
> +	GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE_4K));
> +	GEM_BUG_ON(!IS_ALIGNED(length, I915_GTT_PAGE_SIZE_4K));
> +	GEM_BUG_ON(range_overflows(start, length, vm_total));
> +
> +	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
> +}
> +
>  /*
>   * Guc TLB Invalidation: Invalidate the TLB's of GuC itself.
>   */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 01c6478451cc..df6ba1c32808 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -384,6 +384,16 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size,
>  int intel_guc_self_cfg32(struct intel_guc *guc, u16 key, u32 value);
>  int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
>  
> +int intel_guc_g2g_register(struct intel_guc *guc);

drop this, not part of this series

> +
> +int intel_guc_invalidate_tlb_full(struct intel_guc *guc,
> +				  enum intel_guc_tlb_inval_mode mode);
> +int intel_guc_invalidate_tlb_page_selective(struct intel_guc *guc,
> +					    enum intel_guc_tlb_inval_mode mode,
> +					    u64 start, u64 length);
> +int intel_guc_invalidate_tlb_page_selective_ctx(struct intel_guc *guc,
> +						  enum intel_guc_tlb_inval_mode mode,
> +						  u64 start, u64 length, u32 ctxid);
>  int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
>  				 enum intel_guc_tlb_inval_mode mode);
>  int intel_guc_invalidate_tlb_all(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> index 3edf567b3f65..29e402f70a94 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> @@ -436,5 +436,8 @@ enum intel_guc_recv_message {
>  	((intel_guc_ct_enabled(&(guc)->ct)) && \
>  	 (intel_guc_submission_is_used(guc)) && \
>  	 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))
> +#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION_SELECTIVE(guc) \
> +	(INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) && \
> +	HAS_SELECTIVE_TLB_INVALIDATION(guc_to_gt(guc)->i915))
>  
>  #endif

,Michal

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
@ 2022-07-18 13:16   ` Tvrtko Ursulin
  2022-07-18 14:53     ` [Intel-gfx] " Mauro Carvalho Chehab
  2022-07-22 11:56   ` Andi Shyti
  1 sibling, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:16 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Jason Ekstrand, John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Rodrigo Vivi, dri-devel, intel-gfx, linux-kernel, stable,
	Fei Yang


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Check if the device is powered down prior to any engine activity,
> as, on such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed, thus reducing the
> performance regression impact due to it.
> 
> This becomes more significant with GuC, as it can only do so when
> the connection to the GuC is awake.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

The patch itself looks fine, but I don't think we closed on the issue of
the stable/Fixes tags on this patch?

My position here is that, if the functional issue is only with GuC 
invalidations, then the tags shouldn't be there (and the huge CC list).

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
>   drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
>   3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 97c820eee115..6835279943df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -6,14 +6,15 @@
>   
>   #include <drm/drm_cache.h>
>   
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +
>   #include "i915_drv.h"
>   #include "i915_gem_object.h"
>   #include "i915_scatterlist.h"
>   #include "i915_gem_lmem.h"
>   #include "i915_gem_mman.h"
>   
> -#include "gt/intel_gt.h"
> -
>   void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>   				 struct sg_table *pages,
>   				 unsigned int sg_page_sizes)
> @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   
>   	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
>   		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +		struct intel_gt *gt = to_gt(i915);
>   		intel_wakeref_t wakeref;
>   
> -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
> -			intel_gt_invalidate_tlbs(to_gt(i915));
> +		with_intel_gt_pm_if_awake(gt, wakeref)
> +			intel_gt_invalidate_tlbs(gt);
>   	}
>   
>   	return pages;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 68c2b0d8f187..c4d43da84d8e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -12,6 +12,7 @@
>   
>   #include "i915_drv.h"
>   #include "intel_context.h"
> +#include "intel_engine_pm.h"
>   #include "intel_engine_regs.h"
>   #include "intel_ggtt_gmch.h"
>   #include "intel_gt.h"
> @@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	struct drm_i915_private *i915 = gt->i915;
>   	struct intel_uncore *uncore = gt->uncore;
>   	struct intel_engine_cs *engine;
> +	intel_engine_mask_t awake, tmp;
>   	enum intel_engine_id id;
>   	const i915_reg_t *regs;
>   	unsigned int num = 0;
> @@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   
>   	GEM_TRACE("\n");
>   
> -	assert_rpm_wakelock_held(&i915->runtime_pm);
> -
>   	mutex_lock(&gt->tlb_invalidate_lock);
>   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
>   	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>   
> +	awake = 0;
>   	for_each_engine(engine, gt, id) {
>   		struct reg_and_bit rb;
>   
> +		if (!intel_engine_pm_is_awake(engine))
> +			continue;
> +
>   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>   		if (!i915_mmio_reg_offset(rb.reg))
>   			continue;
>   
>   		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +		awake |= engine->mask;
>   	}
>   
>   	spin_unlock_irq(&uncore->lock);
>   
> -	for_each_engine(engine, gt, id) {
> +	for_each_engine_masked(engine, gt, awake, tmp) {
> +		struct reg_and_bit rb;
> +
>   		/*
>   		 * HW architecture suggest typical invalidation time at 40us,
>   		 * with pessimistic cases up to 100us and a recommendation to
> @@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		 */
>   		const unsigned int timeout_us = 100;
>   		const unsigned int timeout_ms = 4;
> -		struct reg_and_bit rb;
>   
>   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> -		if (!i915_mmio_reg_offset(rb.reg))
> -			continue;
> -
>   		if (__intel_wait_for_register_fw(uncore,
>   						 rb.reg, rb.bit, 0,
>   						 timeout_us, timeout_ms,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index bc898df7a48c..a334787a4939 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>   	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>   	     intel_gt_pm_put(gt), tmp = 0)
>   
> +#define with_intel_gt_pm_if_awake(gt, wf) \
> +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
> +
>   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>   {
>   	return intel_wakeref_wait_for_idle(&gt->wakeref);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake()
  2022-07-14 12:06 ` [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
@ 2022-07-18 13:21   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:21 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Daniel Vetter, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Matthew Brost, Rodrigo Vivi,
	dri-devel, intel-gfx, linux-kernel


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> Add a kernel-doc markup to document this new macro.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index a334787a4939..4d4caf612fdc 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -55,6 +55,13 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>   	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>   	     intel_gt_pm_put(gt), tmp = 0)
>   
> +/**
> + * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
> + *	it from sleeping, run some code and then put the reference away.
> + *
> + * @gt: pointer to the gt
> + * @wf: pointer to a temporary wakeref.
> + */
>   #define with_intel_gt_pm_if_awake(gt, wf) \
>   	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

Maybe say in the kerneldoc that the put is async. Although for me
documenting trivial helpers is a bit over the top anyway...
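
e.g. something like (wording is just a suggestion):

 /**
  * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
  *	it from sleeping, run some code and then put the reference away
  *	asynchronously (via intel_gt_pm_put_async())
  *
  * @gt: pointer to the gt
  * @wf: pointer to a temporary wakeref.
  */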

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-07-14 12:06 ` [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
@ 2022-07-18 13:24   ` Tvrtko Ursulin
  2022-07-22 11:57   ` Andi Shyti
  1 sibling, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Rodrigo Vivi, dri-devel, intel-gfx,
	linux-kernel, stable, Fei Yang, Thomas Hellström


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Ensure that the TLB of the OA unit is also invalidated
> on gen12 HW, as just invalidating the TLB of an engine is not
> enough.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index c4d43da84d8e..1d84418e8676 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -11,6 +11,7 @@
>   #include "pxp/intel_pxp.h"
>   
>   #include "i915_drv.h"
> +#include "i915_perf_oa_regs.h"
>   #include "intel_context.h"
>   #include "intel_engine_pm.h"
>   #include "intel_engine_regs.h"
> @@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		awake |= engine->mask;
>   	}
>   
> +	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
> +	if (awake &&
> +	    (IS_TIGERLAKE(i915) ||
> +	     IS_DG1(i915) ||
> +	     IS_ROCKETLAKE(i915) ||
> +	     IS_ALDERLAKE_S(i915) ||
> +	     IS_ALDERLAKE_P(i915)))
> +		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
> +
>   	spin_unlock_irq(&uncore->lock);
>   
>   	for_each_engine_masked(engine, gt, awake, tmp) {

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-07-14 12:06 ` [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
@ 2022-07-18 13:39   ` Tvrtko Ursulin
  2022-07-18 16:00     ` [Intel-gfx] " Mauro Carvalho Chehab
  2022-07-22 11:58   ` Andi Shyti
  1 sibling, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:39 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Daniel Vetter, Dave Airlie, David Airlie,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, dri-devel, intel-gfx,
	linux-kernel, stable, Fei Yang, Andi Shyti,
	Thomas Hellström


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't flush TLBs when the buffer is only used in the GGTT under full
> control of the kernel, as there's no risk of concurrent access
> and stale access from prefetch.
> 
> We only need to invalidate the TLB if they are accessible by the user.
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Do we really need or want stable and fixes on this one?

What do we think the performance improvement is, given there's very
little in the GGTT which is not also mapped via the PPGTT?

I think it is safe, but part of me would ideally not even want to think
about whether it is safe if the performance improvement is non-existent.
And I can't imagine how there could be one?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/i915_vma.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ef3b04c7e153..646f419b2035 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,7 +538,8 @@ int i915_vma_bind(struct i915_vma *vma,
>   				   bind_flags);
>   	}
>   
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> +	if (bind_flags & I915_VMA_LOCAL_BIND)
> +		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
>   
>   	atomic_or(bind_flags, &vma->flags);
>   	return 0;

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-14 12:06 ` [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
@ 2022-07-18 13:45   ` Tvrtko Ursulin
  2022-07-18 16:06     ` [Intel-gfx] " Mauro Carvalho Chehab
  2022-07-22 12:00   ` Andi Shyti
  1 sibling, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Rodrigo Vivi, dri-devel, intel-gfx,
	linux-kernel, stable, Fei Yang, Thomas Hellström


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Skip all further TLB invalidations once the device is wedged and
> had been reset, as, on such cases, it can no longer process instructions
> on the GPU and the user no longer has access to the TLB's in each engine.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Is the claimed performance regression this solves based on a wedged
GPU which does not work any more, to the extent where MMIO TLB
invalidation requests keep timing out? If so, please clarify that in the
commit text and then it looks good to me. Even if it is IMO a very
borderline situation to declare something a fix.

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 1d84418e8676..5c55a90672f4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
>   		return;
>   
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
>   	if (GRAPHICS_VER(i915) == 12) {
>   		regs = gen12_regs;
>   		num = ARRAY_SIZE(gen12_regs);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-14 12:06 ` [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
@ 2022-07-18 13:52   ` Tvrtko Ursulin
  2022-07-20  7:13     ` [Intel-gfx] " Mauro Carvalho Chehab
  2022-07-20 10:54   ` Tvrtko Ursulin
  1 sibling, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 13:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Christian König, Thomas Hellström,
	Andi Shyti, Ashutosh Dixit, Ayaz A Siddiqui, Casey Bowman,
	Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie,
	Jani Nikula, Jason Ekstrand, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Maarten Lankhorst, Matt Roper, Matthew Auld,
	Prathap Kumar Valsan, Ramalingam C, Rodrigo Vivi, Sumit Semwal,
	Tomas Winkler, dri-devel, intel-gfx, linaro-mm-sig, linux-kernel,
	linux-media, stable, Tvrtko Ursulin, Fei Yang


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Invalidate TLB in patch, in order to reduce performance regressions.

"in batches"?

> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
> 
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> [mchehab: rebased to not require moving the code to a separate file]
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>   drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>   drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>   drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>   drivers/gpu/drm/i915/i915_vma.c               | 34 +++++++++---
>   drivers/gpu/drm/i915/i915_vma.h               |  1 +
>   drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>   drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>   10 files changed, 125 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>   #define I915_BO_READONLY          BIT(7)
>   #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
>   #define I915_BO_PROTECTED         BIT(9)
> -#define I915_BO_WAS_BOUND_BIT     10
>   	/**
>   	 * @mem_flags - Mutable placement-related flags
>   	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>   		 * pages were last acquired.
>   		 */
>   		bool dirty:1;
> +
> +		u32 tlb;
>   	} mm;
>   
>   	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>   		vunmap(ptr);
>   }
>   
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>   struct sg_table *
>   __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   	__i915_gem_object_reset_page_iter(obj);
>   	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>   
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>   
>   	return pages;
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>   {
>   	spin_lock_init(&gt->irq_lock);
>   
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>   	INIT_LIST_HEAD(&gt->closed_vma);
>   	spin_lock_init(&gt->closed_lock);
>   
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>   	intel_gt_init_reset(gt);
>   	intel_gt_init_requests(gt);
>   	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>   	intel_gt_pm_init_early(gt);
>   
>   	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>   		intel_gt_fini_requests(gt);
>   		intel_gt_fini_reset(gt);
>   		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>   		intel_engines_free(gt);
>   	}
>   }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>   	return rb;
>   }
>   
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>   {
>   	static const i915_reg_t gen8_regs[] = {
>   		[RENDER_CLASS]			= GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	const i915_reg_t *regs;
>   	unsigned int num = 0;
>   
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>   	if (GRAPHICS_VER(i915) == 12) {
>   		regs = gen12_regs;
>   		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   			  "Platform does not implement TLB invalidation!"))
>   		return;
>   
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
>   	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		awake |= engine->mask;
>   	}
>   
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>   	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>   	if (awake &&
>   	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	 * transitions.
>   	 */
>   	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>   
>   void intel_gt_watchdog_work(struct work_struct *work);
>   
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>   
>   #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>   #include <linux/llist.h>
>   #include <linux/mutex.h>
>   #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>   	struct intel_uc uc;
>   	struct intel_gsc gsc;
>   
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLB
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbind the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>   
>   	struct i915_wa_list wa_list;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>   		      struct i915_vma_resource *vma_res)
>   {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);

The patch is about more than batching? Is there a security hole in 
this area (unbind) with the current code?

Regards,

Tvrtko

>   }
>   
>   static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 646f419b2035..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,9 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>   				   bind_flags);
>   	}
>   
> -	if (bind_flags & I915_VMA_LOCAL_BIND)
> -		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>   	atomic_or(bind_flags, &vma->flags);
>   	return 0;
>   }
> @@ -1311,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>   	return err;
>   }
>   
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> +}
> +
>   static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>   {
>   	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1942,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>   		vma->vm->skip_pte_rewrite;
>   	trace_i915_vma_unbind(vma);
>   
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>   	vma->resource = NULL;
>   
>   	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1950,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>   
>   	i915_vma_detach(vma);
>   
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>   	}
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>   			u64 size, u64 alignment, u64 flags);
>   void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>   void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>   struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>   int __i915_vma_unbind(struct i915_vma *vma);
>   int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>    * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>    * complete.
>    */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>   {
>   	struct i915_address_space *vm = vma_res->vm;
>   
> +	vma_res->tlb = tlb;
> +
>   	/* Reference for the sw fence */
>   	i915_vma_resource_get(vma_res);
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>    * taken when the unbind is scheduled.
>    * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>    * needs to be skipped for unbind.
> + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
>    *
>    * The lifetime of a struct i915_vma_resource is from a binding request to
>    * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>   	bool immediate_unbind:1;
>   	bool needs_wakeref:1;
>   	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>   };
>   
>   bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>   
>   void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>   
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>   
>   void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>   

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-18 13:16   ` Tvrtko Ursulin
@ 2022-07-18 14:53     ` Mauro Carvalho Chehab
  2022-07-18 15:01       ` Tvrtko Ursulin
  2022-07-18 15:50       ` David Laight
  0 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-18 14:53 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Matthew Auld, Dave Airlie, Thomas Hellström,
	Lucas De Marchi, intel-gfx, Rodrigo Vivi, linux-kernel, stable

On Mon, 18 Jul 2022 14:16:10 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> > From: Chris Wilson <chris.p.wilson@intel.com>
> > 
> > Check if the device is powered down prior to any engine activity,
> > as, on such cases, all the TLBs were already invalidated, so an
> > explicit TLB invalidation is not needed, thus reducing the
> > performance regression impact due to it.
> > 
> > This becomes more significant with GuC, as it can only do so when
> > the connection to the GuC is awake.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")  
> 
> Patch itself looks fine but I don't think we closed on the issue of 
> stable/fixes on this patch?

No, because TLB cache invalidation takes time and causes timeouts, which
in turn affect applications and produce kernel warnings.

There's even open bugs due to TLB timeouts, like this one:

	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

See:
	https://gitlab.freedesktop.org/drm/intel/-/issues/6424

So, while this is a performance regression, it ends up causing a
functional regression.

The first part of this series (patches 1-7) is meant to reduce the
risk of such timeouts by doing TLB invalidations in batches and only 
when really needed (userspace-exposed TLBs on GTs that are powered on
and non-wedged).
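
To make that concrete, patch 6 turns the release path into a
record-and-check pattern. A rough sketch using the names from the
series (illustrative only, not the complete code):

	/* at unbind (PTE rewrite): record that a full flush is owed */
	obj->mm.tlb = intel_gt_next_invalidate_tlb_full(gt);

	/* at page release: flush only if no full barrier happened since */
	if (obj->mm.tlb) {
		intel_gt_invalidate_tlb(gt, obj->mm.tlb);
		obj->mm.tlb = 0;
	}

That way, any number of pending unbinds can be covered by a single
full invalidation, instead of each of them serializing on the mutex
and waiting for its own MMIO sequence.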

As they're fixing such regressions, it makes sense to Cc stable and
have a Fixes tag.

> My position here is that, if the functional issue is only with GuC 
> invalidations, then the tags shouldn't be there (and the huge CC list).
> 
> Regards,
> 
> Tvrtko
> 
> > Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> > Cc: Fei Yang <fei.yang@intel.com>
> > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> > ---
> > 
> > To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
> > See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> > 
> >   drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
> >   drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
> >   3 files changed, 19 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > index 97c820eee115..6835279943df 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> > @@ -6,14 +6,15 @@
> >   
> >   #include <drm/drm_cache.h>
> >   
> > +#include "gt/intel_gt.h"
> > +#include "gt/intel_gt_pm.h"
> > +
> >   #include "i915_drv.h"
> >   #include "i915_gem_object.h"
> >   #include "i915_scatterlist.h"
> >   #include "i915_gem_lmem.h"
> >   #include "i915_gem_mman.h"
> >   
> > -#include "gt/intel_gt.h"
> > -
> >   void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> >   				 struct sg_table *pages,
> >   				 unsigned int sg_page_sizes)
> > @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
> >   
> >   	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> >   		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +		struct intel_gt *gt = to_gt(i915);
> >   		intel_wakeref_t wakeref;
> >   
> > -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
> > -			intel_gt_invalidate_tlbs(to_gt(i915));
> > +		with_intel_gt_pm_if_awake(gt, wakeref)
> > +			intel_gt_invalidate_tlbs(gt);
> >   	}
> >   
> >   	return pages;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index 68c2b0d8f187..c4d43da84d8e 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -12,6 +12,7 @@
> >   
> >   #include "i915_drv.h"
> >   #include "intel_context.h"
> > +#include "intel_engine_pm.h"
> >   #include "intel_engine_regs.h"
> >   #include "intel_ggtt_gmch.h"
> >   #include "intel_gt.h"
> > @@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >   	struct drm_i915_private *i915 = gt->i915;
> >   	struct intel_uncore *uncore = gt->uncore;
> >   	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t awake, tmp;
> >   	enum intel_engine_id id;
> >   	const i915_reg_t *regs;
> >   	unsigned int num = 0;
> > @@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >   
> >   	GEM_TRACE("\n");
> >   
> > -	assert_rpm_wakelock_held(&i915->runtime_pm);
> > -
> >   	mutex_lock(&gt->tlb_invalidate_lock);
> >   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
> >   
> >   	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> >   
> > +	awake = 0;
> >   	for_each_engine(engine, gt, id) {
> >   		struct reg_and_bit rb;
> >   
> > +		if (!intel_engine_pm_is_awake(engine))
> > +			continue;
> > +
> >   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> >   		if (!i915_mmio_reg_offset(rb.reg))
> >   			continue;
> >   
> >   		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> > +		awake |= engine->mask;
> >   	}
> >   
> >   	spin_unlock_irq(&uncore->lock);
> >   
> > -	for_each_engine(engine, gt, id) {
> > +	for_each_engine_masked(engine, gt, awake, tmp) {
> > +		struct reg_and_bit rb;
> > +
> >   		/*
> >   		 * HW architecture suggest typical invalidation time at 40us,
> >   		 * with pessimistic cases up to 100us and a recommendation to
> > @@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >   		 */
> >   		const unsigned int timeout_us = 100;
> >   		const unsigned int timeout_ms = 4;
> > -		struct reg_and_bit rb;
> >   
> >   		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> > -		if (!i915_mmio_reg_offset(rb.reg))
> > -			continue;
> > -
> >   		if (__intel_wait_for_register_fw(uncore,
> >   						 rb.reg, rb.bit, 0,
> >   						 timeout_us, timeout_ms,
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > index bc898df7a48c..a334787a4939 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
> >   	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
> >   	     intel_gt_pm_put(gt), tmp = 0)
> >   
> > +#define with_intel_gt_pm_if_awake(gt, wf) \
> > +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
> > +
> >   static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
> >   {
> >   	return intel_wakeref_wait_for_idle(&gt->wakeref);  

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-18 14:53     ` [Intel-gfx] " Mauro Carvalho Chehab
@ 2022-07-18 15:01       ` Tvrtko Ursulin
  2022-07-18 15:50       ` David Laight
  1 sibling, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-18 15:01 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Matthew Auld, Dave Airlie, Thomas Hellström,
	Lucas De Marchi, intel-gfx, Rodrigo Vivi, linux-kernel, stable


On 18/07/2022 15:53, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:16:10 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>
>>> Check if the device is powered down prior to any engine activity,
>>> as, on such cases, all the TLBs were already invalidated, so an
>>> explicit TLB invalidation is not needed, thus reducing the
>>> performance regression impact due to it.
>>>
>>> This becomes more significant with GuC, as it can only do so when
>>> the connection to the GuC is awake.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>
>> Patch itself looks fine but I don't think we closed on the issue of
>> stable/fixes on this patch?
> 
> No, because TLB cache invalidation takes time and causes timeouts, which
> in turn affect applications and produce kernel warnings.
> 
> There's even open bugs due to TLB timeouts, like this one:
> 
> 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> 
> See:
> 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> 
> So, while this is a performance regression, it ends up causing a
> functional regression.

This test is not even particularly stressful. Fair enough - thanks for 
the information.

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Is skipping the flush for objects which are only bound in the GGTT the fix for this particular test?

Regards,

Tvrtko

> 
> The first part of this series (patches 1-7) is meant to reduce the
> risk of such timeouts by doing TLB invalidations in batches and only
> when really needed (userspace-exposed TLBs on GTs that are powered on
> and non-wedged).
> 
> As they're fixing such regressions, it makes sense to Cc stable and
> have a Fixes tag.
> 
>> My position here is that, if the functional issue is only with GuC
>> invalidations, then the tags shouldn't be there (and the huge CC list).
>>
>> Regards,
>>
>> Tvrtko
>>
>>> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
>>> Cc: Fei Yang <fei.yang@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>>> ---
>>>
>>> To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
>>> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
>>>
>>>    drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
>>>    drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
>>>    drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
>>>    3 files changed, 19 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> index 97c820eee115..6835279943df 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>>> @@ -6,14 +6,15 @@
>>>    
>>>    #include <drm/drm_cache.h>
>>>    
>>> +#include "gt/intel_gt.h"
>>> +#include "gt/intel_gt_pm.h"
>>> +
>>>    #include "i915_drv.h"
>>>    #include "i915_gem_object.h"
>>>    #include "i915_scatterlist.h"
>>>    #include "i915_gem_lmem.h"
>>>    #include "i915_gem_mman.h"
>>>    
>>> -#include "gt/intel_gt.h"
>>> -
>>>    void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>>>    				 struct sg_table *pages,
>>>    				 unsigned int sg_page_sizes)
>>> @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>>>    
>>>    	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
>>>    		struct drm_i915_private *i915 = to_i915(obj->base.dev);
>>> +		struct intel_gt *gt = to_gt(i915);
>>>    		intel_wakeref_t wakeref;
>>>    
>>> -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
>>> -			intel_gt_invalidate_tlbs(to_gt(i915));
>>> +		with_intel_gt_pm_if_awake(gt, wakeref)
>>> +			intel_gt_invalidate_tlbs(gt);
>>>    	}
>>>    
>>>    	return pages;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index 68c2b0d8f187..c4d43da84d8e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -12,6 +12,7 @@
>>>    
>>>    #include "i915_drv.h"
>>>    #include "intel_context.h"
>>> +#include "intel_engine_pm.h"
>>>    #include "intel_engine_regs.h"
>>>    #include "intel_ggtt_gmch.h"
>>>    #include "intel_gt.h"
>>> @@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>>    	struct drm_i915_private *i915 = gt->i915;
>>>    	struct intel_uncore *uncore = gt->uncore;
>>>    	struct intel_engine_cs *engine;
>>> +	intel_engine_mask_t awake, tmp;
>>>    	enum intel_engine_id id;
>>>    	const i915_reg_t *regs;
>>>    	unsigned int num = 0;
>>> @@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>>    
>>>    	GEM_TRACE("\n");
>>>    
>>> -	assert_rpm_wakelock_held(&i915->runtime_pm);
>>> -
>>>    	mutex_lock(&gt->tlb_invalidate_lock);
>>>    	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>>>    
>>>    	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>>>    
>>> +	awake = 0;
>>>    	for_each_engine(engine, gt, id) {
>>>    		struct reg_and_bit rb;
>>>    
>>> +		if (!intel_engine_pm_is_awake(engine))
>>> +			continue;
>>> +
>>>    		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>>>    		if (!i915_mmio_reg_offset(rb.reg))
>>>    			continue;
>>>    
>>>    		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
>>> +		awake |= engine->mask;
>>>    	}
>>>    
>>>    	spin_unlock_irq(&uncore->lock);
>>>    
>>> -	for_each_engine(engine, gt, id) {
>>> +	for_each_engine_masked(engine, gt, awake, tmp) {
>>> +		struct reg_and_bit rb;
>>> +
>>>    		/*
>>>    		 * HW architecture suggest typical invalidation time at 40us,
>>>    		 * with pessimistic cases up to 100us and a recommendation to
>>> @@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>>>    		 */
>>>    		const unsigned int timeout_us = 100;
>>>    		const unsigned int timeout_ms = 4;
>>> -		struct reg_and_bit rb;
>>>    
>>>    		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>>> -		if (!i915_mmio_reg_offset(rb.reg))
>>> -			continue;
>>> -
>>>    		if (__intel_wait_for_register_fw(uncore,
>>>    						 rb.reg, rb.bit, 0,
>>>    						 timeout_us, timeout_ms,
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
>>> index bc898df7a48c..a334787a4939 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
>>> @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>>>    	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>>>    	     intel_gt_pm_put(gt), tmp = 0)
>>>    
>>> +#define with_intel_gt_pm_if_awake(gt, wf) \
>>> +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
>>> +
>>>    static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>>>    {
>>>    	return intel_wakeref_wait_for_idle(&gt->wakeref);

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Intel-gfx] [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-18 14:53     ` [Intel-gfx] " Mauro Carvalho Chehab
  2022-07-18 15:01       ` Tvrtko Ursulin
@ 2022-07-18 15:50       ` David Laight
  2022-07-19  7:24         ` Tvrtko Ursulin
  1 sibling, 1 reply; 52+ messages in thread
From: David Laight @ 2022-07-18 15:50 UTC (permalink / raw)
  To: 'Mauro Carvalho Chehab', Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Matthew Auld, Dave Airlie, Thomas Hellström,
	Lucas De Marchi, intel-gfx, Rodrigo Vivi, linux-kernel, stable

From: Mauro Carvalho Chehab
> Sent: 18 July 2022 15:54
> 
> On Mon, 18 Jul 2022 14:16:10 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> > On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> > > From: Chris Wilson <chris.p.wilson@intel.com>
> > >
> > > Check if the device is powered down prior to any engine activity,
> > > as, on such cases, all the TLBs were already invalidated, so an
> > > explicit TLB invalidation is not needed, thus reducing the
> > > performance regression impact due to it.
> > >
> > > This becomes more significant with GuC, as it can only do so when
> > > the connection to the GuC is awake.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> >
> > Patch itself looks fine but I don't think we closed on the issue of
> > stable/fixes on this patch?
> 
> No, because TLB cache invalidation takes time and causes timeouts, which
> in turn affect applications and produce kernel warnings.

It's not only the TLB flushes that cause grief.

There is a loop that forces a write-back of all the frame buffer pages.
With a large display and some cpu (like my Ivy bridge one) that
takes long enough with pre-emption disabled that wakeup of RT processes
(and any pinned to the cpu) takes far longer than one might have
wished for.

Since some X servers request a flush every few seconds this makes
the system unusable for some workloads.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-07-18 13:39   ` Tvrtko Ursulin
@ 2022-07-18 16:00     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-18 16:00 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, Thomas Hellström, David Airlie,
	dri-devel, linux-kernel, Chris Wilson, Rodrigo Vivi, Dave Airlie,
	stable, intel-gfx

On Mon, 18 Jul 2022 14:39:17 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> > From: Chris Wilson <chris.p.wilson@intel.com>
> > 
> > Don't flush TLBs when the buffer is only used in the GGTT under full
> > control of the kernel, as there's no risk of concurrent access
> > and stale access from prefetch.
> > 
> > We only need to invalidate the TLB if they are accessible by the user.
> > That helps to reduce the performance regression introduced by TLB
> > invalidate logic.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")  
> 
> Do we really need or want stable and fixes on this one?
> 
> What do we think the performance improvement is, given there's very
> little in the GGTT which is not also mapped via PPGTT?
> 
> I think it is safe, but part of me would ideally not even want to think
> about whether it is safe if the performance improvement is
> non-existent. And I can't imagine how there would be one.

Makes sense. Patch 6 actually ends up removing the code doing
that, so I'll just fold this patch into patch 6, in order to
avoid adding something that will later be removed.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-18 13:45   ` Tvrtko Ursulin
@ 2022-07-18 16:06     ` Mauro Carvalho Chehab
  2022-07-19  7:19       ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-18 16:06 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, Thomas Hellström, David Airlie,
	dri-devel, Lucas De Marchi, linux-kernel, Chris Wilson,
	Rodrigo Vivi, Dave Airlie, stable, intel-gfx

On Mon, 18 Jul 2022 14:45:22 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> > From: Chris Wilson <chris.p.wilson@intel.com>
> > 
> > Skip all further TLB invalidations once the device is wedged and
> > had been reset, as, on such cases, it can no longer process instructions
> > on the GPU and the user no longer has access to the TLB's in each engine.
> > 
> > That helps to reduce the performance regression introduced by TLB
> > invalidate logic.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")  
> 
> Is the claim of a performance regression that this solves based on a
> wedged GPU which no longer works, to the extent where MMIO TLB
> invalidation requests keep timing out? If so, please clarify that in
> the commit text and then it looks good to me. Even if it is, IMO, a
> very borderline situation to declare something a fix.

Indeed this helps in a borderline situation: if the GT is wedged, TLB 
invalidation will time out, so it makes sense to keep the patch with a
commit message like:

    drm/i915/gt: Skip TLB invalidations once wedged
    
    Skip all further TLB invalidations once the device is wedged and
    has been reset, as, in such cases, it can no longer process instructions
    on the GPU and the user no longer has access to the TLBs in each engine.
    
    So, an attempt to do a TLB cache invalidation will produce a timeout.
    
    That helps to reduce the performance regression introduced by TLB
    invalidate logic.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-18 16:06     ` [Intel-gfx] " Mauro Carvalho Chehab
@ 2022-07-19  7:19       ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-19  7:19 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, Thomas Hellström, David Airlie,
	dri-devel, Lucas De Marchi, linux-kernel, Chris Wilson,
	Rodrigo Vivi, Dave Airlie, stable, intel-gfx


On 18/07/2022 17:06, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:45:22 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>
>>> Skip all further TLB invalidations once the device is wedged and
>>> has been reset, as, in such cases, it can no longer process instructions
>>> on the GPU and the user no longer has access to the TLBs in each engine.
>>>
>>> That helps to reduce the performance regression introduced by TLB
>>> invalidate logic.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>
>> Is the claim of a performance regression that this solves based on a
>> wedged GPU which no longer works, to the extent where MMIO TLB
>> invalidation requests keep timing out? If so, please clarify that in
>> the commit text and then it looks good to me. Even if it is, IMO, a
>> very borderline situation to declare something a fix.
> 
> Indeed this helps in a borderline situation: if the GT is wedged, TLB
> invalidation will time out, so it makes sense to keep the patch with a
> commit message like:
> 
>      drm/i915/gt: Skip TLB invalidations once wedged
>      
>      Skip all further TLB invalidations once the device is wedged and
>      has been reset, as, in such cases, it can no longer process instructions
>      on the GPU and the user no longer has access to the TLBs in each engine.
>      
>      So, an attempt to do a TLB cache invalidation will produce a timeout.
>      
>      That helps to reduce the performance regression introduced by TLB
>      invalidate logic.

Yeah, that is better, but whether to bother stable with it is the 
question. A wedged GPU means constant, endless -EIO to userspace, so it 
is very hard to imagine that after a TLB invalidation timeout or two 
there would be further ones. But okay, it's tiny, so fine I guess.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-18 15:50       ` David Laight
@ 2022-07-19  7:24         ` Tvrtko Ursulin
  2022-07-19  7:45           ` David Laight
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-19  7:24 UTC (permalink / raw)
  To: David Laight, 'Mauro Carvalho Chehab'
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Matthew Auld, Dave Airlie, Thomas Hellström,
	Lucas De Marchi, intel-gfx, Rodrigo Vivi, linux-kernel, stable


Hi David,

On 18/07/2022 16:50, David Laight wrote:
> From: Mauro Carvalho Chehab
>> Sent: 18 July 2022 15:54
>>
>> On Mon, 18 Jul 2022 14:16:10 +0100
>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>>
>>>> Check if the device is powered down prior to any engine activity,
>>>> as, on such cases, all the TLBs were already invalidated, so an
>>>> explicit TLB invalidation is not needed, thus reducing the
>>>> performance regression impact due to it.
>>>>
>>>> This becomes more significant with GuC, as it can only do so when
>>>> the connection to the GuC is awake.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
>>>
>>> Patch itself looks fine but I don't think we closed on the issue of
>>> stable/fixes on this patch?
>>
>> No, because TLB cache invalidation takes time and causes timeouts, which
>> in turn affect applications and produce kernel warnings.
> 
> It's not only the TLB flushes that cause grief.
> 
> There is a loop that forces a write-back of all the frame buffer pages.
> With a large display and some cpu (like my Ivy bridge one) that
> takes long enough with pre-emption disabled that wakeup of RT processes
> (and any pinned to the cpu) takes far longer than one might have
> wished for.
> 
> Since some X servers request a flush every few seconds this makes
> the system unusable for some workloads.

OK, the TLB invalidations discussed in this patch do not apply to 
Ivybridge. But what is the write-back loop you mention which is causing 
you grief? What size frame buffers are we talking about here? If they 
don't fit in the mappable area, we recently merged a patch* which 
improves things in that situation, but I am not sure you are hitting 
exactly that.

Regards,

Tvrtko

*) 230523ba24bd ("drm/i915/gem: Don't evict unmappable VMAs when pinning 
with PIN_MAPPABLE (v2)")

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [Intel-gfx] [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-19  7:24         ` Tvrtko Ursulin
@ 2022-07-19  7:45           ` David Laight
  0 siblings, 0 replies; 52+ messages in thread
From: David Laight @ 2022-07-19  7:45 UTC (permalink / raw)
  To: 'Tvrtko Ursulin', 'Mauro Carvalho Chehab'
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Chris Wilson,
	Matthew Auld, Dave Airlie, Thomas Hellström,
	Lucas De Marchi, intel-gfx, Rodrigo Vivi, linux-kernel, stable

From: Tvrtko Ursulin
> Sent: 19 July 2022 08:25
...
> > It's not only the TLB flushes that cause grief.
> >
> > There is a loop that forces a write-back of all the frame buffer pages.
> > With a large display and some cpu (like my Ivy bridge one) that
> > takes long enough with pre-emption disabled that wakeup of RT processes
> > (and any pinned to the cpu) takes far longer than one might have
> > wished for.
> >
> > Since some X servers request a flush every few seconds this makes
> > the system unusable for some workloads.
> 
> Ok TLB invalidations as discussed in this patch does not apply to
> Ivybridge. But what is the write back loop you mention which is causing
> you grief? What size frame buffers are we talking about here? If they
> don't fit in the mappable area recently we merged a patch* which
> improves things in that situation but not sure you are hitting exactly that.

I found the old email:

What I've found is that the Intel i915 graphics driver uses the 'events_unbound'
kernel worker thread to periodically execute drm_clflush_sg().
(see https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/drm_cache.c)

I'm guessing this is to ensure that any writes to graphics memory become
visible in a semi-timely manner.

This loop takes about 1us per iteration, split fairly evenly between whatever is in
for_each_sg_page() and drm_clflush_page().
With a 2560x1440 display the loop count is 3600 (4 bytes/pixel) and the whole
function takes around 3.3ms.

IIRC the first few page flushes are quick (I bet they go into a fifo)
and then they all get slow.
The flushes are actually requested from userspace.
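
For reference, the flush loop has roughly this shape (paraphrased from
drivers/gpu/drm/drm_cache.c, x86 path; not a verbatim copy):

	void drm_clflush_sg(struct sg_table *st)
	{
		struct sg_page_iter sg_iter;

		mb();	/* clflush is ordered only by memory barriers */
		for_each_sg_page(st->sgl, &sg_iter, st->nents, 0)
			drm_clflush_page(sg_page_iter_page(&sg_iter));
		mb();	/* make sure every cache line has been flushed */
	}

At ~1us per page, a 2560x1440 frame buffer at 4 bytes/pixel is
2560 * 1440 * 4 / 4096 = 3600 pages, which matches the ~3.3ms above.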

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-18 13:52   ` Tvrtko Ursulin
@ 2022-07-20  7:13     ` Mauro Carvalho Chehab
  2022-07-20 10:49       ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-20  7:13 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Sumit Semwal,
	Chris Wilson, Dave Airlie, Tomas Winkler, Matthew Auld,
	Thomas Hellström, Lucas De Marchi, intel-gfx, linaro-mm-sig,
	Rodrigo Vivi, linux-kernel, stable, linux-media,
	Christian König

On Mon, 18 Jul 2022 14:52:05 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> 
> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> > From: Chris Wilson <chris.p.wilson@intel.com>
> > 
> > Invalidate TLB in patch, in order to reduce performance regressions.
> 
> "in batches"?

Yeah. Will fix it.

> > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > index d8b94d638559..2da6c82a8bd2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> > @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
> >   void ppgtt_unbind_vma(struct i915_address_space *vm,
> >   		      struct i915_vma_resource *vma_res)
> >   {
> > -	if (vma_res->allocated)
> > -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> > +	if (!vma_res->allocated)
> > +		return;
> > +
> > +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> > +	if (vma_res->tlb)
> > +		vma_invalidate_tlb(vm, *vma_res->tlb);
> 
>> The patch is about more than batching? Is there a security hole in
>> this area (unbind) with the current code?

No, I don't think there's a security hole. The rationale for this
change is not related to that.

Since commit 2f6b90da9192 ("drm/i915: Use vma resources for async unbinding"),
VMA unbind can happen either sync or async.

So, the logic needs to do TLB invalidation in two places. After this
patch, the code at __i915_vma_evict() is:

	struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
	{
...
		if (async)
			unbind_fence = i915_vma_resource_unbind(vma_res,
								&vma->obj->mm.tlb);
		else
			unbind_fence = i915_vma_resource_unbind(vma_res, NULL);

		vma->resource = NULL;

		atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
			   &vma->flags);

		i915_vma_detach(vma);

		if (!async) {
			if (unbind_fence) {
				dma_fence_wait(unbind_fence, false);
				dma_fence_put(unbind_fence);
				unbind_fence = NULL;
			}
			vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
		}
...

So, basically, if !async, __i915_vma_evict() will do TLB cache invalidation.

However, when async is used, the actual page release will happen later,
at this function:

	void ppgtt_unbind_vma(struct i915_address_space *vm,
			      struct i915_vma_resource *vma_res)
	{
		if (!vma_res->allocated)
			return;

		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
		if (vma_res->tlb)
			vma_invalidate_tlb(vm, *vma_res->tlb);
	}
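
As an aside, the seqno check used by intel_gt_invalidate_tlb() is
wrap-safe and deliberately conservative. A minimal self-contained model
of it (plain userspace C, for illustration only; the kernel uses a
seqcount_mutex_t rather than a bare counter):

	#include <assert.h>
	#include <stdint.h>

	#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

	static uint32_t gt_seqno;	/* += 2 per completed full invalidation */

	static int tlb_seqno_passed(uint32_t seqno)
	{
		/* only skip if a *full* invalidate barrier has passed */
		return (int32_t)(gt_seqno - ALIGN(seqno, 2)) > 0;
	}

	int main(void)
	{
		/* an unbind records "cur | 1": a full flush is still owed */
		uint32_t pending = gt_seqno | 1;

		assert(!tlb_seqno_passed(pending));	/* nothing flushed yet */
		gt_seqno += 2;	/* one bump: may be a flush already in flight */
		assert(!tlb_seqno_passed(pending));	/* still must flush */
		gt_seqno += 2;	/* a flush that began after the record */
		assert(tlb_seqno_passed(pending));	/* now safe to skip */
		return 0;
	}

As I read it, the "| 1" plus the ALIGN() rounding is what makes a single
bump insufficient: that bump may belong to an invalidation which was
already in flight when the PTE was rewritten, so only a full barrier
that completed strictly after the record allows the skip.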
	
Regards,
Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-20  7:13     ` [Intel-gfx] " Mauro Carvalho Chehab
@ 2022-07-20 10:49       ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-20 10:49 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Sumit Semwal,
	Chris Wilson, Dave Airlie, Tomas Winkler, Matthew Auld,
	Thomas Hellström, Lucas De Marchi, intel-gfx, linaro-mm-sig,
	Rodrigo Vivi, linux-kernel, stable, linux-media,
	Christian König


On 20/07/2022 08:13, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:52:05 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>>
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>
>>> Invalidate TLB in patch, in order to reduce performance regressions.
>>
>> "in batches"?
> 
> Yeah. Will fix it.
> 
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> index d8b94d638559..2da6c82a8bd2 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>>>    void ppgtt_unbind_vma(struct i915_address_space *vm,
>>>    		      struct i915_vma_resource *vma_res)
>>>    {
>>> -	if (vma_res->allocated)
>>> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>> +	if (!vma_res->allocated)
>>> +		return;
>>> +
>>> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>> +	if (vma_res->tlb)
>>> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>>
>> The patch is about more than batching? Is there a security hole in
>> this area (unbind) with the current code?
> 
> No, I don't think there's a security hole. The rationale for this
> change is not related to that.

In this case the obvious question is why these changes are in a patch 
which declares itself to be about batching invalidations. Because...

> Since commit 2f6b90da9192 ("drm/i915: Use vma resources for async unbinding"),
> VMA unbind can happen either sync or async.
> 
> So, the logic needs to do TLB invalidation in two places. After this
> patch, the code at __i915_vma_evict() is:
> 
> 	struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
> 	{
> ...
> 		if (async)
> 			unbind_fence = i915_vma_resource_unbind(vma_res,
> 								&vma->obj->mm.tlb);
> 		else
> 			unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> 
> 		vma->resource = NULL;
> 
> 		atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> 			   &vma->flags);
> 
> 		i915_vma_detach(vma);
> 
> 		if (!async) {
> 			if (unbind_fence) {
> 				dma_fence_wait(unbind_fence, false);
> 				dma_fence_put(unbind_fence);
> 				unbind_fence = NULL;
> 			}
> 			vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
> 		}
> ...
> 
> So, basically, if !async, __i915_vma_evict() will do TLB cache invalidation.
> 
> However, when async is used, the actual page release will happen later,
> at this function:
> 
> 	void ppgtt_unbind_vma(struct i915_address_space *vm,
> 			      struct i915_vma_resource *vma_res)
> 	{
> 		if (!vma_res->allocated)
> 			return;
> 
> 		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> 		if (vma_res->tlb)
> 			vma_invalidate_tlb(vm, *vma_res->tlb);
> 	}

.. frankly I don't follow since I don't see any page release happening 
in here. Just PTE clearing.

I am explaining why it looks to me that the patch is doing two things: 
implementing batching _and_ adding invalidation points at VMA unbind 
sites, while so far we had it only at backing-store release. Maybe I am 
wrong, and perhaps I am too slow to pick up on the explanation here.

So if the patch is doing two things please split it up.

I am further confused by the invalidation call sites in evict and in 
unbind - why can't there be one logical site, since the logical sequence 
is evict -> unbind?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-14 12:06 ` [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
  2022-07-18 13:52   ` Tvrtko Ursulin
@ 2022-07-20 10:54   ` Tvrtko Ursulin
  2022-07-27 11:48     ` [Intel-gfx] " Mauro Carvalho Chehab
  1 sibling, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-20 10:54 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Christian König, Thomas Hellström,
	Andi Shyti, Ashutosh Dixit, Ayaz A Siddiqui, Casey Bowman,
	Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie,
	Jani Nikula, Jason Ekstrand, John Harrison, Joonas Lahtinen,
	Lucas De Marchi, Maarten Lankhorst, Matt Roper, Matthew Auld,
	Prathap Kumar Valsan, Ramalingam C, Rodrigo Vivi, Sumit Semwal,
	Tomas Winkler, dri-devel, intel-gfx, linaro-mm-sig, linux-kernel,
	linux-media, stable, Tvrtko Ursulin, Fei Yang


On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Invalidate TLB in patch, in order to reduce performance regressions.
> 
> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
> 
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> [mchehab: rebased to not require moving the code to a separate file]
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>   .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>   drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>   drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>   drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>   drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>   drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>   drivers/gpu/drm/i915/i915_vma.c               | 34 +++++++++---
>   drivers/gpu/drm/i915/i915_vma.h               |  1 +
>   drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>   drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>   10 files changed, 125 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>   #define I915_BO_READONLY          BIT(7)
>   #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
>   #define I915_BO_PROTECTED         BIT(9)
> -#define I915_BO_WAS_BOUND_BIT     10
>   	/**
>   	 * @mem_flags - Mutable placement-related flags
>   	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>   		 * pages were last acquired.
>   		 */
>   		bool dirty:1;
> +
> +		u32 tlb;
>   	} mm;
>   
>   	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>   		vunmap(ptr);
>   }
>   
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>   struct sg_table *
>   __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>   	__i915_gem_object_reset_page_iter(obj);
>   	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>   
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>   
>   	return pages;
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>   {
>   	spin_lock_init(&gt->irq_lock);
>   
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>   	INIT_LIST_HEAD(&gt->closed_vma);
>   	spin_lock_init(&gt->closed_lock);
>   
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>   	intel_gt_init_reset(gt);
>   	intel_gt_init_requests(gt);
>   	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>   	intel_gt_pm_init_early(gt);
>   
>   	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>   		intel_gt_fini_requests(gt);
>   		intel_gt_fini_reset(gt);
>   		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>   		intel_engines_free(gt);
>   	}
>   }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>   	return rb;
>   }
>   
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>   {
>   	static const i915_reg_t gen8_regs[] = {
>   		[RENDER_CLASS]			= GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	const i915_reg_t *regs;
>   	unsigned int num = 0;
>   
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>   	if (GRAPHICS_VER(i915) == 12) {
>   		regs = gen12_regs;
>   		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   			  "Platform does not implement TLB invalidation!"))
>   		return;
>   
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>   	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>   
>   	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   		awake |= engine->mask;
>   	}
>   
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>   	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>   	if (awake &&
>   	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>   	 * transitions.
>   	 */
>   	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>   
>   void intel_gt_watchdog_work(struct work_struct *work);
>   
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>   
>   #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>   #include <linux/llist.h>
>   #include <linux/mutex.h>
>   #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>   	struct intel_uc uc;
>   	struct intel_gsc gsc;
>   
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLBs
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbinding the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>   
>   	struct i915_wa_list wa_list;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>   void ppgtt_unbind_vma(struct i915_address_space *vm,
>   		      struct i915_vma_resource *vma_res)
>   {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>   }
>   
>   static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 646f419b2035..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,9 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>   				   bind_flags);
>   	}
>   
> -	if (bind_flags & I915_VMA_LOCAL_BIND)
> -		set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>   	atomic_or(bind_flags, &vma->flags);
>   	return 0;
>   }
> @@ -1311,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>   	return err;
>   }
>   
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));

Shouldn't tlb be a pointer for this to make sense?
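I.e. presumably something like this, with the callers passing the address
(&vma->obj->mm.tlb / vma_res->tlb):

	void vma_invalidate_tlb(struct i915_address_space *vm, u32 *tlb)
	{
		...
		WRITE_ONCE(*tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
	}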

Regards,

Tvrtko

> +}
> +
>   static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>   {
>   	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1942,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>   		vma->vm->skip_pte_rewrite;
>   	trace_i915_vma_unbind(vma);
>   
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>   	vma->resource = NULL;
>   
>   	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1950,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>   
>   	i915_vma_detach(vma);
>   
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>   	}
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>   			u64 size, u64 alignment, u64 flags);
>   void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>   void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>   struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>   int __i915_vma_unbind(struct i915_vma *vma);
>   int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>    * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>    * complete.
>    */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>   {
>   	struct i915_address_space *vm = vma_res->vm;
>   
> +	vma_res->tlb = tlb;
> +
>   	/* Reference for the sw fence */
>   	i915_vma_resource_get(vma_res);
>   
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>    * taken when the unbind is scheduled.
>    * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>    * needs to be skipped for unbind.
> + * @tlb: pointer to obj->mm.tlb for async unbind; NULL otherwise
>    *
>    * The lifetime of a struct i915_vma_resource is from a binding request to
>    * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>   	bool immediate_unbind:1;
>   	bool needs_wakeref:1;
>   	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>   };
>   
>   bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>   
>   void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>   
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>   
>   void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>   

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
  2022-07-18 13:16   ` Tvrtko Ursulin
@ 2022-07-22 11:56   ` Andi Shyti
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2022-07-22 11:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Jason Ekstrand, John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	stable, Fei Yang

Hi Mauro,

On Thu, Jul 14, 2022 at 01:06:06PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Check if the device is powered down prior to any engine activity,
> as, in such cases, all the TLBs were already invalidated, so an
> explicit TLB invalidation is not needed, thus reducing the impact
> of the performance regression it causes.
> 
> This becomes more significant with GuC, as TLB invalidations can
> only be issued when the connection to the GuC is awake.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

For me it's good, but please sort out Tvrtko's doubts with him first:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi

> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> 
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
>  drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
>  drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
>  3 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 97c820eee115..6835279943df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -6,14 +6,15 @@
>  
>  #include <drm/drm_cache.h>
>  
> +#include "gt/intel_gt.h"
> +#include "gt/intel_gt_pm.h"
> +
>  #include "i915_drv.h"
>  #include "i915_gem_object.h"
>  #include "i915_scatterlist.h"
>  #include "i915_gem_lmem.h"
>  #include "i915_gem_mman.h"
>  
> -#include "gt/intel_gt.h"
> -
>  void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>  				 struct sg_table *pages,
>  				 unsigned int sg_page_sizes)
> @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  
>  	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
>  		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +		struct intel_gt *gt = to_gt(i915);
>  		intel_wakeref_t wakeref;
>  
> -		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
> -			intel_gt_invalidate_tlbs(to_gt(i915));
> +		with_intel_gt_pm_if_awake(gt, wakeref)
> +			intel_gt_invalidate_tlbs(gt);
>  	}
>  
>  	return pages;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 68c2b0d8f187..c4d43da84d8e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -12,6 +12,7 @@
>  
>  #include "i915_drv.h"
>  #include "intel_context.h"
> +#include "intel_engine_pm.h"
>  #include "intel_engine_regs.h"
>  #include "intel_ggtt_gmch.h"
>  #include "intel_gt.h"
> @@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	struct drm_i915_private *i915 = gt->i915;
>  	struct intel_uncore *uncore = gt->uncore;
>  	struct intel_engine_cs *engine;
> +	intel_engine_mask_t awake, tmp;
>  	enum intel_engine_id id;
>  	const i915_reg_t *regs;
>  	unsigned int num = 0;
> @@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  
>  	GEM_TRACE("\n");
>  
> -	assert_rpm_wakelock_held(&i915->runtime_pm);
> -
>  	mutex_lock(&gt->tlb_invalidate_lock);
>  	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>  
>  	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
>  
> +	awake = 0;
>  	for_each_engine(engine, gt, id) {
>  		struct reg_and_bit rb;
>  
> +		if (!intel_engine_pm_is_awake(engine))
> +			continue;
> +
>  		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
>  		if (!i915_mmio_reg_offset(rb.reg))
>  			continue;
>  
>  		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
> +		awake |= engine->mask;
>  	}
>  
>  	spin_unlock_irq(&uncore->lock);
>  
> -	for_each_engine(engine, gt, id) {
> +	for_each_engine_masked(engine, gt, awake, tmp) {
> +		struct reg_and_bit rb;
> +
>  		/*
>  		 * HW architecture suggest typical invalidation time at 40us,
>  		 * with pessimistic cases up to 100us and a recommendation to
> @@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  		 */
>  		const unsigned int timeout_us = 100;
>  		const unsigned int timeout_ms = 4;
> -		struct reg_and_bit rb;
>  
>  		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
> -		if (!i915_mmio_reg_offset(rb.reg))
> -			continue;
> -
>  		if (__intel_wait_for_register_fw(uncore,
>  						 rb.reg, rb.bit, 0,
>  						 timeout_us, timeout_ms,
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index bc898df7a48c..a334787a4939 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
>  	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
>  	     intel_gt_pm_put(gt), tmp = 0)
>  
> +#define with_intel_gt_pm_if_awake(gt, wf) \
> +	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
> +
>  static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
>  {
>  	return intel_wakeref_wait_for_idle(&gt->wakeref);
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-07-14 12:06 ` [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
  2022-07-18 13:24   ` Tvrtko Ursulin
@ 2022-07-22 11:57   ` Andi Shyti
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2022-07-22 11:57 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Rodrigo Vivi, Tvrtko Ursulin,
	dri-devel, intel-gfx, linux-kernel, stable, Fei Yang,
	Thomas Hellström

Hi Mauro and Chris,

On Thu, Jul 14, 2022 at 01:06:08PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Ensure that the TLB of the OA unit is also invalidated
> on gen12 HW, as just invalidating the TLB of an engine is not
> enough.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation
  2022-07-14 12:06 ` [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
  2022-07-18 13:39   ` Tvrtko Ursulin
@ 2022-07-22 11:58   ` Andi Shyti
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2022-07-22 11:58 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Daniel Vetter, Dave Airlie, David Airlie,
	Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
	dri-devel, intel-gfx, linux-kernel, stable, Fei Yang, Andi Shyti,
	Thomas Hellström

Hi Mauro,

On Thu, Jul 14, 2022 at 01:06:09PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Don't flush TLBs when the buffer is only used in the GGTT under full
> control of the kernel, as there's no risk of concurrent access
> or stale access from prefetch.
> 
> We only need to invalidate the TLBs if they are accessible by the user.
> That helps to reduce the performance regression introduced by the
> TLB invalidation logic.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Please, once you have sorted out Tvrtko's question you can add:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-14 12:06 ` [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
  2022-07-18 13:45   ` Tvrtko Ursulin
@ 2022-07-22 12:00   ` Andi Shyti
  1 sibling, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2022-07-22 12:00 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Rodrigo Vivi, Tvrtko Ursulin,
	dri-devel, intel-gfx, linux-kernel, stable, Fei Yang,
	Thomas Hellström

Hi Mauro,

On Thu, Jul 14, 2022 at 01:06:10PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Skip all further TLB invalidations once the device is wedged and
> has been reset, as, in such cases, it can no longer process instructions
> on the GPU and the user no longer has access to the TLBs in each engine.
> 
> That helps to reduce the performance regression introduced by the
> TLB invalidation logic.
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

I haven't read any concerns from Tvrtko here; in any case:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

thanks,
Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file
  2022-07-14 12:06 ` [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
@ 2022-07-22 12:07   ` Andi Shyti
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2022-07-22 12:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Casey Bowman,
	Daniel Vetter, Daniele Ceraolo Spurio, David Airlie, Jani Nikula,
	Jason Ekstrand, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld,
	Prathap Kumar Valsan, Rodrigo Vivi, Tomas Winkler,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, Fei Yang

Hi Mauro,

On Thu, Jul 14, 2022 at 01:06:13PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Prepare for supporting more TLB invalidation scenarios by moving
> the current MMIO invalidation to its own file.
> 
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Just a copy-paste; I checked line by line and it all looked correct:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-20 10:54   ` Tvrtko Ursulin
@ 2022-07-27 11:48     ` Mauro Carvalho Chehab
  2022-07-27 12:56       ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 11:48 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Sumit Semwal,
	Chris Wilson, Dave Airlie, Tomas Winkler, Matthew Auld,
	Thomas Hellström, Lucas De Marchi, intel-gfx, linaro-mm-sig,
	Rodrigo Vivi, linux-kernel, stable, linux-media,
	Christian König

On Wed, 20 Jul 2022 11:49:59 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> On 20/07/2022 08:13, Mauro Carvalho Chehab wrote:
> > On Mon, 18 Jul 2022 14:52:05 +0100
> > Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> >   
> >>
> >> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:  
> >>> From: Chris Wilson <chris.p.wilson@intel.com>
> >>>
> >>> Invalidate TLB in patch, in order to reduce performance regressions.  
> >>
> >> "in batches"?  
> > 
> > Yeah. Will fix it.

> > +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> > +{
> > +	/*
> > +	 * Before we release the pages that were bound by this vma, we
> > +	 * must invalidate all the TLBs that may still have a reference
> > +	 * back to our physical address. It only needs to be done once,
> > +	 * so after updating the PTE to point away from the pages, record
> > +	 * the most recent TLB invalidation seqno, and if we have not yet
> > +	 * flushed the TLBs upon release, perform a full invalidation.
> > +	 */
> > +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));  
> 
> Shouldn't tlb be a pointer for this to make sense?

Oh, my mistake! Will fix at the next version.

> >   
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> >>> index d8b94d638559..2da6c82a8bd2 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> >>> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
> >>>    void ppgtt_unbind_vma(struct i915_address_space *vm,
> >>>    		      struct i915_vma_resource *vma_res)
> >>>    {
> >>> -	if (vma_res->allocated)
> >>> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> >>> +	if (!vma_res->allocated)
> >>> +		return;
> >>> +
> >>> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> >>> +	if (vma_res->tlb)
> >>> +		vma_invalidate_tlb(vm, *vma_res->tlb);  
> >>
> >> The patch is about more than batching? If there is a security hole in
> >> this area (unbind) with the current code?  
> > 
> > No, I don't think there's a security hole. The rationale for this is
> > not due to it.  
> 
> In this case obvious question is why are these changes in the patch 
> which declares itself to be about batching invalidations? Because...

Because vma_invalidate_tlb() basically stores a TLB seqno, but the
actual invalidation is deferred to when the pages are unset, at
__i915_gem_object_unset_pages().

So, what happens is:

- in VMA sync mode, the need to invalidate the TLBs is marked at
  __vma_put_pages(), before VMA unbind;
- in async mode, this is deferred to ppgtt_unbind_vma(), which
  marks the need to invalidate the TLBs.

In both cases, __i915_gem_object_unset_pages() is called later,
when the driver is ready to unmap the pages.
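
A stripped-down sketch of the two steps, keeping only the TLB bookkeeping
(and assuming the tlb argument becomes a pointer, as noted above):

	/* async: only record the seqno here, via vma_res->tlb -> obj->mm.tlb */
	void ppgtt_unbind_vma(struct i915_address_space *vm,
			      struct i915_vma_resource *vma_res)
	{
		if (!vma_res->allocated)
			return;

		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
		if (vma_res->tlb)
			vma_invalidate_tlb(vm, vma_res->tlb);	/* stamp only */
	}

	/* the actual flush happens when the pages are finally released */
	static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
	{
		struct drm_i915_private *i915 = to_i915(obj->base.dev);
		struct intel_gt *gt = to_gt(i915);

		if (!obj->mm.tlb)
			return;

		intel_gt_invalidate_tlb(gt, obj->mm.tlb);
		obj->mm.tlb = 0;
	}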

> I am explaining why it looks to me that the patch is doing two things. 
> Implementing batching _and_ adding invalidation points at VMA unbind 
> sites, while so far we had it at backing store release only. Maybe I am 
> wrong and perhaps I am too slow to pick up on the explanation here.
> 
> So if the patch is doing two things please split it up.
> 
> I am further confused by the invalidation call site in evict and in 
> unbind - why there can't be one logical site since the logical sequence 
> is evict -> unbind.

The invalidation happens in only one place: __i915_gem_object_unset_pages().

Despite its name, vma_invalidate_tlb() just marks the need for a TLB
invalidation.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-27 11:48     ` [Intel-gfx] " Mauro Carvalho Chehab
@ 2022-07-27 12:56       ` Tvrtko Ursulin
  2022-07-28  6:32         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-27 12:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, David Airlie, dri-devel, Sumit Semwal,
	Chris Wilson, Dave Airlie, Tomas Winkler, Matthew Auld,
	Thomas Hellström, Lucas De Marchi, intel-gfx, linaro-mm-sig,
	Rodrigo Vivi, linux-kernel, stable, linux-media,
	Christian König


On 27/07/2022 12:48, Mauro Carvalho Chehab wrote:
> On Wed, 20 Jul 2022 11:49:59 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>> On 20/07/2022 08:13, Mauro Carvalho Chehab wrote:
>>> On Mon, 18 Jul 2022 14:52:05 +0100
>>> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
>>>    
>>>>
>>>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>>>> From: Chris Wilson <chris.p.wilson@intel.com>
>>>>>
>>>>> Invalidate TLB in patch, in order to reduce performance regressions.
>>>>
>>>> "in batches"?
>>>
>>> Yeah. Will fix it.
> 
>>> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
>>> +{
>>> +	/*
>>> +	 * Before we release the pages that were bound by this vma, we
>>> +	 * must invalidate all the TLBs that may still have a reference
>>> +	 * back to our physical address. It only needs to be done once,
>>> +	 * so after updating the PTE to point away from the pages, record
>>> +	 * the most recent TLB invalidation seqno, and if we have not yet
>>> +	 * flushed the TLBs upon release, perform a full invalidation.
>>> +	 */
>>> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
>>
>> Shouldn't tlb be a pointer for this to make sense?
> 
> Oh, my mistake! Will fix at the next version.
> 
>>>    
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>>>> index d8b94d638559..2da6c82a8bd2 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>>>> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>>>>>     void ppgtt_unbind_vma(struct i915_address_space *vm,
>>>>>     		      struct i915_vma_resource *vma_res)
>>>>>     {
>>>>> -	if (vma_res->allocated)
>>>>> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>>>> +	if (!vma_res->allocated)
>>>>> +		return;
>>>>> +
>>>>> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>>>> +	if (vma_res->tlb)
>>>>> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>>>>
>>>> The patch is about more than batching? If there is a security hole in
>>>> this area (unbind) with the current code?
>>>
>>> No, I don't think there's a security hole. The rationale for this is
>>> not due to it.
>>
>> In this case obvious question is why are these changes in the patch
>> which declares itself to be about batching invalidations? Because...
> 
> Because vma_invalidate_tlb() basically stores a TLB seqno, but the
> actual invalidation is deferred to when the pages are unset, at
> __i915_gem_object_unset_pages().
> 
> So, what happens is:
> 
> - on VMA sync mode, the need to invalidate TLB is marked at
>    __vma_put_pages(), before VMA unbind;
> - on async, this is deferred to happen at ppgtt_unbind_vma(), where
>    it marks the need to invalidate TLBs.
> 
> On both cases, __i915_gem_object_unset_pages() is called later,
> when the driver is ready to unmap the page.

Sorry, it's still not clear to me why the patch is moving the marking of
the need to invalidate (regardless of whether it is a bit like today, or a
seqno like in this patch) from bind to unbind?

What if the seqno was stored in i915_vma_bind, where the bit is set 
today, and all the hunks which touch the unbind and evict would 
disappear from the patch. What wouldn't work in that case, if anything?

Regards,

Tvrtko

> 
>> I am explaining why it looks to me that the patch is doing two things.
>> Implementing batching _and_ adding invalidation points at VMA unbind
>> sites, while so far we had it at backing store release only. Maybe I am
>> wrong and perhaps I am too slow to pick up on the explanation here.
>>
>> So if the patch is doing two things please split it up.
>>
>> I am further confused by the invalidation call site in evict and in
>> unbind - why there can't be one logical site since the logical sequence
>> is evict -> unbind.
> 
> The invalidation happens only on one place: __i915_gem_object_unset_pages().
> 
> Despite its name, vma_invalidate_tlb() just marks the need of doing TLB
> invalidation.
> 
> Regards,
> Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-27 12:56       ` Tvrtko Ursulin
@ 2022-07-28  6:32         ` Mauro Carvalho Chehab
  2022-07-28  7:26           ` Mauro Carvalho Chehab
  2022-07-28 10:11           ` Tvrtko Ursulin
  0 siblings, 2 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-28  6:32 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: stable, Thomas Hellström, linux-media, David Airlie,
	intel-gfx, Lucas De Marchi, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Chris Wilson, Rodrigo Vivi,
	Dave Airlie, Tomas Winkler, Mauro Carvalho Chehab, Sumit Semwal,
	Matthew Auld

On Wed, 27 Jul 2022 13:56:50 +0100
Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:

> > Because vma_invalidate_tlb() basically stores a TLB seqno, but the
> > actual invalidation is deferred to when the pages are unset, at
> > __i915_gem_object_unset_pages().
> > 
> > So, what happens is:
> > 
> > - on VMA sync mode, the need to invalidate TLB is marked at
> >    __vma_put_pages(), before VMA unbind;
> > - on async, this is deferred to happen at ppgtt_unbind_vma(), where
> >    it marks the need to invalidate TLBs.
> > 
> > On both cases, __i915_gem_object_unset_pages() is called later,
> > when the driver is ready to unmap the page.  
> 
> Sorry still not clear to me why is the patch moving marking of the need 
> to invalidate (regardless if it a bit like today, or a seqno like in 
> this patch) from bind to unbind?
> 
> What if the seqno was stored in i915_vma_bind, where the bit is set 
> today, and all the hunks which touch the unbind and evict would 
> disappear from the patch. What wouldn't work in that case, if anything?

Ah, now I see your point.

I can't see any sense in having a sequence number at VMA bind, as the
unbind order can be different. Whether a full TLB invalidation is needed
depends on the unbind order.

The way the current algorithm works is that drm_i915_gem_objects can be
created in any order, and, at unbind/evict, each receives a seqno.

The seqno is incremented at intel_gt_invalidate_tlb():

    void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
    {
	with_intel_gt_pm_if_awake(gt, wakeref) {
		mutex_lock(&gt->tlb.invalidate_lock);
		if (tlb_seqno_passed(gt, seqno))
			goto unlock;

		mmio_invalidate_full(gt);

		write_seqcount_invalidate(&gt->tlb.seqno);	// increment seqno
		

So, let's say 3 objects were created, in this order:

	obj1
	obj2
	obj3

They can be unbound/evicted in a different order. At that time,
the mm.tlb field will be stamped with a seqno, using the number from the
last TLB flush, plus 1.

As different threads can be used to handle TLB flushes, let's imagine
two threads (just for the sake of having an example). In such a case,
what we would have is:

seqno		Thread 0			Thread 1

seqno=2		unbind/evict event
		obj3.mm.tlb = seqno | 1
seqno=2		unbind/evict event
		obj1.mm.tlb = seqno | 1
						__i915_gem_object_unset_pages() 
						called for obj3, TLB flush happened,
						invalidating both obj1 and obj2.
						seqno += 2					
seqno=4		unbind/evict event
		obj1.mm.tlb = seqno | 1
						__i915_gem_object_unset_pages()
						called for obj1, don't flush.
...
						__i915_gem_object_unset_pages() called for obj2, TLB flush happened
						seqno += 2
seqno=6

So, basically, the seqno is used to track when the object data stopped
being updated because of an unbind/evict event. It is later used by
intel_gt_invalidate_tlb(), when called from __i915_gem_object_unset_pages(),
to check whether a previous invalidation call was enough to invalidate
the object, or whether a new call is needed.

Now, if the seqno were stored at bind, data could still leak, as the
assumption made by intel_gt_invalidate_tlb() that the data stopped being
used at that seqno would no longer hold.
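
For completeness, these are the two helpers from the patch that implement
the odd/even encoding: a stamp taken at an (even) seqno N becomes N | 1,
and ALIGN(N | 1, 2) rounds it back up to N + 2, so the skip test only
passes once a full invalidation has completed after the stamp was taken:

	static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
	{
		return intel_gt_tlb_seqno(gt) | 1;
	}

	/* Only skip if a *full* TLB invalidate barrier has passed */
	static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
	{
		u32 cur = intel_gt_tlb_seqno(gt);

		return (s32)(cur - ALIGN(seqno, 2)) > 0;
	}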

Still, I agree that this logic is complex and should be better 
documented. So, if you're now OK with this patch, I'll add the above
explanation inside a kernel-doc comment.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-28  6:32         ` Mauro Carvalho Chehab
@ 2022-07-28  7:26           ` Mauro Carvalho Chehab
  2022-07-28 10:11           ` Tvrtko Ursulin
  1 sibling, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-28  7:26 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: stable, Thomas Hellström, linux-media, David Airlie,
	intel-gfx, Lucas De Marchi, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Chris Wilson, Rodrigo Vivi,
	Dave Airlie, Tomas Winkler, Mauro Carvalho Chehab, Sumit Semwal,
	Matthew Auld

On Thu, 28 Jul 2022 08:32:32 +0200
Mauro Carvalho Chehab <mauro.chehab@linux.intel.com> wrote:

> On Wed, 27 Jul 2022 13:56:50 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
> > > Because vma_invalidate_tlb() basically stores a TLB seqno, but the
> > > actual invalidation is deferred to when the pages are unset, at
> > > __i915_gem_object_unset_pages().
> > > 
> > > So, what happens is:
> > > 
> > > - on VMA sync mode, the need to invalidate TLB is marked at
> > >    __vma_put_pages(), before VMA unbind;
> > > - on async, this is deferred to happen at ppgtt_unbind_vma(), where
> > >    it marks the need to invalidate TLBs.
> > > 
> > > On both cases, __i915_gem_object_unset_pages() is called later,
> > > when the driver is ready to unmap the page.    
> > 
> > Sorry still not clear to me why is the patch moving marking of the need 
> > to invalidate (regardless if it a bit like today, or a seqno like in 
> > this patch) from bind to unbind?
> > 
> > What if the seqno was stored in i915_vma_bind, where the bit is set 
> > today, and all the hunks which touch the unbind and evict would 
> > disappear from the patch. What wouldn't work in that case, if anything?  
> 
> Ah, now I see your point.
> 
> I can't see any sense on having a sequence number at VMA bind, as the
> unbind order can be different. The need of doing a full TLB invalidation
> or not depends on the unbind order.
> 
> The way the current algorithm works is that drm_i915_gem_object can be
> created on any order, and, at unbind/evict, they receive a seqno.
> 
> The seqno is incremented at intel_gt_invalidate_tlb():
> 
>     void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
>     {
> 	with_intel_gt_pm_if_awake(gt, wakeref) {
> 		mutex_lock(&gt->tlb.invalidate_lock);
> 		if (tlb_seqno_passed(gt, seqno))
> 				goto unlock;
> 
> 		mmio_invalidate_full(gt);
> 
> 		write_seqcount_invalidate(&gt->tlb.seqno);	// increment seqno
> 		
> 
> So, let's say 3 objects were created, on this order:
> 
> 	obj1
> 	obj2
> 	obj3
> 
> They would be unbind/evict on a different order. On that time, 
> the mm.tlb will be stamped with a seqno, using the number from the
> last TLB flush, plus 1.
> 
> As different threads can be used to handle TLB flushes, let's imagine
> two threads (just for the sake of having an example). On such case,
> what we would have is:
> 
> seqno		Thread 0			Thread 1
> 
> seqno=2		unbind/evict event
> 		obj3.mm.tlb = seqno | 1
> seqno=2		unbind/evict event
> 		obj1.mm.tlb = seqno | 1
> 						__i915_gem_object_unset_pages() 
> 						called for obj3, TLB flush happened,
> 						invalidating both obj1 and obj2.
> 						seqno += 2					
> seqno=4		unbind/evict event
> 		obj1.mm.tlb = seqno | 1

Cut-and-paste typo. It should be, instead:

 		obj2.mm.tlb = seqno | 1


> 						__i915_gem_object_unset_pages()
> 						called for obj1, don't flush.
> ...
> 						__i915_gem_object_unset_pages() called for obj2, TLB flush happened
> 						seqno += 2
> seqno=6
> 
> So, basically the seqno is used to track when the object data stopped
> being updated, because of an unbind/evict event, being later used by
> intel_gt_invalidate_tlb() when called from __i915_gem_object_unset_pages(),
> in order to check if a previous invalidation call was enough to invalidate
> the object, or if a new call is needed.
> 
> Now, if seqno is stored at bind, data can still leak, as the assumption
> made by intel_gt_invalidate_tlb() that the data stopped being used at
> seqno is not true anymore.
> 
> Still, I agree that this logic is complex and should be better 
> documented. So, if you're now OK with this patch, I'll add the above
> explanation inside a kernel-doc comment.

I'm enclosing the kernel-doc patch (to be applied after moving the code into
its own files: intel_tlb.c/intel_tlb.h):

[PATCH] drm/i915/gt: document TLB cache invalidation functions

Add a description for the TLB cache invalidation algorithm and for
the related kAPI functions.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.c b/drivers/gpu/drm/i915/gt/intel_tlb.c
index af8cae979489..8eda0743da74 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.c
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.c
@@ -145,6 +145,18 @@ static void mmio_invalidate_full(struct intel_gt *gt)
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
 }
 
+/**
+ * intel_gt_invalidate_tlb_full - do full TLB cache invalidation
+ * @gt: GT structure
+ * @seqno: sequence number
+ *
+ * Do a full TLB cache invalidation if @seqno has not yet been covered
+ * by a previous full TLB cache invalidation.
+ *
+ * Note:
+ * The TLB cache invalidation logic depends on GEN-specific registers.
+ * It currently supports GEN8 to GEN12 and GuC-based TLB cache invalidation.
+ */
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno)
 {
 	intel_wakeref_t wakeref;
@@ -177,6 +189,12 @@ void intel_gt_init_tlb(struct intel_gt *gt)
 	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 }
 
+/**
+ * intel_gt_fini_tlb - free TLB-specific vars
+ * @gt: GT structure
+ *
+ * Frees any resources needed by TLB cache invalidation logic.
+ */
 void intel_gt_fini_tlb(struct intel_gt *gt)
 {
 	mutex_destroy(&gt->tlb.invalidate_lock);
diff --git a/drivers/gpu/drm/i915/gt/intel_tlb.h b/drivers/gpu/drm/i915/gt/intel_tlb.h
index 46ce25bf5afe..d186f5d5901f 100644
--- a/drivers/gpu/drm/i915/gt/intel_tlb.h
+++ b/drivers/gpu/drm/i915/gt/intel_tlb.h
@@ -11,16 +11,99 @@
 
 #include "intel_gt_types.h"
 
+/**
+ * DOC: TLB cache invalidation logic
+ *
+ * The way the current algorithm works is that a drm_i915_gem_object can be
+ * created in any order. At unbind/evict time, the object is guaranteed not
+ * to be used anymore. So, it stores a sequence number provided by
+ * intel_gt_next_invalidate_tlb_full(). This can happen either at
+ * __vma_put_pages(), for VMA sync unbind, or at ppgtt_unbind_vma(), for
+ * VMA async unbind.
+ *
+ * At __i915_gem_object_unset_pages(), intel_gt_invalidate_tlb() is called,
+ * where it checks whether the object's sequence number has already been
+ * covered by a full invalidation. If not, it flushes and increments the
+ * GT sequence number::
+ *
+ *   void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+ *   {
+ *   ...
+ * 	with_intel_gt_pm_if_awake(gt, wakeref) {
+ * 		mutex_lock(&gt->tlb.invalidate_lock);
+ * 		if (tlb_seqno_passed(gt, seqno))
+ * 			goto unlock;
+ *
+ * 		mmio_invalidate_full(gt);
+ *
+ * 		write_seqcount_invalidate(&gt->tlb.seqno); // increment seqno
+ *    ...
+ *
+ * So, let's say the current seqno is 2 and 3 new objects were created,
+ * on this order:
+ *
+ * 	obj1
+ * 	obj2
+ * 	obj3
+ *
+ * They can be unbound/evicted in a different order. At unbind/evict time,
+ * the mm.tlb field will be stamped with the sequence number, using the
+ * number from the last TLB flush, plus 1.
+ *
+ * Different threads may be used on unbind/evict and/or unset pages.
+ *
+ * As the logic inside intel_gt_invalidate_tlb() is protected by a mutex,
+ * for simplicity, let's consider just two threads::
+ *
+ *   sequence number	Thread 0		Thread 1
+ *
+ *   seqno=2
+ *			unbind/evict event
+ * 			obj3.mm.tlb = seqno | 1
+ *
+ *			unbind/evict event
+ * 			obj1.mm.tlb = seqno | 1
+ * 						__i915_gem_object_unset_pages()
+ * 						called for obj3 => TLB flush
+ * 						invalidating both obj1 and obj2.
+ * 						seqno += 2
+ *   seqno=4
+ *			unbind/evict event
+ * 			obj2.mm.tlb = seqno | 1
+ * 						__i915_gem_object_unset_pages()
+ * 						called for obj1, don't flush,
+ *						as past flush invalidated obj1
+ *
+ * 						__i915_gem_object_unset_pages()
+ *						called for obj2 => TLB flush
+ * 						seqno += 2
+ *   seqno=6
+ */
+
 void intel_gt_invalidate_tlb_full(struct intel_gt *gt, u32 seqno);
 
 void intel_gt_init_tlb(struct intel_gt *gt);
 void intel_gt_fini_tlb(struct intel_gt *gt);
 
+/**
+ * intel_gt_tlb_seqno - Returns the current TLB invalidation sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
 static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
 {
 	return seqprop_sequence(&gt->tlb.seqno);
 }
 
+/**
+ * intel_gt_next_invalidate_tlb_full - Returns the next TLB full invalidation
+ *	sequence number
+ *
+ * @gt: GT structure
+ *
+ * There's no need to lock while calling it, as seqprop_sequence is thread-safe
+ */
 static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
 {
 	return intel_gt_tlb_seqno(gt) | 1;


^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations
  2022-07-28  6:32         ` Mauro Carvalho Chehab
  2022-07-28  7:26           ` Mauro Carvalho Chehab
@ 2022-07-28 10:11           ` Tvrtko Ursulin
  1 sibling, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2022-07-28 10:11 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: stable, Thomas Hellström, linux-media, David Airlie,
	intel-gfx, Lucas De Marchi, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Chris Wilson, Rodrigo Vivi,
	Dave Airlie, Tomas Winkler, Mauro Carvalho Chehab, Sumit Semwal,
	Matthew Auld


On 28/07/2022 07:32, Mauro Carvalho Chehab wrote:
> On Wed, 27 Jul 2022 13:56:50 +0100
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> 
>>> Because vma_invalidate_tlb() basically stores a TLB seqno, but the
>>> actual invalidation is deferred to when the pages are unset, at
>>> __i915_gem_object_unset_pages().
>>>
>>> So, what happens is:
>>>
>>> - on VMA sync mode, the need to invalidate TLB is marked at
>>>     __vma_put_pages(), before VMA unbind;
>>> - on async, this is deferred to happen at ppgtt_unbind_vma(), where
>>>     it marks the need to invalidate TLBs.
>>>
>>> On both cases, __i915_gem_object_unset_pages() is called later,
>>> when the driver is ready to unmap the page.
>>
>> Sorry still not clear to me why is the patch moving marking of the need
>> to invalidate (regardless if it a bit like today, or a seqno like in
>> this patch) from bind to unbind?
>>
>> What if the seqno was stored in i915_vma_bind, where the bit is set
>> today, and all the hunks which touch the unbind and evict would
>> disappear from the patch. What wouldn't work in that case, if anything?
> 
> Ah, now I see your point.
> 
> I can't see any sense on having a sequence number at VMA bind, as the
> unbind order can be different. The need of doing a full TLB invalidation
> or not depends on the unbind order.

Sorry, yes, that was stupid of me. What I was really thinking of was the
approach I initially used for coalescing: keep the set_bit in bind, and
then, once the code enters intel_gt_invalidate_tlbs, take a "ticket" and
wait on the mutex. Once it gets the mutex, it checks the ticket against
the GT copy, and if two invalidations have passed since it started waiting
on the mutex, it can exit immediately. That would seem like a minimal
improvement to batch things up.

But I guess it would still emit needless invalidations if there is no
contention, just a stream of serialized put-pages calls, while the
approach from this patch can skip all but the truly required ones.
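
Roughly what I had in mind, for reference (an untested sketch;
gt->tlb_flush_count is a made-up field that does not exist today):

	void intel_gt_invalidate_tlbs(struct intel_gt *gt)
	{
		u32 ticket = READ_ONCE(gt->tlb_flush_count);

		mutex_lock(&gt->tlb_invalidate_lock);

		/*
		 * If the count advanced by two while we waited for the
		 * mutex, at least one full invalidation started after we
		 * took the ticket, so our pages are already covered.
		 */
		if (READ_ONCE(gt->tlb_flush_count) - ticket < 2) {
			mmio_invalidate_full(gt);
			WRITE_ONCE(gt->tlb_flush_count,
				   READ_ONCE(gt->tlb_flush_count) + 1);
		}

		mutex_unlock(&gt->tlb_invalidate_lock);
	}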

Okay, go for it and thanks for the explanations.

Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

P.S. The last remaining "ugliness" is the 2nd call to invalidation from
evict. It would be nicer if there were a single common place to do it on
vma unbind, but okay; I do not plan to dig into it, so fine.

> 
> The way the current algorithm works is that drm_i915_gem_object can be
> created on any order, and, at unbind/evict, they receive a seqno.
> 
> The seqno is incremented at intel_gt_invalidate_tlb():
> 
>      void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
>      {
> 	with_intel_gt_pm_if_awake(gt, wakeref) {
> 		mutex_lock(&gt->tlb.invalidate_lock);
> 		if (tlb_seqno_passed(gt, seqno))
> 				goto unlock;
> 
> 		mmio_invalidate_full(gt);
> 
> 		write_seqcount_invalidate(&gt->tlb.seqno);	// increment seqno
> 		
> 
> So, let's say 3 objects were created, on this order:
> 
> 	obj1
> 	obj2
> 	obj3
> 
> They would be unbind/evict on a different order. On that time,
> the mm.tlb will be stamped with a seqno, using the number from the
> last TLB flush, plus 1.
> 
> As different threads can be used to handle TLB flushes, let's imagine
> two threads (just for the sake of having an example). On such case,
> what we would have is:
> 
> seqno		Thread 0			Thread 1
> 
> seqno=2		unbind/evict event
> 		obj3.mm.tlb = seqno | 1
> seqno=2		unbind/evict event
> 		obj1.mm.tlb = seqno | 1
> 						__i915_gem_object_unset_pages()
> 						called for obj3, TLB flush happened,
> 						invalidating both obj1 and obj2.
> 						seqno += 2					
> seqno=4		unbind/evict event
> 		obj1.mm.tlb = seqno | 1
> 						__i915_gem_object_unset_pages()
> 						called for obj1, don't flush.
> ...
> 						__i915_gem_object_unset_pages() called for obj2, TLB flush happened
> 						seqno += 2
> seqno=6
> 
> So, basically the seqno is used to track when the object data stopped
> being updated, because of an unbind/evict event, being later used by
> intel_gt_invalidate_tlb() when called from __i915_gem_object_unset_pages(),
> in order to check if a previous invalidation call was enough to invalidate
> the object, or if a new call is needed.
> 
> Now, if seqno is stored at bind, data can still leak, as the assumption
> made by intel_gt_invalidate_tlb() that the data stopped being used at
> seqno is not true anymore.
> 
> Still, I agree that this logic is complex and should be better
> documented. So, if you're now OK with this patch, I'll add the above
> explanation inside a kernel-doc comment.
> 
> Regards,
> Mauro

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines
  2022-07-14 14:06   ` Michal Wajdeczko
@ 2022-08-02  7:48     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 52+ messages in thread
From: Mauro Carvalho Chehab @ 2022-08-02  7:48 UTC (permalink / raw)
  To: Michal Wajdeczko
  Cc: Mauro Carvalho Chehab, Chris Wilson, Alan Previn, David Airlie,
	dri-devel, Lucas De Marchi, linux-kernel, Rodrigo Vivi,
	Borislav Petkov, intel-gfx

On Thu, 14 Jul 2022 16:06:28 +0200
Michal Wajdeczko <michal.wajdeczko@intel.com> wrote:

> On 14.07.2022 14:06, Mauro Carvalho Chehab wrote:
> > From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> > 
> > Add routines to interface with GuC firmware for TLB invalidation.
> > 
> > Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
> > Cc: Bruce Chang <yu.bruce.chang@intel.com>
> > Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Chris Wilson <chris.p.wilson@intel.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> > ---
> > 
> > To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> > See [PATCH v2 00/21] at: https://lore.kernel.org/all/cover.1657800199.git.mchehab@kernel.org/
> > 
> >  .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 35 +++++++
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 90 ++++++++++++++++++
> >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        | 13 +++
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 24 ++++-
> >  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  6 ++
> >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 ++++++++++++++++++-
> >  6 files changed, 253 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > index 4ef9990ed7f8..2e39d8df4c82 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
> > @@ -134,6 +134,10 @@ enum intel_guc_action {
> >  	INTEL_GUC_ACTION_REGISTER_CONTEXT_MULTI_LRC = 0x4601,
> >  	INTEL_GUC_ACTION_CLIENT_SOFT_RESET = 0x5507,
> >  	INTEL_GUC_ACTION_SET_ENG_UTIL_BUFF = 0x550A,
> > +	INTEL_GUC_ACTION_NOTIFY_MEMORY_CAT_ERROR = 0x6000,  
> 
> should this be part of this patch ?

No, I'll drop...
> 
> > +	INTEL_GUC_ACTION_PAGE_FAULT_NOTIFICATION = 0x6001,

... and also drop this one.

> > +	INTEL_GUC_ACTION_TLB_INVALIDATION = 0x7000,
> > +	INTEL_GUC_ACTION_TLB_INVALIDATION_DONE = 0x7001,  
> 
> can we document layout of these actions ?

Where should we document it? At the intel_guc_invalidate_tlb_guc()
function and friends, or are you thinking of something else, e.g. in
this header file?

> 
> >  	INTEL_GUC_ACTION_STATE_CAPTURE_NOTIFICATION = 0x8002,
> >  	INTEL_GUC_ACTION_NOTIFY_FLUSH_LOG_BUFFER_TO_FILE = 0x8003,
> >  	INTEL_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED = 0x8004,
> > @@ -177,4 +181,35 @@ enum intel_guc_state_capture_event_status {
> >  
> >  #define INTEL_GUC_STATE_CAPTURE_EVENT_STATUS_MASK      0x000000FF
> >  
> > +#define INTEL_GUC_TLB_INVAL_TYPE_SHIFT 0
> > +#define INTEL_GUC_TLB_INVAL_MODE_SHIFT 8  
> 
> can we stop using SHIFT-based definitions and start using MASK-based
> instead ? then we will be able to use FIELD_PREP/GET like we do for i915_reg

Ok.
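
Something along these lines, I suppose (illustrative only; the field
widths are guesses from the current shifts and need to be checked
against the GuC ABI):

	#include <linux/bitfield.h>

	#define INTEL_GUC_TLB_INVAL_TYPE_MASK	GENMASK(7, 0)
	#define INTEL_GUC_TLB_INVAL_MODE_MASK	GENMASK(11, 8)
	#define INTEL_GUC_TLB_INVAL_FLUSH_CACHE	BIT(31)

	/* building the invalidation descriptor dword of the H2G action */
	u32 dw = FIELD_PREP(INTEL_GUC_TLB_INVAL_TYPE_MASK,
			    INTEL_GUC_TLB_INVAL_GUC) |
		 FIELD_PREP(INTEL_GUC_TLB_INVAL_MODE_MASK,
			    INTEL_GUC_TLB_INVAL_MODE_LITE);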

> 
> > +/* Flush PPC or SMRO caches along with TLB invalidation request */
> > +#define INTEL_GUC_TLB_INVAL_FLUSH_CACHE (1 << 31)
> > +
> > +enum intel_guc_tlb_invalidation_type {
> > +	INTEL_GUC_TLB_INVAL_GUC = 0x3,
> > +};
> > +
> > +/*
> > + * 0: Heavy mode of Invalidation:
> > + * The pipeline of the engine(s) for which the invalidation is targeted to is
> > + * blocked, and all the in-flight transactions are guaranteed to be Globally
> > + * Observed before completing the TLB invalidation
> > + * 1: Lite mode of Invalidation:
> > + * TLBs of the targeted engine(s) are immediately invalidated.
> > + * In-flight transactions are NOT guaranteed to be Globally Observed before
> > + * completing TLB invalidation.
> > + * Light Invalidation Mode is to be used only when
> > + * it can be guaranteed (by SW) that the address translations remain invariant
> > + * for the in-flight transactions across the TLB invalidation. In other words,
> > + * this mode can be used when the TLB invalidation is intended to clear out the
> > + * stale cached translations that are no longer in use. Light Invalidation Mode
> > + * is much faster than the Heavy Invalidation Mode, as it does not wait for the
> > + * in-flight transactions to be GOd.
> > + */  
> 
> either drop this comment or squash with patch 10/21 to fix it

Ok.

> 
> > +enum intel_guc_tlb_inval_mode {
> > +	INTEL_GUC_TLB_INVAL_MODE_HEAVY = 0x0,
> > +	INTEL_GUC_TLB_INVAL_MODE_LITE = 0x1,
> > +};
> > +
> >  #endif /* _ABI_GUC_ACTIONS_ABI_H */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 2706a8c65090..5c59f9b144a3 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -855,6 +855,96 @@ int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value)
> >  	return __guc_self_cfg(guc, key, 2, value);
> >  }
> >  
> > +static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)  
> 
> nit: maybe since MMIO TLB has moved to dedicated file, we can do the
> same with GUC TLB code like "intel_guc_tlb.c" ?

I'll add a patch at the end of this series moving the code.

> > +{
> > +	struct intel_guc_tlb_wait _wq, *wq = &_wq;
> > +	DEFINE_WAIT_FUNC(wait, woken_wake_function);
> > +	int err = 0;
> > +	u32 seqno;
> > +
> > +	init_waitqueue_head(&_wq.wq);
> > +
> > +	if (xa_alloc_cyclic_irq(&guc->tlb_lookup, &seqno, wq,
> > +				xa_limit_32b, &guc->next_seqno,
> > +				GFP_ATOMIC | __GFP_NOWARN) < 0) {
> > +		/* Under severe memory pressure? Serialise TLB allocations */
> > +		xa_lock_irq(&guc->tlb_lookup);
> > +		wq = xa_load(&guc->tlb_lookup, guc->serial_slot);
> > +		wait_event_lock_irq(wq->wq,
> > +				    !READ_ONCE(wq->status),
> > +				    guc->tlb_lookup.xa_lock);
> > +		/*
> > +		 * Update wq->status under lock to ensure only one waiter can
> > +		 * issue the tlb invalidation command using the serial slot at a
> > +		 * time. The condition is set to false before releasing the lock
> > +		 * so that other caller continue to wait until woken up again.
> > +		 */
> > +		wq->status = 1;
> > +		xa_unlock_irq(&guc->tlb_lookup);
> > +
> > +		seqno = guc->serial_slot;
> > +	}
> > +
> > +	action[1] = seqno;  
> 
> it's sad that we need to update in blind this action message
> 
> if you don't want to expose seqno allocation in a helper function that
> each caller would use, then maybe assert that this action message is
> expected one

I'll encapsulate the seqno allocation in a new helper function. The
wait-queue pointer needs to be passed by reference, so that the fallback
to the serial slot's wait queue is propagated back to the caller:

	static u32 intel_guc_alloc_tlb_seqno(struct intel_guc *guc,
					     struct intel_guc_tlb_wait **wq)
	{
		u32 seqno;

		if (xa_alloc_cyclic_irq(&guc->tlb_lookup, &seqno, *wq,
					xa_limit_32b, &guc->next_seqno,
					GFP_ATOMIC | __GFP_NOWARN) >= 0)
			return seqno;

		/* Under severe memory pressure? Serialise TLB allocations */
		xa_lock_irq(&guc->tlb_lookup);
		*wq = xa_load(&guc->tlb_lookup, guc->serial_slot);
		wait_event_lock_irq((*wq)->wq,
				    !READ_ONCE((*wq)->status),
				    guc->tlb_lookup.xa_lock);
		/*
		 * Update wq->status under lock to ensure only one waiter can
		 * issue the TLB invalidation command using the serial slot at a
		 * time. The condition is set to false before releasing the lock
		 * so that other callers continue to wait until woken up again.
		 */
		(*wq)->status = 1;
		xa_unlock_irq(&guc->tlb_lookup);

		return guc->serial_slot;
	}

This should improve the readability of the invalidate function:

	static int guc_send_invalidate_tlb(struct intel_guc *guc, u32 *action, u32 size)
	{
		struct intel_guc_tlb_wait _wq, *wq = &_wq;
		DEFINE_WAIT_FUNC(wait, woken_wake_function);
		int err = 0;
		u32 seqno;

		init_waitqueue_head(&_wq.wq);

		seqno = intel_guc_alloc_tlb_seqno(guc, &wq);
		action[1] = seqno;
	...

> > +
> > +	add_wait_queue(&wq->wq, &wait);
> > +
> > +	err = intel_guc_send_busy_loop(guc, action, size, G2H_LEN_DW_INVALIDATE_TLB, true);
> > +	if (err) {
> > +		/*
> > +		 * XXX: Failure of tlb invalidation is critical and would  
> 
> s/tlb/TLB
> 
> > +		 * warrant a gt reset.
> > +		 */
> > +		goto out;
> > +	}
> > +/*
> > + * GuC has a timeout of 1ms for a tlb invalidation response from GAM. On a  
> 
> ditto
> 
> > + * timeout GuC drops the request and has no mechanism to notify the host about
> > + * the timeout. So keep a larger timeout that accounts for this individual
> > + * timeout and max number of outstanding invalidation requests that can be
> > + * queued in CT buffer.
> > + */
> > +#define OUTSTANDING_GUC_TIMEOUT_PERIOD  (HZ)
> > +	if (!wait_woken(&wait, TASK_UNINTERRUPTIBLE,  
> 
> IIRC there was some discussion if we can rely on this in our scenario
> can you sync with Chris on that?

I'll check.

> 
> > +			OUTSTANDING_GUC_TIMEOUT_PERIOD)) {
> > +		/*
> > +		 * XXX: Failure of tlb invalidation is critical and would  
> 
> s/tlb/TLB
> 
> > +		 * warrant a gt reset.
> > +		 */
> > +		drm_err(&guc_to_gt(guc)->i915->drm,
> > +			 "tlb invalidation response timed out for seqno %u\n", seqno);  
> 
> s/tlb/TLB
> 
> btw, should we care here about G2H_LEN_DW_INVALIDATE_TLB space that we
> reserved in send_busy_loop() ?

Good question. The logic in intel_guc_tlb_invalidation_done() already
handles the case where the waiting task exited on a timeout:

	/* We received a response after the waiting task did exit with a timeout */
	if (unlikely(!wait))
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Stale TLB invalidation response with seqno %d\n", seqno);

It sounds to me that this is already covered there.

> 
> > +		err = -ETIME;
> > +	}
> > +out:
> > +	remove_wait_queue(&wq->wq, &wait);
> > +	if (seqno != guc->serial_slot)
> > +		xa_erase_irq(&guc->tlb_lookup, seqno);
> > +
> > +	return err;
> > +}
> > +
> > +/*
> > + * Guc TLB Invalidation: Invalidate the TLB's of GuC itself.
> > + */
> > +int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
> > +				 enum intel_guc_tlb_inval_mode mode)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_TLB_INVALIDATION,
> > +		0,
> > +		INTEL_GUC_TLB_INVAL_GUC << INTEL_GUC_TLB_INVAL_TYPE_SHIFT |
> > +			mode << INTEL_GUC_TLB_INVAL_MODE_SHIFT |
> > +			INTEL_GUC_TLB_INVAL_FLUSH_CACHE,
> > +	};
> > +
> > +	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
> > +		DRM_ERROR("Tlb invalidation: Operation not supported in this platform!\n");  
> 
> you should use drm_err() instead

Ok.

> but wondering if maybe this should be treated as a coding error (and
> then use GEM_BUG/WARN_ON instead) but then not sure how to interpret the
> check for the intel_guc_ct_enabled() embedded in above macro ...
> note that intel_guc_ct_send() will return -ENODEV if CTB is down

Good point. Still, I don't think the driver should crash with
GEM_BUG/WARN_ON() if TLB invalidation is not available, as this may
cause regressions.

I mean, if someone is currently using the driver on firmware that
doesn't support such actions, after this patch the driver would stop
working for them.

So, I think the right thing to do is to just report it as an
error.
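
I.e. keeping the early return, just switching to drm_err():

	if (!INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc)) {
		drm_err(&guc_to_gt(guc)->i915->drm,
			"TLB invalidation: operation not supported on this platform!\n");
		return 0;
	}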

> > +		return 0;
> > +	}
> > +
> > +	return guc_send_invalidate_tlb(guc, action, ARRAY_SIZE(action));
> > +}
> > +
> >  /**
> >   * intel_guc_load_status - dump information about GuC load status
> >   * @guc: the GuC
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index d0d99f178f2d..f82a121b0838 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -77,6 +77,10 @@ struct intel_guc {
> >  	atomic_t outstanding_submission_g2h;
> >  
> >  	/** @interrupts: pointers to GuC interrupt-managing functions. */
> > +	struct xarray tlb_lookup;
> > +	u32 serial_slot;
> > +	u32 next_seqno;  
> 
> wrong place - above kernel-doc is for the struct below

Ok.

> > +
> >  	struct {
> >  		void (*reset)(struct intel_guc *guc);
> >  		void (*enable)(struct intel_guc *guc);
> > @@ -248,6 +252,11 @@ struct intel_guc {
> >  #endif
> >  };
> >  
> > +struct intel_guc_tlb_wait {
> > +	struct wait_queue_head wq;
> > +	u8 status;
> > +} __aligned(4);
> > +
> >  static inline struct intel_guc *log_to_guc(struct intel_guc_log *log)
> >  {
> >  	return container_of(log, struct intel_guc, log);
> > @@ -363,6 +372,9 @@ int intel_guc_allocate_and_map_vma(struct intel_guc *guc, u32 size,
> >  int intel_guc_self_cfg32(struct intel_guc *guc, u16 key, u32 value);
> >  int intel_guc_self_cfg64(struct intel_guc *guc, u16 key, u64 value);
> >  
> > +int intel_guc_invalidate_tlb_guc(struct intel_guc *guc,
> > +				 enum intel_guc_tlb_inval_mode mode);
> > +
> >  static inline bool intel_guc_is_supported(struct intel_guc *guc)
> >  {
> >  	return intel_uc_fw_is_supported(&guc->fw);
> > @@ -440,6 +452,7 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
> >  					 const u32 *msg, u32 len);
> >  int intel_guc_error_capture_process_msg(struct intel_guc *guc,
> >  					const u32 *msg, u32 len);
> > +void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno);
> >  
> >  struct intel_engine_cs *
> >  intel_guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index f01325cd1b62..c1ce542b7855 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -1023,7 +1023,7 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >  	return 0;
> >  }
> >  
> > -static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
> > +static bool ct_process_incoming_requests(struct intel_guc_ct *ct, struct list_head *incoming)
> >  {
> >  	unsigned long flags;
> >  	struct ct_incoming_msg *request;
> > @@ -1031,11 +1031,11 @@ static bool ct_process_incoming_requests(struct intel_guc_ct *ct)
> >  	int err;
> >  
> >  	spin_lock_irqsave(&ct->requests.lock, flags);
> > -	request = list_first_entry_or_null(&ct->requests.incoming,
> > +	request = list_first_entry_or_null(incoming,
> >  					   struct ct_incoming_msg, link);
> >  	if (request)
> >  		list_del(&request->link);
> > -	done = !!list_empty(&ct->requests.incoming);
> > +	done = !!list_empty(incoming);
> >  	spin_unlock_irqrestore(&ct->requests.lock, flags);
> >  
> >  	if (!request)
> > @@ -1058,7 +1058,7 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
> >  	bool done;
> >  
> >  	do {
> > -		done = ct_process_incoming_requests(ct);
> > +		done = ct_process_incoming_requests(ct, &ct->requests.incoming);
> >  	} while (!done);
> >  }
> >  
> > @@ -1078,14 +1078,30 @@ static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *requ
> >  	switch (action) {
> >  	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> >  	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +	case INTEL_GUC_ACTION_TLB_INVALIDATION_DONE:
> >  		g2h_release_space(ct, request->size);
> >  	}
> > +	/* Handle tlb invalidation response in interrupt context */  
> 
> since it breaks layering, can you add more comments why this is done in
> such way ?
> 
> > +	if (action == INTEL_GUC_ACTION_TLB_INVALIDATION_DONE) {

I'll improve the comment here. I guess something like this would be
enough:

	/*
	 * Handle the TLB invalidation response in interrupt context.
	 *
	 * As TLB invalidation is needed to avoid leaking data, handle the
	 * response directly here, instead of deferring it to the CT
	 * requests worker, so that waiters are woken up as soon as
	 * possible.
	 */

> > +		const u32 *payload;
> > +		u32 hxg_len, len;
> > +
> > +		hxg_len = request->size - GUC_CTB_MSG_MIN_LEN;
> > +		len = hxg_len - GUC_HXG_MSG_MIN_LEN;
> > +		if (unlikely(len < 1))
> > +			return -EPROTO;
> > +		payload = &hxg[GUC_HXG_MSG_MIN_LEN];  
> 
> if we still need to handle this at this level, can we at least move this
> message decomposition to the handler (in other words: just pass hxg
> pointer instead of single dword payload)

Yeah, makes sense.
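
Something along these lines (untested): the handler takes the hxg
pointer plus the request size, and the prototype changes from void to
int so that -EPROTO can be propagated back to ct_handle_event():

	int intel_guc_tlb_invalidation_done(struct intel_guc *guc,
					    const u32 *hxg, u32 size)
	{
		u32 hxg_len, len;

		hxg_len = size - GUC_CTB_MSG_MIN_LEN;
		len = hxg_len - GUC_HXG_MSG_MIN_LEN;
		if (unlikely(len < 1))
			return -EPROTO;

		wait_wake_outstanding_tlb_g2h(guc, hxg[GUC_HXG_MSG_MIN_LEN]);
		return 0;
	}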

> 
> > +		intel_guc_tlb_invalidation_done(ct_to_guc(ct),  payload[0]);
> > +		ct_free_msg(request);
> > +		return 0;
> > +	}
> >  
> >  	spin_lock_irqsave(&ct->requests.lock, flags);
> >  	list_add_tail(&request->link, &ct->requests.incoming);
> >  	spin_unlock_irqrestore(&ct->requests.lock, flags);
> >  
> >  	queue_work(system_unbound_wq, &ct->requests.worker);
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > index b3c9a9327f76..3edf567b3f65 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
> > @@ -22,6 +22,7 @@
> >  /* Payload length only i.e. don't include G2H header length */
> >  #define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
> >  #define G2H_LEN_DW_DEREGISTER_CONTEXT		1
> > +#define G2H_LEN_DW_INVALIDATE_TLB		1
> >  
> >  #define GUC_CONTEXT_DISABLE		0
> >  #define GUC_CONTEXT_ENABLE		1
> > @@ -431,4 +432,9 @@ enum intel_guc_recv_message {
> >  	INTEL_GUC_RECV_MSG_EXCEPTION = BIT(30),
> >  };
> >  
> > +#define INTEL_GUC_SUPPORTS_TLB_INVALIDATION(guc) \
> > +	((intel_guc_ct_enabled(&(guc)->ct)) && \  

This basically does:
	static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
	{
		return ct->enabled;
	}

> 
> do we need this check ?
> CTB is prerequisite for submission that is required below
> 
> > +	 (intel_guc_submission_is_used(guc)) && \

If I understood the code right, this checks the state machine by
looking at __intel_uc_fw_status(&guc->fw), but it doesn't look at
ct->enabled. Without knowing exactly how the status is updated,
it's hard to tell whether the ct->enabled flag will always be true
here.

So, I would keep the check.

> > +	 (GRAPHICS_VER(guc_to_gt((guc))->i915) >= 12))
> > +
> >  #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 40f726c61e95..6888ea1bc7c1 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1653,11 +1653,20 @@ static void __guc_reset_context(struct intel_context *ce, intel_engine_mask_t st
> >  	intel_context_put(parent);
> >  }
> >  
> > +static void wake_up_tlb_invalidate(struct intel_guc_tlb_wait *wait)
> > +{
> > +	/* Barrier to ensure the store is observed by the woken thread */
> > +	smp_store_mb(wait->status, 0);
> > +	wake_up(&wait->wq);
> > +}
> > +
> >  void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stalled)
> >  {
> > +	struct intel_guc_tlb_wait *wait;
> >  	struct intel_context *ce;
> >  	unsigned long index;
> >  	unsigned long flags;
> > +	unsigned long i;
> >  
> >  	if (unlikely(!guc_submission_initialized(guc))) {
> >  		/* Reset called during driver load? GuC not yet initialised! */
> > @@ -1683,6 +1692,13 @@ void intel_guc_submission_reset(struct intel_guc *guc, intel_engine_mask_t stall
> >  
> >  	/* GuC is blown away, drop all references to contexts */
> >  	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * The full GT reset will have cleared the TLB caches and flushed the
> > +	 * G2H message queue; we can release all the blocked waiters.
> > +	 */
> > +	xa_for_each(&guc->tlb_lookup, i, wait)
> > +		wake_up_tlb_invalidate(wait);  
> 
> shouldn't this be closer to intel_guc_invalidate_tlb_guc()
> then we can avoid spreading code across many files
> 
> same for the init/fini_tlb_lookup() functions below

I'll address that in a final patch moving the GuC-based TLB code to a
separate file.

> >  }
> >  
> >  static void guc_cancel_context_requests(struct intel_context *ce)
> > @@ -1805,6 +1821,41 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
> >  static void destroyed_worker_func(struct work_struct *w);
> >  static void reset_fail_worker_func(struct work_struct *w);
> >  
> > +static int init_tlb_lookup(struct intel_guc *guc)
> > +{
> > +	struct intel_guc_tlb_wait *wait;
> > +	int err;
> > +
> > +	xa_init_flags(&guc->tlb_lookup, XA_FLAGS_ALLOC);
> > +
> > +	wait = kzalloc(sizeof(*wait), GFP_KERNEL);
> > +	if (!wait)
> > +		return -ENOMEM;
> > +
> > +	init_waitqueue_head(&wait->wq);
> > +	err = xa_alloc_cyclic_irq(&guc->tlb_lookup, &guc->serial_slot, wait,
> > +				  xa_limit_32b, &guc->next_seqno, GFP_KERNEL);
> > +	if (err == -ENOMEM) {
> > +		kfree(wait);
> > +		return err;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void fini_tlb_lookup(struct intel_guc *guc)
> > +{
> > +	struct intel_guc_tlb_wait *wait;
> > +
> > +	wait = xa_load(&guc->tlb_lookup, guc->serial_slot);
> > +	if (wait) {
> > +		GEM_BUG_ON(wait->status);
> > +		kfree(wait);
> > +	}
> > +
> > +	xa_destroy(&guc->tlb_lookup);
> > +}
> > +
> >  /*
> >   * Set up the memory resources to be shared with the GuC (via the GGTT)
> >   * at firmware loading time.
> > @@ -1812,20 +1863,31 @@ static void reset_fail_worker_func(struct work_struct *w);
> >  int intel_guc_submission_init(struct intel_guc *guc)
> >  {
> >  	struct intel_gt *gt = guc_to_gt(guc);
> > +	int ret;
> >  
> >  	if (guc->submission_initialized)
> >  		return 0;
> >  
> > +	ret = init_tlb_lookup(guc);  
> 
> if we promote guc_tlb to own file/functions then maybe it could be
> init/fini directly from __uc_init_hw() ?

I'll look into it in the new patch to be added at the end.

> > +	if (ret)
> > +		return ret;
> > +
> >  	guc->submission_state.guc_ids_bitmap =
> >  		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
> > -	if (!guc->submission_state.guc_ids_bitmap)
> > -		return -ENOMEM;
> > +	if (!guc->submission_state.guc_ids_bitmap) {
> > +		ret = -ENOMEM;
> > +		goto err;
> > +	}
> >  
> >  	guc->timestamp.ping_delay = (POLL_TIME_CLKS / gt->clock_frequency + 1) * HZ;
> >  	guc->timestamp.shift = gpm_timestamp_shift(gt);
> >  	guc->submission_initialized = true;
> >  
> >  	return 0;
> > +
> > +err:
> > +	fini_tlb_lookup(guc);
> > +	return ret;
> >  }
> >  
> >  void intel_guc_submission_fini(struct intel_guc *guc)
> > @@ -1836,6 +1898,7 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >  	guc_flush_destroyed_contexts(guc);
> >  	i915_sched_engine_put(guc->sched_engine);
> >  	bitmap_free(guc->submission_state.guc_ids_bitmap);
> > +	fini_tlb_lookup(guc);
> >  	guc->submission_initialized = false;
> >  }
> >  
> > @@ -4027,6 +4090,30 @@ g2h_context_lookup(struct intel_guc *guc, u32 ctx_id)
> >  	return ce;
> >  }
> >  
> > +static void wait_wake_outstanding_tlb_g2h(struct intel_guc *guc, u32 seqno)
> > +{
> > +	struct intel_guc_tlb_wait *wait;
> > +	unsigned long flags;
> > +
> > +	xa_lock_irqsave(&guc->tlb_lookup, flags);
> > +	wait = xa_load(&guc->tlb_lookup, seqno);
> > +
> > +	/* We received a response after the waiting task did exit with a timeout */
> > +	if (unlikely(!wait))
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Stale tlb invalidation response with seqno %d\n", seqno);  
> 
> hmm, this sounds like a problem as we shouldn't get any late
> notifications - do we really want to hide it under drm_dbg ?

Agreed. I'll change it to drm_err().
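
That is:

	/* We received a response after the waiting task did exit with a timeout */
	if (unlikely(!wait))
		drm_err(&guc_to_gt(guc)->i915->drm,
			"Stale TLB invalidation response with seqno %d\n", seqno);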

> > +
> > +	if (wait)
> > +		wake_up_tlb_invalidate(wait);
> > +
> > +	xa_unlock_irqrestore(&guc->tlb_lookup, flags);
> > +}
> > +
> > +void intel_guc_tlb_invalidation_done(struct intel_guc *guc, u32 seqno)
> > +{
> > +	wait_wake_outstanding_tlb_g2h(guc, seqno);
> > +}
> > +
> >  int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >  					  const u32 *msg,
> >  					  u32 len)  
> 
> ,Michal

Thread overview: 52+ messages
2022-07-14 12:06 [PATCH v2 00/21] Fix performance regressions with TLB and add GuC support Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 01/21] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
2022-07-18 13:16   ` Tvrtko Ursulin
2022-07-18 14:53     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-18 15:01       ` Tvrtko Ursulin
2022-07-18 15:50       ` David Laight
2022-07-19  7:24         ` Tvrtko Ursulin
2022-07-19  7:45           ` David Laight
2022-07-22 11:56   ` Andi Shyti
2022-07-14 12:06 ` [PATCH v2 02/21] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
2022-07-18 13:21   ` Tvrtko Ursulin
2022-07-14 12:06 ` [PATCH v2 03/21] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
2022-07-18 13:24   ` Tvrtko Ursulin
2022-07-22 11:57   ` Andi Shyti
2022-07-14 12:06 ` [PATCH v2 04/21] drm/i915/gt: Only invalidate TLBs exposed to user manipulation Mauro Carvalho Chehab
2022-07-18 13:39   ` Tvrtko Ursulin
2022-07-18 16:00     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-22 11:58   ` Andi Shyti
2022-07-14 12:06 ` [PATCH v2 05/21] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
2022-07-18 13:45   ` Tvrtko Ursulin
2022-07-18 16:06     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-19  7:19       ` Tvrtko Ursulin
2022-07-22 12:00   ` Andi Shyti
2022-07-14 12:06 ` [PATCH v2 06/21] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
2022-07-18 13:52   ` Tvrtko Ursulin
2022-07-20  7:13     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-20 10:49       ` Tvrtko Ursulin
2022-07-20 10:54   ` Tvrtko Ursulin
2022-07-27 11:48     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:56       ` Tvrtko Ursulin
2022-07-28  6:32         ` Mauro Carvalho Chehab
2022-07-28  7:26           ` Mauro Carvalho Chehab
2022-07-28 10:11           ` Tvrtko Ursulin
2022-07-14 12:06 ` [PATCH v2 07/21] drm/i915/gt: describe the new tlb parameter at i915_vma_resource Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 08/21] drm/i915/gt: Move TLB invalidation to its own file Mauro Carvalho Chehab
2022-07-22 12:07   ` Andi Shyti
2022-07-14 12:06 ` [PATCH v2 09/21] drm/i915/guc: Define CTB based TLB invalidation routines Mauro Carvalho Chehab
2022-07-14 14:06   ` Michal Wajdeczko
2022-08-02  7:48     ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 10/21] drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 11/21] drm/i915/guc: document the TLB invalidation struct members Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 12/21] drm/i915/guc: Introduce TLB_INVALIDATION_ALL action Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 13/21] drm/i915: Invalidate the TLBs on each GT Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 14/21] drm/i915: document tlb field at struct drm_i915_gem_object Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 15/21] drm/i915: Add platform macro for selective tlb flush Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 16/21] drm/i915: Define GuC Based TLB invalidation routines Mauro Carvalho Chehab
2022-07-14 15:20   ` Michal Wajdeczko
2022-07-14 12:06 ` [PATCH v2 17/21] drm/i915: Add generic interface for tlb invalidation for XeHP Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 18/21] drm/i915: Use selective tlb invalidations where supported Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 19/21] drm/i915/gt: document TLB cache invalidation functions Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 20/21] drm/i915/guc: describe enum intel_guc_tlb_invalidation_type Mauro Carvalho Chehab
2022-07-14 12:06 ` [PATCH v2 21/21] drm/i915/guc: document TLB cache invalidation functions Mauro Carvalho Chehab
