All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-27 12:29 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Christian König, Daniel Vetter,
	David Airlie, Sumit Semwal, dri-devel, intel-gfx, linaro-mm-sig,
	linux-kernel, linux-media

Doing TLB invalidation cause performance regressions, like:
	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

As reported at:
	https://gitlab.freedesktop.org/drm/intel/-/issues/6424

as this is an expensive operation. So, reduce the need of it by:
  - checking if the engine is awake;
  - checking if the engine is not wedged;
  - batching operations.

Additionally, add a workaround for a known hardware issue on some GPUs.

In order to double-check that this series won't be introducing any regressions,
I used this new IGT test:

https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1

Checking the results for 3 different patchsets, on Broadwell:

1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
invalidation and serialization patches:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.490s)
	Subtest madv-clear: SUCCESS (10.484s)
	Subtest u-unmap-clear: SUCCESS (10.527s)
	Subtest u-shrink-clear: SUCCESS (10.506s)
	Subtest close-dumb: SUCCESS (10.165s)
	Subtest madv-dumb: SUCCESS (10.177s)
	Subtest u-unmap-dumb: SUCCESS (10.172s)
	Subtest u-shrink-dumb: SUCCESS (10.172s)

2) With the new version of the batch TLB invalidation patches from this series:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.483s)
	Subtest madv-clear: SUCCESS (10.495s)
	Subtest u-unmap-clear: SUCCESS (10.545s)
	Subtest u-shrink-clear: SUCCESS (10.508s)
	Subtest close-dumb: SUCCESS (10.172s)
	Subtest madv-dumb: SUCCESS (10.169s)
	Subtest u-unmap-dumb: SUCCESS (10.174s)
	Subtest u-shrink-dumb: SUCCESS (10.176s)

3) Changing the TLB invalidation routine to do nothing[1]:

	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
	Dynamic subtest smem0 failed.
	**** DEBUG ****
	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
	****  END  ****
	Subtest close-clear: FAIL (10.434s)
	Subtest madv-clear: SUCCESS (10.479s)
	Subtest u-unmap-clear: SUCCESS (10.512s)

In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
applied don't have TLB invalidation cache issues.

[1] I applied this patch on the top of drm-tip:

	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
	index 68c2b0d8f187..0aefcd7be5e9 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
	+	// HACK: don't do TLB invalidations!!!
	+	return;
	+

Regards,
Mauro

Chris Wilson (4):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations

Mauro Carvalho Chehab (2):
  drm/i915/gt: document with_intel_gt_pm_if_awake()
  drm/i915/gt: describe the new tlb parameter at i915_vma_resource

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
 11 files changed, 163 insertions(+), 40 deletions(-)

-- 
2.36.1



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-27 12:29 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Mauro Carvalho Chehab, Christian König,
	linux-media

Doing TLB invalidation cause performance regressions, like:
	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

As reported at:
	https://gitlab.freedesktop.org/drm/intel/-/issues/6424

as this is an expensive operation. So, reduce the need of it by:
  - checking if the engine is awake;
  - checking if the engine is not wedged;
  - batching operations.

Additionally, add a workaround for a known hardware issue on some GPUs.

In order to double-check that this series won't be introducing any regressions,
I used this new IGT test:

https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1

Checking the results for 3 different patchsets, on Broadwell:

1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
invalidation and serialization patches:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.490s)
	Subtest madv-clear: SUCCESS (10.484s)
	Subtest u-unmap-clear: SUCCESS (10.527s)
	Subtest u-shrink-clear: SUCCESS (10.506s)
	Subtest close-dumb: SUCCESS (10.165s)
	Subtest madv-dumb: SUCCESS (10.177s)
	Subtest u-unmap-dumb: SUCCESS (10.172s)
	Subtest u-shrink-dumb: SUCCESS (10.172s)

2) With the new version of the batch TLB invalidation patches from this series:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.483s)
	Subtest madv-clear: SUCCESS (10.495s)
	Subtest u-unmap-clear: SUCCESS (10.545s)
	Subtest u-shrink-clear: SUCCESS (10.508s)
	Subtest close-dumb: SUCCESS (10.172s)
	Subtest madv-dumb: SUCCESS (10.169s)
	Subtest u-unmap-dumb: SUCCESS (10.174s)
	Subtest u-shrink-dumb: SUCCESS (10.176s)

3) Changing the TLB invalidation routine to do nothing[1]:

	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
	Dynamic subtest smem0 failed.
	**** DEBUG ****
	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
	****  END  ****
	Subtest close-clear: FAIL (10.434s)
	Subtest madv-clear: SUCCESS (10.479s)
	Subtest u-unmap-clear: SUCCESS (10.512s)

In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
applied don't have TLB invalidation cache issues.

[1] I applied this patch on the top of drm-tip:

	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
	index 68c2b0d8f187..0aefcd7be5e9 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
	+	// HACK: don't do TLB invalidations!!!
	+	return;
	+

Regards,
Mauro

Chris Wilson (4):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations

Mauro Carvalho Chehab (2):
  drm/i915/gt: document with_intel_gt_pm_if_awake()
  drm/i915/gt: describe the new tlb parameter at i915_vma_resource

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
 11 files changed, 163 insertions(+), 40 deletions(-)

-- 
2.36.1



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-27 12:29 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Mauro Carvalho Chehab, Christian König,
	linux-media

Doing TLB invalidation cause performance regressions, like:
	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

As reported at:
	https://gitlab.freedesktop.org/drm/intel/-/issues/6424

as this is an expensive operation. So, reduce the need of it by:
  - checking if the engine is awake;
  - checking if the engine is not wedged;
  - batching operations.

Additionally, add a workaround for a known hardware issue on some GPUs.

In order to double-check that this series won't be introducing any regressions,
I used this new IGT test:

https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1

Checking the results for 3 different patchsets, on Broadwell:

1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
invalidation and serialization patches:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.490s)
	Subtest madv-clear: SUCCESS (10.484s)
	Subtest u-unmap-clear: SUCCESS (10.527s)
	Subtest u-shrink-clear: SUCCESS (10.506s)
	Subtest close-dumb: SUCCESS (10.165s)
	Subtest madv-dumb: SUCCESS (10.177s)
	Subtest u-unmap-dumb: SUCCESS (10.172s)
	Subtest u-shrink-dumb: SUCCESS (10.172s)

2) With the new version of the batch TLB invalidation patches from this series:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.483s)
	Subtest madv-clear: SUCCESS (10.495s)
	Subtest u-unmap-clear: SUCCESS (10.545s)
	Subtest u-shrink-clear: SUCCESS (10.508s)
	Subtest close-dumb: SUCCESS (10.172s)
	Subtest madv-dumb: SUCCESS (10.169s)
	Subtest u-unmap-dumb: SUCCESS (10.174s)
	Subtest u-shrink-dumb: SUCCESS (10.176s)

3) Changing the TLB invalidation routine to do nothing[1]:

	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
	Dynamic subtest smem0 failed.
	**** DEBUG ****
	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
	****  END  ****
	Subtest close-clear: FAIL (10.434s)
	Subtest madv-clear: SUCCESS (10.479s)
	Subtest u-unmap-clear: SUCCESS (10.512s)

In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
applied don't have TLB invalidation cache issues.

[1] I applied this patch on the top of drm-tip:

	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
	index 68c2b0d8f187..0aefcd7be5e9 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
	+	// HACK: don't do TLB invalidations!!!
	+	return;
	+

Regards,
Mauro

Chris Wilson (4):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations

Mauro Carvalho Chehab (2):
  drm/i915/gt: document with_intel_gt_pm_if_awake()
  drm/i915/gt: describe the new tlb parameter at i915_vma_resource

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
 11 files changed, 163 insertions(+), 40 deletions(-)

-- 
2.36.1



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi,
	Maarten Lankhorst, Matt Roper, Matthew Auld, Matthew Brost,
	Mauro Carvalho Chehab, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel, stable, Fei Yang, Tvrtko Ursulin

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Daniele Ceraolo Spurio, Fei Yang,
	Matthew Brost, Chris Wilson, Matthew Auld, Andi Shyti,
	Dave Airlie, Thomas Hellström, Lucas De Marchi, intel-gfx,
	Rodrigo Vivi, Mauro Carvalho Chehab, Tvrtko Ursulin,
	Tvrtko Ursulin, linux-kernel, stable, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Chris Wilson, Matthew Auld, Dave Airlie,
	Thomas Hellström, Lucas De Marchi, intel-gfx, Rodrigo Vivi,
	Mauro Carvalho Chehab, linux-kernel, stable

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity,
as, on such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Chris Wilson, Daniel Vetter, David Airlie,
	Jani Nikula, John Harrison, Joonas Lahtinen, Matthew Brost,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	Tvrtko Ursulin

Add a kernel-doc markup to document this new macro.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..6c9a46452364 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,14 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+/**
+ * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
+ *	it to sleep, run some code and then asynchrously put the reference
+ *	away.
+ *
+ * @gt: pointer to the gt
+ * @wf: pointer to a temporary wakeref.
+ */
 #define with_intel_gt_pm_if_awake(gt, wf) \
 	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Matthew Brost, Tvrtko Ursulin, Tvrtko Ursulin, David Airlie,
	dri-devel, linux-kernel, Chris Wilson, Rodrigo Vivi,
	Mauro Carvalho Chehab, intel-gfx, John Harrison

Add a kernel-doc markup to document this new macro.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..6c9a46452364 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,14 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+/**
+ * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
+ *	it to sleep, run some code and then asynchrously put the reference
+ *	away.
+ *
+ * @gt: pointer to the gt
+ * @wf: pointer to a temporary wakeref.
+ */
 #define with_intel_gt_pm_if_awake(gt, wf) \
 	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, linux-kernel, Chris Wilson,
	Rodrigo Vivi, Mauro Carvalho Chehab, intel-gfx

Add a kernel-doc markup to document this new macro.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..6c9a46452364 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,14 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+/**
+ * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
+ *	it to sleep, run some code and then asynchrously put the reference
+ *	away.
+ *
+ * @gt: pointer to the gt
+ * @wf: pointer to a temporary wakeref.
+ */
 #define with_intel_gt_pm_if_awake(gt, wf) \
 	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable,
	Fei Yang, Tvrtko Ursulin, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c4d43da84d8e..1d84418e8676 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	spin_unlock_irq(&uncore->lock);
 
 	for_each_engine_masked(engine, gt, awake, tmp) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Tvrtko Ursulin, Thomas Hellström, Andi Shyti,
	Tvrtko Ursulin, David Airlie, dri-devel, Lucas De Marchi,
	linux-kernel, Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi,
	Dave Airlie, stable, Mauro Carvalho Chehab, intel-gfx, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c4d43da84d8e..1d84418e8676 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	spin_unlock_irq(&uncore->lock);
 
 	for_each_engine_masked(engine, gt, awake, tmp) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Thomas Hellström, David Airlie, dri-devel, Lucas De Marchi,
	linux-kernel, Chris Wilson, Rodrigo Vivi, Dave Airlie, stable,
	Mauro Carvalho Chehab, intel-gfx

From: Chris Wilson <chris.p.wilson@intel.com>

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c4d43da84d8e..1d84418e8676 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	spin_unlock_irq(&uncore->lock);
 
 	for_each_engine_masked(engine, gt, awake, tmp) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio,
	Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen,
	Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi,
	Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable,
	Fei Yang, Tvrtko Ursulin, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.

So, an attempt to do a TLB cache invalidation will produce a timeout.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1d84418e8676..5c55a90672f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Tvrtko Ursulin, Thomas Hellström, Andi Shyti,
	Tvrtko Ursulin, David Airlie, dri-devel, Lucas De Marchi,
	linux-kernel, Chris Wilson, Daniele Ceraolo Spurio, Rodrigo Vivi,
	Dave Airlie, stable, Mauro Carvalho Chehab, intel-gfx, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.

So, an attempt to do a TLB cache invalidation will produce a timeout.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1d84418e8676..5c55a90672f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Thomas Hellström, David Airlie, dri-devel, Lucas De Marchi,
	linux-kernel, Chris Wilson, Rodrigo Vivi, Dave Airlie, stable,
	Mauro Carvalho Chehab, intel-gfx

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
had been reset, as, on such cases, it can no longer process instructions
on the GPU and the user no longer has access to the TLB's in each engine.

So, an attempt to do a TLB cache invalidation will produce a timeout.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1d84418e8676..5c55a90672f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Chris Wilson, Christian König, Michał Winiarski,
	Thomas Hellström, Andi Shyti, Andrzej Hajda, Ashutosh Dixit,
	Ayaz A Siddiqui, Casey Bowman, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst, Matt Roper,
	Matthew Auld, Mauro Carvalho Chehab, Michael Cheng, Nirmoy Das,
	Ramalingam C, Rodrigo Vivi, Sumit Semwal, Tomas Winkler,
	Tvrtko Ursulin, dri-devel, intel-gfx, linaro-mm-sig,
	linux-kernel, linux-media, stable, Tvrtko Ursulin, Fei Yang

From: Chris Wilson <chris.p.wilson@intel.com>

Invalidate TLB in batches, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
 drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
 10 files changed, 125 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9f6b14ec189a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -335,7 +335,6 @@ struct drm_i915_gem_object {
 #define I915_BO_READONLY          BIT(7)
 #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
 #define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
@@ -616,6 +615,8 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		u32 tlb;
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6835279943df..8357dbdcab5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 		vunmap(ptr);
 }
 
+static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_gt *gt = to_gt(i915);
+
+	if (!obj->mm.tlb)
+		return;
+
+	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	obj->mm.tlb = 0;
+}
+
 struct sg_table *
 __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 {
@@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_object_reset_page_iter(obj);
 	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
 
-	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		struct intel_gt *gt = to_gt(i915);
-		intel_wakeref_t wakeref;
-
-		with_intel_gt_pm_if_awake(gt, wakeref)
-			intel_gt_invalidate_tlbs(gt);
-	}
+	flush_tlb_invalidate(obj);
 
 	return pages;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 5c55a90672f4..f435e06125aa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
-	mutex_init(&gt->tlb_invalidate_lock);
-
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
@@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
+		mutex_destroy(&gt->tlb.invalidate_lock);
 		intel_engines_free(gt);
 	}
 }
@@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
 	return rb;
 }
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+static void mmio_invalidate_full(struct intel_gt *gt)
 {
 	static const i915_reg_t gen8_regs[] = {
 		[RENDER_CLASS]			= GEN8_RTCR,
@@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	const i915_reg_t *regs;
 	unsigned int num = 0;
 
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
@@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 			  "Platform does not implement TLB invalidation!"))
 		return;
 
-	GEM_TRACE("\n");
-
-	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
@@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
 	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
 	if (awake &&
 	    (IS_TIGERLAKE(i915) ||
@@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	 * transitions.
 	 */
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-	mutex_unlock(&gt->tlb_invalidate_lock);
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 82d6f248d876..40b06adf509a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt);
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index df708802889d..3804a583382b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -11,6 +11,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/seqlock.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbind the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, *vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Fei Yang, David Airlie, dri-devel, Daniele Ceraolo Spurio,
	Andrzej Hajda, Sumit Semwal, Michael Cheng, Chris Wilson,
	Ayaz A Siddiqui, Andi Shyti, Dave Airlie, Tomas Winkler,
	Matthew Auld, Thomas Hellström, Casey Bowman,
	Lucas De Marchi, intel-gfx, linaro-mm-sig, Nirmoy Das,
	Rodrigo Vivi, Mauro Carvalho Chehab, Tvrtko Ursulin,
	Michał Winiarski, Tvrtko Ursulin, linux-kernel, stable,
	Ashutosh Dixit, linux-media, Christian König

From: Chris Wilson <chris.p.wilson@intel.com>

Invalidate TLB in batches, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
 drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
 10 files changed, 125 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9f6b14ec189a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -335,7 +335,6 @@ struct drm_i915_gem_object {
 #define I915_BO_READONLY          BIT(7)
 #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
 #define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
@@ -616,6 +615,8 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		u32 tlb;
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6835279943df..8357dbdcab5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 		vunmap(ptr);
 }
 
+static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_gt *gt = to_gt(i915);
+
+	if (!obj->mm.tlb)
+		return;
+
+	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	obj->mm.tlb = 0;
+}
+
 struct sg_table *
 __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 {
@@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_object_reset_page_iter(obj);
 	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
 
-	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		struct intel_gt *gt = to_gt(i915);
-		intel_wakeref_t wakeref;
-
-		with_intel_gt_pm_if_awake(gt, wakeref)
-			intel_gt_invalidate_tlbs(gt);
-	}
+	flush_tlb_invalidate(obj);
 
 	return pages;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 5c55a90672f4..f435e06125aa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
-	mutex_init(&gt->tlb_invalidate_lock);
-
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
@@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
+		mutex_destroy(&gt->tlb.invalidate_lock);
 		intel_engines_free(gt);
 	}
 }
@@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
 	return rb;
 }
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+static void mmio_invalidate_full(struct intel_gt *gt)
 {
 	static const i915_reg_t gen8_regs[] = {
 		[RENDER_CLASS]			= GEN8_RTCR,
@@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	const i915_reg_t *regs;
 	unsigned int num = 0;
 
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
@@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 			  "Platform does not implement TLB invalidation!"))
 		return;
 
-	GEM_TRACE("\n");
-
-	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
@@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
 	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
 	if (awake &&
 	    (IS_TIGERLAKE(i915) ||
@@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	 * transitions.
 	 */
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-	mutex_unlock(&gt->tlb_invalidate_lock);
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 82d6f248d876..40b06adf509a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt);
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index df708802889d..3804a583382b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -11,6 +11,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/seqlock.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbind the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, *vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Andrzej Hajda, Sumit Semwal,
	Michael Cheng, Chris Wilson, Dave Airlie, Tomas Winkler,
	Matthew Auld, Thomas Hellström, Lucas De Marchi, intel-gfx,
	linaro-mm-sig, Rodrigo Vivi, Mauro Carvalho Chehab,
	Michał Winiarski, linux-kernel, stable, linux-media,
	Christian König

From: Chris Wilson <chris.p.wilson@intel.com>

Invalidate TLB in batches, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
 drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
 10 files changed, 125 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9f6b14ec189a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -335,7 +335,6 @@ struct drm_i915_gem_object {
 #define I915_BO_READONLY          BIT(7)
 #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
 #define I915_BO_PROTECTED         BIT(9)
-#define I915_BO_WAS_BOUND_BIT     10
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
@@ -616,6 +615,8 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		u32 tlb;
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6835279943df..8357dbdcab5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 		vunmap(ptr);
 }
 
+static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_gt *gt = to_gt(i915);
+
+	if (!obj->mm.tlb)
+		return;
+
+	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	obj->mm.tlb = 0;
+}
+
 struct sg_table *
 __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 {
@@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_object_reset_page_iter(obj);
 	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
 
-	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		struct intel_gt *gt = to_gt(i915);
-		intel_wakeref_t wakeref;
-
-		with_intel_gt_pm_if_awake(gt, wakeref)
-			intel_gt_invalidate_tlbs(gt);
-	}
+	flush_tlb_invalidate(obj);
 
 	return pages;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 5c55a90672f4..f435e06125aa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
-	mutex_init(&gt->tlb_invalidate_lock);
-
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
@@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
+		mutex_destroy(&gt->tlb.invalidate_lock);
 		intel_engines_free(gt);
 	}
 }
@@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
 	return rb;
 }
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+static void mmio_invalidate_full(struct intel_gt *gt)
 {
 	static const i915_reg_t gen8_regs[] = {
 		[RENDER_CLASS]			= GEN8_RTCR,
@@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	const i915_reg_t *regs;
 	unsigned int num = 0;
 
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
@@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 			  "Platform does not implement TLB invalidation!"))
 		return;
 
-	GEM_TRACE("\n");
-
-	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
@@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
 	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
 	if (awake &&
 	    (IS_TIGERLAKE(i915) ||
@@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	 * transitions.
 	 */
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-	mutex_unlock(&gt->tlb_invalidate_lock);
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 82d6f248d876..40b06adf509a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt);
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index df708802889d..3804a583382b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -11,6 +11,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/seqlock.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbind the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, *vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Mauro Carvalho Chehab, Daniel Vetter, David Airlie, Jani Nikula,
	Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, dri-devel,
	intel-gfx, linux-kernel

TLB cache invalidation can happen on two different situations:

1. synchronously, at __vma_put_pages();
2. asynchronously.

On the first case, TLB cache invalidation happens inside
__vma_put_pages(). So, no need to do it later on.

However, on the second case, the pages will keep in memory
until __i915_vma_evict() is called.

So, we need to store the TLB data at struct i915_vma_resource,
in order to do a TLB cache invalidation before allowing
userspace to re-use the same memory.

So, i915_vma_resource_unbind() has gained a new parameter
in order to store the TLB data at the second case.

Document it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 5a67995ea5fe..4fe09ea0a825 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
 /**
  * i915_vma_resource_unbind - Unbind a vma resource
  * @vma_res: The vma resource to unbind.
+ * @tlb: pointer to vma->obj->mm.tlb associated with the resource
+ *	 to be stored at vma_res->tlb. When not-NULL, it will be used
+ *	 to do TLB cache invalidation before freeing a VMA resource.
+ *	 used only for async unbind.
  *
  * At this point this function does little more than publish a fence that
  * signals immediately unless signaling is held back.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, linux-kernel, dri-devel,
	Rodrigo Vivi, Mauro Carvalho Chehab

TLB cache invalidation can happen on two different situations:

1. synchronously, at __vma_put_pages();
2. asynchronously.

On the first case, TLB cache invalidation happens inside
__vma_put_pages(). So, no need to do it later on.

However, on the second case, the pages will keep in memory
until __i915_vma_evict() is called.

So, we need to store the TLB data at struct i915_vma_resource,
in order to do a TLB cache invalidation before allowing
userspace to re-use the same memory.

So, i915_vma_resource_unbind() has gained a new parameter
in order to store the TLB data at the second case.

Document it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 5a67995ea5fe..4fe09ea0a825 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
 /**
  * i915_vma_resource_unbind - Unbind a vma resource
  * @vma_res: The vma resource to unbind.
+ * @tlb: pointer to vma->obj->mm.tlb associated with the resource
+ *	 to be stored at vma_res->tlb. When not-NULL, it will be used
+ *	 to do TLB cache invalidation before freeing a VMA resource.
+ *	 used only for async unbind.
  *
  * At this point this function does little more than publish a fence that
  * signals immediately unless signaling is held back.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi,
	Mauro Carvalho Chehab

TLB cache invalidation can happen on two different situations:

1. synchronously, at __vma_put_pages();
2. asynchronously.

On the first case, TLB cache invalidation happens inside
__vma_put_pages(). So, no need to do it later on.

However, on the second case, the pages will keep in memory
until __i915_vma_evict() is called.

So, we need to store the TLB data at struct i915_vma_resource,
in order to do a TLB cache invalidation before allowing
userspace to re-use the same memory.

So, i915_vma_resource_unbind() has gained a new parameter
in order to store the TLB data at the second case.

Document it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 5a67995ea5fe..4fe09ea0a825 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
 /**
  * i915_vma_resource_unbind - Unbind a vma resource
  * @vma_res: The vma resource to unbind.
+ * @tlb: pointer to vma->obj->mm.tlb associated with the resource
+ *	 to be stored at vma_res->tlb. When not-NULL, it will be used
+ *	 to do TLB cache invalidation before freeing a VMA resource.
+ *	 used only for async unbind.
  *
  * At this point this function does little more than publish a fence that
  * signals immediately unless signaling is held back.
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: reduce TLB performance regressions
  2022-07-27 12:29 ` Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  (?)
@ 2022-07-27 13:12 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2022-07-27 13:12 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: reduce TLB performance regressions
URL   : https://patchwork.freedesktop.org/series/106758/
State : warning

== Summary ==

Error: dim checkpatch failed
735755e9d5d5 drm/i915/gt: Ignore TLB invalidations on idle engines
-:138: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects?
#138: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58:
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

-:138: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'wf' - possible side-effects?
#138: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58:
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)

total: 0 errors, 0 warnings, 2 checks, 99 lines checked
225d8336f971 drm/i915/gt: document with_intel_gt_pm_if_awake()
14c4636e0625 drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
2bbb43b7f32b drm/i915/gt: Skip TLB invalidations once wedged
0cb17ababb41 drm/i915/gt: Batch TLB invalidations
051e0bf95aa1 drm/i915/gt: describe the new tlb parameter at i915_vma_resource



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: reduce TLB performance regressions
  2022-07-27 12:29 ` Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  (?)
@ 2022-07-27 13:37 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2022-07-27 13:37 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 8435 bytes --]

== Series Details ==

Series: drm/i915: reduce TLB performance regressions
URL   : https://patchwork.freedesktop.org/series/106758/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11946 -> Patchwork_106758v1
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html

Participating hosts (38 -> 39)
------------------------------

  Additional (2): fi-hsw-4770 bat-jsl-1 
  Missing    (1): fi-bdw-samus 

Known issues
------------

  Here are the changes found in Patchwork_106758v1 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_suspend@basic-s3@smem:
    - fi-rkl-11600:       NOTRUN -> [FAIL][1] ([fdo#103375])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@gem_exec_suspend@basic-s3@smem.html

  * igt@i915_pm_backlight@basic-brightness:
    - fi-hsw-4770:        NOTRUN -> [SKIP][2] ([fdo#109271] / [i915#3012])
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@i915_pm_backlight@basic-brightness.html

  * igt@i915_selftest@live@gem:
    - fi-pnv-d510:        NOTRUN -> [DMESG-FAIL][3] ([i915#4528])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-pnv-d510/igt@i915_selftest@live@gem.html

  * igt@i915_selftest@live@requests:
    - fi-blb-e6850:       [PASS][4] -> [DMESG-FAIL][5] ([i915#4528])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-blb-e6850/igt@i915_selftest@live@requests.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-blb-e6850/igt@i915_selftest@live@requests.html

  * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy:
    - fi-hsw-4770:        NOTRUN -> [SKIP][6] ([fdo#109271]) +9 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-rkl-11600:       NOTRUN -> [SKIP][7] ([fdo#111827])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_chamelium@dp-crc-fast:
    - fi-hsw-4770:        NOTRUN -> [SKIP][8] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_chamelium@dp-crc-fast.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions:
    - fi-bsw-kefka:       [PASS][9] -> [FAIL][10] ([i915#6298])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html

  * igt@kms_psr@sprite_plane_onoff:
    - fi-hsw-4770:        NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#1072]) +3 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_psr@sprite_plane_onoff.html

  * igt@runner@aborted:
    - fi-blb-e6850:       NOTRUN -> [FAIL][12] ([fdo#109271] / [i915#2403] / [i915#4312])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-blb-e6850/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@debugfs_test@read_all_entries:
    - fi-kbl-guc:         [FAIL][13] ([i915#6253]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-kbl-guc/igt@debugfs_test@read_all_entries.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-kbl-guc/igt@debugfs_test@read_all_entries.html

  * igt@i915_selftest@live@gtt:
    - {bat-dg2-9}:        [DMESG-WARN][15] ([i915#5763]) -> [PASS][16] +3 similar issues
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-dg2-9/igt@i915_selftest@live@gtt.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-dg2-9/igt@i915_selftest@live@gtt.html

  * igt@i915_selftest@live@requests:
    - fi-pnv-d510:        [DMESG-FAIL][17] ([i915#4528]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-pnv-d510/igt@i915_selftest@live@requests.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-pnv-d510/igt@i915_selftest@live@requests.html

  * igt@i915_suspend@basic-s3-without-i915:
    - fi-rkl-11600:       [INCOMPLETE][19] ([i915#5982]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html

  * igt@kms_frontbuffer_tracking@basic:
    - {bat-rpls-2}:       [SKIP][21] ([i915#1849]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-rpls-2/igt@kms_frontbuffer_tracking@basic.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-rpls-2/igt@kms_frontbuffer_tracking@basic.html

  * igt@prime_vgem@basic-fence-flip:
    - {bat-rpls-2}:       [SKIP][23] ([fdo#109295] / [i915#1845] / [i915#3708]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-rpls-2/igt@prime_vgem@basic-fence-flip.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-rpls-2/igt@prime_vgem@basic-fence-flip.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403
  [i915#3003]: https://gitlab.freedesktop.org/drm/intel/issues/3003
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270
  [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763
  [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903
  [i915#5950]: https://gitlab.freedesktop.org/drm/intel/issues/5950
  [i915#5982]: https://gitlab.freedesktop.org/drm/intel/issues/5982
  [i915#6253]: https://gitlab.freedesktop.org/drm/intel/issues/6253
  [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298


Build changes
-------------

  * Linux: CI_DRM_11946 -> Patchwork_106758v1

  CI-20190529: 20190529
  CI_DRM_11946: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6598: 97e103419021d0863db527e3f2cf39ccdd132db5 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_106758v1: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux


### Linux commits

dc4693d1d1bc drm/i915/gt: describe the new tlb parameter at i915_vma_resource
23abc0dfdc25 drm/i915/gt: Batch TLB invalidations
7131366111a9 drm/i915/gt: Skip TLB invalidations once wedged
7a9475aabf18 drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
96ce5bc9b090 drm/i915/gt: document with_intel_gt_pm_if_awake()
d5f349d2fc2a drm/i915/gt: Ignore TLB invalidations on idle engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html

[-- Attachment #2: Type: text/html, Size: 9186 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()
  2022-07-27 12:29   ` Mauro Carvalho Chehab
@ 2022-07-27 14:18     ` Andi Shyti
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, linux-kernel, Chris Wilson,
	Rodrigo Vivi, intel-gfx

Hi Mauro,

> Add a kernel-doc markup to document this new macro.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake()
@ 2022-07-27 14:18     ` Andi Shyti
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:18 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, intel-gfx, linux-kernel, Chris Wilson, dri-devel,
	Rodrigo Vivi

Hi Mauro,

> Add a kernel-doc markup to document this new macro.
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource
  2022-07-27 12:29   ` Mauro Carvalho Chehab
  (?)
  (?)
@ 2022-07-27 14:20   ` Andi Shyti
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:20 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi

Hi Mauro,

> TLB cache invalidation can happen on two different situations:
> 
> 1. synchronously, at __vma_put_pages();
> 2. asynchronously.
> 
> On the first case, TLB cache invalidation happens inside
> __vma_put_pages(). So, no need to do it later on.
> 
> However, on the second case, the pages will keep in memory
> until __i915_vma_evict() is called.
> 
> So, we need to store the TLB data at struct i915_vma_resource,
> in order to do a TLB cache invalidation before allowing
> userspace to re-use the same memory.
> 
> So, i915_vma_resource_unbind() has gained a new parameter
> in order to store the TLB data at the second case.
> 
> Document it.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/
> 
>  drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 5a67995ea5fe..4fe09ea0a825 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>  /**
>   * i915_vma_resource_unbind - Unbind a vma resource
>   * @vma_res: The vma resource to unbind.
> + * @tlb: pointer to vma->obj->mm.tlb associated with the resource
> + *	 to be stored at vma_res->tlb. When not-NULL, it will be used
> + *	 to do TLB cache invalidation before freeing a VMA resource.
> + *	 used only for async unbind.

/used/Used/

With that:

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Thanks,
Andi

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
  2022-07-27 12:29   ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 14:25     ` Andi Shyti
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Christian König, Michał Winiarski,
	Thomas Hellström, Andi Shyti, Andrzej Hajda, Ashutosh Dixit,
	Ayaz A Siddiqui, Casey Bowman, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst, Matt Roper,
	Matthew Auld, Michael Cheng, Nirmoy Das, Ramalingam C,
	Rodrigo Vivi, Sumit Semwal, Tomas Winkler, Tvrtko Ursulin,
	dri-devel, intel-gfx, linaro-mm-sig, linux-kernel, linux-media,
	stable, Tvrtko Ursulin, Fei Yang

Hi Mauro,

I think there are still some unanswered questions from Tvrtko on
this patch, am I right?

Andi

On Wed, Jul 27, 2022 at 02:29:55PM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Invalidate TLB in batches, in order to reduce performance regressions.
> 
> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
> 
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> [mchehab: rebased to not require moving the code to a separate file]
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>  10 files changed, 125 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>  #define I915_BO_READONLY          BIT(7)
>  #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
>  #define I915_BO_PROTECTED         BIT(9)
> -#define I915_BO_WAS_BOUND_BIT     10
>  	/**
>  	 * @mem_flags - Mutable placement-related flags
>  	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>  		 * pages were last acquired.
>  		 */
>  		bool dirty:1;
> +
> +		u32 tlb;
>  	} mm;
>  
>  	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>  		vunmap(ptr);
>  }
>  
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>  struct sg_table *
>  __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  	__i915_gem_object_reset_page_iter(obj);
>  	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>  
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>  
>  	return pages;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  {
>  	spin_lock_init(&gt->irq_lock);
>  
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>  	INIT_LIST_HEAD(&gt->closed_vma);
>  	spin_lock_init(&gt->closed_lock);
>  
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  	intel_gt_init_reset(gt);
>  	intel_gt_init_requests(gt);
>  	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>  	intel_gt_pm_init_early(gt);
>  
>  	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>  		intel_gt_fini_requests(gt);
>  		intel_gt_fini_reset(gt);
>  		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>  		intel_engines_free(gt);
>  	}
>  }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>  	return rb;
>  }
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>  {
>  	static const i915_reg_t gen8_regs[] = {
>  		[RENDER_CLASS]			= GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	const i915_reg_t *regs;
>  	unsigned int num = 0;
>  
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>  	if (GRAPHICS_VER(i915) == 12) {
>  		regs = gen12_regs;
>  		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  			  "Platform does not implement TLB invalidation!"))
>  		return;
>  
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>  	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>  
>  	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  		awake |= engine->mask;
>  	}
>  
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>  	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>  	if (awake &&
>  	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	 * transitions.
>  	 */
>  	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>  
>  void intel_gt_watchdog_work(struct work_struct *work);
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>  
>  #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>  #include <linux/llist.h>
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>  	struct intel_uc uc;
>  	struct intel_gsc gsc;
>  
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLB
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbind the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>  
>  	struct i915_wa_list wa_list;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>  void ppgtt_unbind_vma(struct i915_address_space *vm,
>  		      struct i915_vma_resource *vma_res)
>  {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>  }
>  
>  static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ef3b04c7e153..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>  				   bind_flags);
>  	}
>  
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>  	atomic_or(bind_flags, &vma->flags);
>  	return 0;
>  }
> @@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>  	return err;
>  }
>  
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> +}
> +
>  static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>  {
>  	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  		vma->vm->skip_pte_rewrite;
>  	trace_i915_vma_unbind(vma);
>  
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>  	vma->resource = NULL;
>  
>  	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  
>  	i915_vma_detach(vma);
>  
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>  			u64 size, u64 alignment, u64 flags);
>  void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>  void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>  struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>  int __i915_vma_unbind(struct i915_vma *vma);
>  int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>   * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>   * complete.
>   */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>  {
>  	struct i915_address_space *vm = vma_res->vm;
>  
> +	vma_res->tlb = tlb;
> +
>  	/* Reference for the sw fence */
>  	i915_vma_resource_get(vma_res);
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>   * taken when the unbind is scheduled.
>   * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>   * needs to be skipped for unbind.
> + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
>   *
>   * The lifetime of a struct i915_vma_resource is from a binding request to
>   * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>  	bool immediate_unbind:1;
>  	bool needs_wakeref:1;
>  	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>  };
>  
>  bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>  
>  void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>  
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>  
>  void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>  
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
@ 2022-07-27 14:25     ` Andi Shyti
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Fei Yang, David Airlie, dri-devel, Daniele Ceraolo Spurio,
	Andrzej Hajda, Sumit Semwal, Michael Cheng, Chris Wilson,
	Ayaz A Siddiqui, Andi Shyti, Dave Airlie, Tomas Winkler,
	Matthew Auld, Thomas Hellström, Casey Bowman,
	Lucas De Marchi, intel-gfx, linaro-mm-sig, Nirmoy Das,
	Rodrigo Vivi, Tvrtko Ursulin, Michał Winiarski,
	Tvrtko Ursulin, linux-kernel, stable, Ashutosh Dixit,
	linux-media, Christian König

Hi Mauro,

I think there are still some unanswered questions from Tvrtko on
this patch, am I right?

Andi

On Wed, Jul 27, 2022 at 02:29:55PM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Invalidate TLB in batches, in order to reduce performance regressions.
> 
> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
> 
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> [mchehab: rebased to not require moving the code to a separate file]
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>  10 files changed, 125 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>  #define I915_BO_READONLY          BIT(7)
>  #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
>  #define I915_BO_PROTECTED         BIT(9)
> -#define I915_BO_WAS_BOUND_BIT     10
>  	/**
>  	 * @mem_flags - Mutable placement-related flags
>  	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>  		 * pages were last acquired.
>  		 */
>  		bool dirty:1;
> +
> +		u32 tlb;
>  	} mm;
>  
>  	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>  		vunmap(ptr);
>  }
>  
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>  struct sg_table *
>  __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  	__i915_gem_object_reset_page_iter(obj);
>  	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>  
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>  
>  	return pages;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  {
>  	spin_lock_init(&gt->irq_lock);
>  
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>  	INIT_LIST_HEAD(&gt->closed_vma);
>  	spin_lock_init(&gt->closed_lock);
>  
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  	intel_gt_init_reset(gt);
>  	intel_gt_init_requests(gt);
>  	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>  	intel_gt_pm_init_early(gt);
>  
>  	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>  		intel_gt_fini_requests(gt);
>  		intel_gt_fini_reset(gt);
>  		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>  		intel_engines_free(gt);
>  	}
>  }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>  	return rb;
>  }
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>  {
>  	static const i915_reg_t gen8_regs[] = {
>  		[RENDER_CLASS]			= GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	const i915_reg_t *regs;
>  	unsigned int num = 0;
>  
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>  	if (GRAPHICS_VER(i915) == 12) {
>  		regs = gen12_regs;
>  		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  			  "Platform does not implement TLB invalidation!"))
>  		return;
>  
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>  	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>  
>  	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  		awake |= engine->mask;
>  	}
>  
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>  	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>  	if (awake &&
>  	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	 * transitions.
>  	 */
>  	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>  
>  void intel_gt_watchdog_work(struct work_struct *work);
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>  
>  #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>  #include <linux/llist.h>
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>  	struct intel_uc uc;
>  	struct intel_gsc gsc;
>  
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLB
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbind the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>  
>  	struct i915_wa_list wa_list;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>  void ppgtt_unbind_vma(struct i915_address_space *vm,
>  		      struct i915_vma_resource *vma_res)
>  {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>  }
>  
>  static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ef3b04c7e153..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>  				   bind_flags);
>  	}
>  
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>  	atomic_or(bind_flags, &vma->flags);
>  	return 0;
>  }
> @@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>  	return err;
>  }
>  
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> +}
> +
>  static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>  {
>  	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  		vma->vm->skip_pte_rewrite;
>  	trace_i915_vma_unbind(vma);
>  
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>  	vma->resource = NULL;
>  
>  	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  
>  	i915_vma_detach(vma);
>  
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>  			u64 size, u64 alignment, u64 flags);
>  void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>  void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>  struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>  int __i915_vma_unbind(struct i915_vma *vma);
>  int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>   * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>   * complete.
>   */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>  {
>  	struct i915_address_space *vm = vma_res->vm;
>  
> +	vma_res->tlb = tlb;
> +
>  	/* Reference for the sw fence */
>  	i915_vma_resource_get(vma_res);
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>   * taken when the unbind is scheduled.
>   * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>   * needs to be skipped for unbind.
> + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
>   *
>   * The lifetime of a struct i915_vma_resource is from a binding request to
>   * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>  	bool immediate_unbind:1;
>  	bool needs_wakeref:1;
>  	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>  };
>  
>  bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>  
>  void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>  
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>  
>  void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>  
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
@ 2022-07-27 14:25     ` Andi Shyti
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, dri-devel, Andrzej Hajda, Sumit Semwal,
	Michael Cheng, Chris Wilson, Dave Airlie, Tomas Winkler,
	Matthew Auld, Thomas Hellström, Lucas De Marchi, intel-gfx,
	linaro-mm-sig, Rodrigo Vivi, Michał Winiarski, linux-kernel,
	stable, linux-media, Christian König

Hi Mauro,

I think there are still some unanswered questions from Tvrtko on
this patch, am I right?

Andi

On Wed, Jul 27, 2022 at 02:29:55PM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
> 
> Invalidate TLB in batches, in order to reduce performance regressions.
> 
> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
> 
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
> 
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
> 
> [mchehab: rebased to not require moving the code to a separate file]
> 
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
> 
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>  10 files changed, 125 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>  #define I915_BO_READONLY          BIT(7)
>  #define I915_TILING_QUIRK_BIT     8 /* unknown swizzling; do not release! */
>  #define I915_BO_PROTECTED         BIT(9)
> -#define I915_BO_WAS_BOUND_BIT     10
>  	/**
>  	 * @mem_flags - Mutable placement-related flags
>  	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>  		 * pages were last acquired.
>  		 */
>  		bool dirty:1;
> +
> +		u32 tlb;
>  	} mm;
>  
>  	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>  		vunmap(ptr);
>  }
>  
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>  struct sg_table *
>  __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  	__i915_gem_object_reset_page_iter(obj);
>  	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>  
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>  
>  	return pages;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  {
>  	spin_lock_init(&gt->irq_lock);
>  
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>  	INIT_LIST_HEAD(&gt->closed_vma);
>  	spin_lock_init(&gt->closed_lock);
>  
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  	intel_gt_init_reset(gt);
>  	intel_gt_init_requests(gt);
>  	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>  	intel_gt_pm_init_early(gt);
>  
>  	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>  		intel_gt_fini_requests(gt);
>  		intel_gt_fini_reset(gt);
>  		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>  		intel_engines_free(gt);
>  	}
>  }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>  	return rb;
>  }
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>  {
>  	static const i915_reg_t gen8_regs[] = {
>  		[RENDER_CLASS]			= GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	const i915_reg_t *regs;
>  	unsigned int num = 0;
>  
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>  	if (GRAPHICS_VER(i915) == 12) {
>  		regs = gen12_regs;
>  		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  			  "Platform does not implement TLB invalidation!"))
>  		return;
>  
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>  	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>  
>  	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  		awake |= engine->mask;
>  	}
>  
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>  	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>  	if (awake &&
>  	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	 * transitions.
>  	 */
>  	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>  
>  void intel_gt_watchdog_work(struct work_struct *work);
>  
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>  
>  #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>  #include <linux/llist.h>
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>  	struct intel_uc uc;
>  	struct intel_gsc gsc;
>  
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLB
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbind the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>  
>  	struct i915_wa_list wa_list;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>  void ppgtt_unbind_vma(struct i915_address_space *vm,
>  		      struct i915_vma_resource *vma_res)
>  {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>  }
>  
>  static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ef3b04c7e153..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>  				   bind_flags);
>  	}
>  
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>  	atomic_or(bind_flags, &vma->flags);
>  	return 0;
>  }
> @@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>  	return err;
>  }
>  
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> +}
> +
>  static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>  {
>  	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  		vma->vm->skip_pte_rewrite;
>  	trace_i915_vma_unbind(vma);
>  
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>  	vma->resource = NULL;
>  
>  	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  
>  	i915_vma_detach(vma);
>  
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>  			u64 size, u64 alignment, u64 flags);
>  void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>  void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>  struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>  int __i915_vma_unbind(struct i915_vma *vma);
>  int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>   * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>   * complete.
>   */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>  {
>  	struct i915_address_space *vm = vma_res->vm;
>  
> +	vma_res->tlb = tlb;
> +
>  	/* Reference for the sw fence */
>  	i915_vma_resource_get(vma_res);
>  
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>   * taken when the unbind is scheduled.
>   * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>   * needs to be skipped for unbind.
> + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
>   *
>   * The lifetime of a struct i915_vma_resource is from a binding request to
>   * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>  	bool immediate_unbind:1;
>  	bool needs_wakeref:1;
>  	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>  };
>  
>  bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>  
>  void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>  
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>  
>  void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>  
> -- 
> 2.36.1

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: reduce TLB performance regressions
  2022-07-27 12:29 ` Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  (?)
@ 2022-07-27 15:52 ` Patchwork
  -1 siblings, 0 replies; 35+ messages in thread
From: Patchwork @ 2022-07-27 15:52 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 40925 bytes --]

== Series Details ==

Series: drm/i915: reduce TLB performance regressions
URL   : https://patchwork.freedesktop.org/series/106758/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11946_full -> Patchwork_106758v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_106758v1_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_106758v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Participating hosts (13 -> 13)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_106758v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - shard-tglb:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb2/igt@i915_pm_rpm@basic-pci-d3-state.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb3/igt@i915_pm_rpm@basic-pci-d3-state.html

  
New tests
---------

  New tests have been introduced between CI_DRM_11946_full and Patchwork_106758v1_full:

### New IGT tests (4) ###

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-a:
    - Statuses : 1 pass(s)
    - Exec time: [2.34] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-b:
    - Statuses : 1 pass(s)
    - Exec time: [2.25] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-c:
    - Statuses : 1 pass(s)
    - Exec time: [2.22] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-d:
    - Statuses : 1 pass(s)
    - Exec time: [2.23] s

  

Known issues
------------

  Here are the changes found in Patchwork_106758v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@psr2:
    - shard-iclb:         [PASS][3] -> [SKIP][4] ([i915#658])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@feature_discovery@psr2.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@feature_discovery@psr2.html

  * igt@gem_create@create-massive:
    - shard-glk:          NOTRUN -> [DMESG-WARN][5] ([i915#4991])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk2/igt@gem_create@create-massive.html
    - shard-kbl:          NOTRUN -> [DMESG-WARN][6] ([i915#4991])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_create@create-massive.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb:         [PASS][7] -> [FAIL][8] ([i915#5784])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb7/igt@gem_eio@unwedge-stress.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb1/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@parallel-keep-submit-fence:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([i915#4525]) +2 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb1/igt@gem_exec_balancer@parallel-keep-submit-fence.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@gem_exec_balancer@parallel-keep-submit-fence.html

  * igt@gem_exec_fair@basic-none@rcs0:
    - shard-kbl:          [PASS][11] -> [FAIL][12] ([i915#2842])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl7/igt@gem_exec_fair@basic-none@rcs0.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@gem_exec_fair@basic-none@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][13] ([i915#2842])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb4/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][14] -> [FAIL][15] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk7/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk3/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-kbl:          NOTRUN -> [FAIL][16] ([i915#2842])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_lmem_swapping@verify-ccs:
    - shard-kbl:          NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#4613]) +3 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_lmem_swapping@verify-ccs.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl:          NOTRUN -> [DMESG-WARN][18] ([i915#4991])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl3/igt@gem_userptr_blits@input-checking.html

  * igt@gem_workarounds@suspend-resume-context:
    - shard-kbl:          [PASS][19] -> [DMESG-WARN][20] ([i915#180])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@gem_workarounds@suspend-resume-context.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@gem_workarounds@suspend-resume-context.html

  * igt@i915_module_load@reload:
    - shard-skl:          [PASS][21] -> [DMESG-WARN][22] ([i915#1982])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl1/igt@i915_module_load@reload.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl4/igt@i915_module_load@reload.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [PASS][23] -> [FAIL][24] ([i915#454])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@i915_pm_dc@dc6-psr.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@i915_pm_dc@dc6-psr.html

  * igt@kms_ccs@pipe-b-crc-primary-basic-y_tiled_gen12_mc_ccs:
    - shard-kbl:          NOTRUN -> [SKIP][25] ([fdo#109271] / [i915#3886]) +6 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@kms_ccs@pipe-b-crc-primary-basic-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs:
    - shard-apl:          NOTRUN -> [SKIP][26] ([fdo#109271] / [i915#3886])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html

  * igt@kms_color_chamelium@pipe-a-gamma:
    - shard-kbl:          NOTRUN -> [SKIP][27] ([fdo#109271] / [fdo#111827]) +11 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_color_chamelium@pipe-a-gamma.html

  * igt@kms_color_chamelium@pipe-b-ctm-limited-range:
    - shard-skl:          NOTRUN -> [SKIP][28] ([fdo#109271] / [fdo#111827])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl1/igt@kms_color_chamelium@pipe-b-ctm-limited-range.html

  * igt@kms_color_chamelium@pipe-d-ctm-red-to-blue:
    - shard-apl:          NOTRUN -> [SKIP][29] ([fdo#109271] / [fdo#111827]) +2 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_color_chamelium@pipe-d-ctm-red-to-blue.html

  * igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1:
    - shard-apl:          [PASS][30] -> [DMESG-WARN][31] ([i915#180]) +3 similar issues
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl1/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl1/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html

  * igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-1:
    - shard-glk:          NOTRUN -> [SKIP][32] ([fdo#109271])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk1/igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-1.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1:
    - shard-skl:          [PASS][33] -> [FAIL][34] ([i915#2122])
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1.html

  * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-upscaling@pipe-a-default-mode:
    - shard-skl:          NOTRUN -> [SKIP][35] ([fdo#109271]) +7 similar issues
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl1/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-upscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-32bpp-xtile-downscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][36] ([i915#3555]) +1 similar issue
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-32bpp-xtile-downscaling@pipe-a-default-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-valid-mode:
    - shard-iclb:         NOTRUN -> [SKIP][37] ([i915#2672]) +8 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-valid-mode.html

  * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode:
    - shard-iclb:         NOTRUN -> [SKIP][38] ([i915#2672] / [i915#3555])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-render:
    - shard-apl:          NOTRUN -> [SKIP][39] ([fdo#109271]) +24 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl3/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-render.html

  * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max:
    - shard-kbl:          NOTRUN -> [FAIL][40] ([fdo#108145] / [i915#265])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max.html

  * igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1:
    - shard-iclb:         [PASS][41] -> [SKIP][42] ([i915#5235]) +2 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html

  * igt@kms_psr2_su@page_flip-p010:
    - shard-kbl:          NOTRUN -> [SKIP][43] ([fdo#109271] / [i915#658])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_psr2_su@page_flip-p010.html

  * igt@kms_psr@psr2_sprite_blt:
    - shard-iclb:         [PASS][44] -> [SKIP][45] ([fdo#109441]) +3 similar issues
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@kms_psr@psr2_sprite_blt.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_psr@psr2_sprite_blt.html

  * igt@kms_psr_stress_test@invalidate-primary-flip-overlay:
    - shard-iclb:         [PASS][46] -> [SKIP][47] ([i915#5519])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb3/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html

  * igt@perf@blocking:
    - shard-skl:          [PASS][48] -> [FAIL][49] ([i915#1542])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@perf@blocking.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@perf@blocking.html

  * igt@prime_nv_pcopy@test2:
    - shard-kbl:          NOTRUN -> [SKIP][50] ([fdo#109271]) +150 similar issues
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@prime_nv_pcopy@test2.html

  
#### Possible fixes ####

  * igt@fbdev@nullptr:
    - {shard-rkl}:        [SKIP][51] ([i915#2582]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@fbdev@nullptr.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@fbdev@nullptr.html

  * igt@gem_ctx_exec@basic-nohangcheck:
    - shard-tglb:         [FAIL][53] ([i915#6268]) -> [PASS][54]
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb5/igt@gem_ctx_exec@basic-nohangcheck.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb1/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_ctx_persistence@legacy-engines-hostile@vebox:
    - {shard-dg1}:        [FAIL][55] ([i915#4883]) -> [PASS][56] +3 similar issues
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-dg1-15/igt@gem_ctx_persistence@legacy-engines-hostile@vebox.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-dg1-19/igt@gem_ctx_persistence@legacy-engines-hostile@vebox.html

  * igt@gem_eio@unwedge-stress:
    - {shard-tglu}:       [TIMEOUT][57] ([i915#3063]) -> [PASS][58]
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-6/igt@gem_eio@unwedge-stress.html
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-2/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_balancer@fairslice:
    - {shard-rkl}:        [SKIP][59] ([i915#6259]) -> [PASS][60]
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@gem_exec_balancer@fairslice.html
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@gem_exec_balancer@fairslice.html

  * igt@gem_exec_balancer@parallel-bb-first:
    - shard-iclb:         [SKIP][61] ([i915#4525]) -> [PASS][62]
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@gem_exec_balancer@parallel-bb-first.html
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@gem_exec_balancer@parallel-bb-first.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-kbl:          [FAIL][63] ([i915#2846]) -> [PASS][64]
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl7/igt@gem_exec_fair@basic-deadline.html
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl1/igt@gem_exec_fair@basic-deadline.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - {shard-tglu}:       [FAIL][65] ([i915#2842]) -> [PASS][66]
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-8/igt@gem_exec_fair@basic-none-share@rcs0.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-5/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-solo@rcs0:
    - shard-kbl:          [FAIL][67] ([i915#2842]) -> [PASS][68] +2 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@gem_exec_fair@basic-pace-solo@rcs0.html
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html

  * igt@gem_exec_reloc@basic-gtt-cpu:
    - {shard-rkl}:        [SKIP][69] ([i915#3281]) -> [PASS][70] +2 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gem_exec_reloc@basic-gtt-cpu.html
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gem_exec_reloc@basic-gtt-cpu.html

  * igt@gem_partial_pwrite_pread@writes-after-reads:
    - {shard-rkl}:        [SKIP][71] ([i915#3282]) -> [PASS][72] +1 similar issue
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gem_partial_pwrite_pread@writes-after-reads.html
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gem_partial_pwrite_pread@writes-after-reads.html

  * igt@gen9_exec_parse@allowed-single:
    - shard-glk:          [DMESG-WARN][73] ([i915#5566] / [i915#716]) -> [PASS][74]
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk3/igt@gen9_exec_parse@allowed-single.html
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk2/igt@gen9_exec_parse@allowed-single.html

  * igt@gen9_exec_parse@batch-zero-length:
    - {shard-rkl}:        [SKIP][75] ([i915#2527]) -> [PASS][76]
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gen9_exec_parse@batch-zero-length.html
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gen9_exec_parse@batch-zero-length.html

  * igt@i915_pm_backlight@fade:
    - {shard-rkl}:        [SKIP][77] ([i915#3012]) -> [PASS][78]
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@i915_pm_backlight@fade.html
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@i915_pm_backlight@fade.html

  * igt@i915_pm_dc@dc6-dpms:
    - {shard-tglu}:       [FAIL][79] ([i915#3989] / [i915#454]) -> [PASS][80]
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-5/igt@i915_pm_dc@dc6-dpms.html
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-2/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a:
    - {shard-tglu}:       [FAIL][81] ([i915#3825]) -> [PASS][82]
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-2/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html

  * igt@i915_pm_rc6_residency@rc6-idle@vcs0:
    - shard-kbl:          [WARN][83] ([i915#6405]) -> [PASS][84]
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html
    - {shard-rkl}:        [WARN][85] ([i915#6405]) -> [PASS][86]
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-1/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html

  * igt@i915_pm_rpm@modeset-non-lpsp-stress:
    - {shard-dg1}:        [SKIP][87] ([i915#1397]) -> [PASS][88] +3 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-dg1-17/igt@i915_pm_rpm@modeset-non-lpsp-stress.html
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-dg1-14/igt@i915_pm_rpm@modeset-non-lpsp-stress.html

  * igt@i915_suspend@debugfs-reader:
    - shard-kbl:          [INCOMPLETE][89] ([i915#3614] / [i915#4939]) -> [PASS][90]
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@i915_suspend@debugfs-reader.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@i915_suspend@debugfs-reader.html

  * igt@kms_big_fb@y-tiled-32bpp-rotate-90:
    - shard-skl:          [TIMEOUT][91] ([i915#6371]) -> [PASS][92]
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl9/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl9/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html

  * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size:
    - shard-glk:          [FAIL][93] ([i915#2346]) -> [PASS][94]
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk8/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled:
    - {shard-rkl}:        [SKIP][95] ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][96] +8 similar issues
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1:
    - shard-apl:          [FAIL][97] ([i915#79]) -> [PASS][98]
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1:
    - shard-kbl:          [DMESG-WARN][99] ([i915#180]) -> [PASS][100] +3 similar issues
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1:
    - shard-skl:          [FAIL][101] ([i915#2122]) -> [PASS][102] +2 similar issues
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - {shard-rkl}:        [SKIP][103] ([i915#1849] / [i915#4098]) -> [PASS][104] +20 similar issues
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt.html
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary:
    - shard-skl:          [DMESG-WARN][105] ([i915#1982]) -> [PASS][106] +1 similar issue
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl9/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl9/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - {shard-rkl}:        [SKIP][107] ([i915#1849] / [i915#3546] / [i915#4098]) -> [PASS][108]
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - {shard-rkl}:        [SKIP][109] ([i915#1849] / [i915#3546] / [i915#4070] / [i915#4098]) -> [PASS][110]
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html

  * igt@kms_psr@cursor_render:
    - {shard-rkl}:        [SKIP][111] ([i915#1072]) -> [PASS][112] +1 similar issue
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_psr@cursor_render.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_psr@cursor_render.html

  * igt@kms_psr@psr2_primary_mmap_cpu:
    - shard-iclb:         [SKIP][113] ([fdo#109441]) -> [PASS][114] +2 similar issues
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr@psr2_primary_mmap_cpu.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html

  * igt@kms_universal_plane@disable-primary-vs-flip-pipe-b:
    - {shard-rkl}:        [SKIP][115] ([i915#1845] / [i915#4070] / [i915#4098]) -> [PASS][116]
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html

  * igt@kms_vblank@pipe-b-query-idle:
    - {shard-rkl}:        [SKIP][117] ([i915#1845] / [i915#4098]) -> [PASS][118] +33 similar issues
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_vblank@pipe-b-query-idle.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_vblank@pipe-b-query-idle.html

  * igt@perf@gen12-unprivileged-single-ctx-counters:
    - {shard-rkl}:        [SKIP][119] ([fdo#109289]) -> [PASS][120]
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@perf@gen12-unprivileged-single-ctx-counters.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-1/igt@perf@gen12-unprivileged-single-ctx-counters.html

  * igt@perf@polling-parameterized:
    - shard-tglb:         [FAIL][121] ([i915#5639]) -> [PASS][122]
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb1/igt@perf@polling-parameterized.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb7/igt@perf@polling-parameterized.html

  * igt@perf@polling-small-buf:
    - {shard-rkl}:        [FAIL][123] ([i915#1722]) -> [PASS][124]
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@perf@polling-small-buf.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@perf@polling-small-buf.html

  * igt@perf@short-reads:
    - shard-skl:          [FAIL][125] ([i915#51]) -> [PASS][126]
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl6/igt@perf@short-reads.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl6/igt@perf@short-reads.html

  * igt@perf_pmu@rc6-suspend:
    - shard-apl:          [DMESG-WARN][127] ([i915#180]) -> [PASS][128] +1 similar issue
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl1/igt@perf_pmu@rc6-suspend.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@perf_pmu@rc6-suspend.html

  
#### Warnings ####

  * igt@gem_exec_balancer@parallel-ordering:
    - shard-iclb:         [FAIL][129] ([i915#6117]) -> [SKIP][130] ([i915#4525])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@gem_exec_balancer@parallel-ordering.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@gem_exec_balancer@parallel-ordering.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          [INCOMPLETE][131] ([i915#180] / [i915#4939]) -> [FAIL][132] ([i915#4767])
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@kms_fbcon_fbt@fbc-suspend.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode:
    - shard-skl:          [SKIP][133] ([fdo#109271] / [i915#1888]) -> [SKIP][134] ([fdo#109271])
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode.html

  * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf:
    - shard-iclb:         [SKIP][135] ([i915#658]) -> [SKIP][136] ([i915#2920])
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf:
    - shard-iclb:         [SKIP][137] ([i915#2920]) -> [SKIP][138] ([i915#658])
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html

  * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area:
    - shard-iclb:         [SKIP][139] ([fdo#111068] / [i915#658]) -> [SKIP][140] ([i915#2920])
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html

  * igt@runner@aborted:
    - shard-kbl:          ([FAIL][141], [FAIL][142], [FAIL][143], [FAIL][144], [FAIL][145]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257] / [i915#92]) -> ([FAIL][146], [FAIL][147], [FAIL][148]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257])
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@runner@aborted.html
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@runner@aborted.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@runner@aborted.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@runner@aborted.html
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@runner@aborted.html
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@runner@aborted.html
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@runner@aborted.html
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295
  [fdo#109300]: https://bugs.freedesktop.org/show_bug.cgi?id=109300
  [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109314]: https://bugs.freedesktop.org/show_bug.cgi?id=109314
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506
  [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189
  [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [fdo#111314]: https://bugs.freedesktop.org/show_bug.cgi?id=111314
  [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614
  [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615
  [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656
  [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072
  [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132
  [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#160]: https://gitlab.freedesktop.org/drm/intel/issues/160
  [i915#1722]: https://gitlab.freedesktop.org/drm/intel/issues/1722
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825
  [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839
  [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845
  [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849
  [i915#1850]: https://gitlab.freedesktop.org/drm/intel/issues/1850
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2232]: https://gitlab.freedesktop.org/drm/intel/issues/2232
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2436]: https://gitlab.freedesktop.org/drm/intel/issues/2436
  [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437
  [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527
  [i915#2530]: https://gitlab.freedesktop.org/drm/intel/issues/2530
  [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582
  [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265
  [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658
  [i915#2672]: https://gitlab.freedesktop.org/drm/intel/issues/2672
  [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280
  [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842
  [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846
  [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856
  [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920
  [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994
  [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002
  [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012
  [i915#3063]: https://gitlab.freedesktop.org/drm/intel/issues/3063
  [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116
  [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281
  [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282
  [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297
  [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359
  [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361
  [i915#3376]: https://gitlab.freedesktop.org/drm/intel/issues/3376
  [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458
  [i915#3536]: https://gitlab.freedesktop.org/drm/intel/issues/3536
  [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539
  [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546
  [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555
  [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558
  [i915#3614]: https://gitlab.freedesktop.org/drm/intel/issues/3614
  [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637
  [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638
  [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689
  [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708
  [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734
  [i915#3778]: https://gitlab.freedesktop.org/drm/intel/issues/3778
  [i915#3825]: https://gitlab.freedesktop.org/drm/intel/issues/3825
  [i915#3828]: https://gitlab.freedesktop.org/drm/intel/issues/3828
  [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886
  [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955
  [i915#3989]: https://gitlab.freedesktop.org/drm/intel/issues/3989
  [i915#4016]: https://gitlab.freedesktop.org/drm/intel/issues/4016
  [i915#404]: https://gitlab.freedesktop.org/drm/intel/issues/404
  [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070
  [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077
  [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078
  [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079
  [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083
  [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098
  [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103
  [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270
  [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312
  [i915#4369]: https://gitlab.freedesktop.org/drm/intel/issues/4369
  [i915#4387]: https://gitlab.freedesktop.org/drm/intel/issues/4387
  [i915#4462]: https://gitlab.freedesktop.org/drm/intel/issues/4462
  [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525
  [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538
  [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454
  [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613
  [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767
  [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812
  [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833
  [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852
  [i915#4853]: https://gitlab.freedesktop.org/drm/intel/issues/4853
  [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860
  [i915#4883]: https://gitlab.freedesktop.org/drm/intel/issues/4883
  [i915#4885]: https://gitlab.freedesktop.org/drm/intel/issues/4885
  [i915#4893]: https://gitlab.freedesktop.org/drm/intel/issues/4893
  [i915#4939]: https://gitlab.freedesktop.org/drm/intel/issues/4939
  [i915#4941]: https://gitlab.freedesktop.org/drm/intel/issues/4941
  [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991
  [i915#51]: https://gitlab.freedesktop.org/drm/intel/issues/51
  [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176
  [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235
  [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257
  [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286
  [i915#5287]: https://gitlab.freedesktop.org/drm/intel/issues/5287
  [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288
  [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289
  [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325
  [i915#5327]: https://gitlab.freedesktop.org/drm/intel/issues/5327
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461
  [i915#5519]: https://gitlab.freedesktop.org/drm/intel/issues/5519
  [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563
  [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566
  [i915#5639]: https://gitlab.freedesktop.org/drm/intel/issues/5639
  [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784
  [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095
  [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117
  [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248
  [i915#6252]: https://gitlab.freedesktop.org/drm/intel/issues/6252
  [i915#6259]: https://gitlab.freedesktop.org/drm/intel/issues/6259
  [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268
  [i915#6371]: https://gitlab.freedesktop.org/drm/intel/issues/6371
  [i915#6405]: https://gitlab.freedesktop.org/drm/intel/issues/6405
  [i915#6433]: https://gitlab.freedesktop.org/drm/intel/issues/6433
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92


Build changes
-------------

  * Linux: CI_DRM_11946 -> Patchwork_106758v1

  CI-20190529: 20190529
  CI_DRM_11946: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6598: 97e103419021d0863db527e3f2cf39ccdd132db5 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_106758v1: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html

[-- Attachment #2: Type: text/html, Size: 42552 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
  2022-07-27 12:29 ` Mauro Carvalho Chehab
  (?)
@ 2022-07-28 12:08   ` Andi Shyti
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-28 12:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Christian König, linux-media, Tvrtko Ursulin

Hi Mauro,

Pushed in drm-intel-gt-next.

Thanks,
Andi

On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> Doing TLB invalidation cause performance regressions, like:
> 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> 
> As reported at:
> 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> 
> as this is an expensive operation. So, reduce the need of it by:
>   - checking if the engine is awake;
>   - checking if the engine is not wedged;
>   - batching operations.
> 
> Additionally, add a workaround for a known hardware issue on some GPUs.
> 
> In order to double-check that this series won't be introducing any regressions,
> I used this new IGT test:
> 
> https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> 
> Checking the results for 3 different patchsets, on Broadwell:
> 
> 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
> invalidation and serialization patches:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.490s)
> 	Subtest madv-clear: SUCCESS (10.484s)
> 	Subtest u-unmap-clear: SUCCESS (10.527s)
> 	Subtest u-shrink-clear: SUCCESS (10.506s)
> 	Subtest close-dumb: SUCCESS (10.165s)
> 	Subtest madv-dumb: SUCCESS (10.177s)
> 	Subtest u-unmap-dumb: SUCCESS (10.172s)
> 	Subtest u-shrink-dumb: SUCCESS (10.172s)
> 
> 2) With the new version of the batch TLB invalidation patches from this series:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.483s)
> 	Subtest madv-clear: SUCCESS (10.495s)
> 	Subtest u-unmap-clear: SUCCESS (10.545s)
> 	Subtest u-shrink-clear: SUCCESS (10.508s)
> 	Subtest close-dumb: SUCCESS (10.172s)
> 	Subtest madv-dumb: SUCCESS (10.169s)
> 	Subtest u-unmap-dumb: SUCCESS (10.174s)
> 	Subtest u-shrink-dumb: SUCCESS (10.176s)
> 
> 3) Changing the TLB invalidation routine to do nothing[1]:
> 
> 	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> 	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> 	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> 	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> 	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> 	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> 	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> 	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> 	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> 	Dynamic subtest smem0 failed.
> 	**** DEBUG ****
> 	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> 	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> 	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> 	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> 	****  END  ****
> 	Subtest close-clear: FAIL (10.434s)
> 	Subtest madv-clear: SUCCESS (10.479s)
> 	Subtest u-unmap-clear: SUCCESS (10.512s)
> 
> In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> applied don't have TLB invalidation cache issues.
> 
> [1] I applied this patch on the top of drm-tip:
> 
> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	index 68c2b0d8f187..0aefcd7be5e9 100644
> 	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> 	+	// HACK: don't do TLB invalidations!!!
> 	+	return;
> 	+
> 
> Regards,
> Mauro
> 
> Chris Wilson (4):
>   drm/i915/gt: Ignore TLB invalidations on idle engines
>   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
>   drm/i915/gt: Skip TLB invalidations once wedged
>   drm/i915/gt: Batch TLB invalidations
> 
> Mauro Carvalho Chehab (2):
>   drm/i915/gt: document with_intel_gt_pm_if_awake()
>   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
>  11 files changed, 163 insertions(+), 40 deletions(-)
> 
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-28 12:08   ` Andi Shyti
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-28 12:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Tvrtko Ursulin, David Airlie, intel-gfx, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Sumit Semwal, linux-media

Hi Mauro,

Pushed in drm-intel-gt-next.

Thanks,
Andi

On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> Doing TLB invalidation cause performance regressions, like:
> 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> 
> As reported at:
> 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> 
> as this is an expensive operation. So, reduce the need of it by:
>   - checking if the engine is awake;
>   - checking if the engine is not wedged;
>   - batching operations.
> 
> Additionally, add a workaround for a known hardware issue on some GPUs.
> 
> In order to double-check that this series won't be introducing any regressions,
> I used this new IGT test:
> 
> https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> 
> Checking the results for 3 different patchsets, on Broadwell:
> 
> 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
> invalidation and serialization patches:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.490s)
> 	Subtest madv-clear: SUCCESS (10.484s)
> 	Subtest u-unmap-clear: SUCCESS (10.527s)
> 	Subtest u-shrink-clear: SUCCESS (10.506s)
> 	Subtest close-dumb: SUCCESS (10.165s)
> 	Subtest madv-dumb: SUCCESS (10.177s)
> 	Subtest u-unmap-dumb: SUCCESS (10.172s)
> 	Subtest u-shrink-dumb: SUCCESS (10.172s)
> 
> 2) With the new version of the batch TLB invalidation patches from this series:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.483s)
> 	Subtest madv-clear: SUCCESS (10.495s)
> 	Subtest u-unmap-clear: SUCCESS (10.545s)
> 	Subtest u-shrink-clear: SUCCESS (10.508s)
> 	Subtest close-dumb: SUCCESS (10.172s)
> 	Subtest madv-dumb: SUCCESS (10.169s)
> 	Subtest u-unmap-dumb: SUCCESS (10.174s)
> 	Subtest u-shrink-dumb: SUCCESS (10.176s)
> 
> 3) Changing the TLB invalidation routine to do nothing[1]:
> 
> 	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> 	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> 	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> 	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> 	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> 	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> 	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> 	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> 	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> 	Dynamic subtest smem0 failed.
> 	**** DEBUG ****
> 	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> 	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> 	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> 	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> 	****  END  ****
> 	Subtest close-clear: FAIL (10.434s)
> 	Subtest madv-clear: SUCCESS (10.479s)
> 	Subtest u-unmap-clear: SUCCESS (10.512s)
> 
> In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> applied don't have TLB invalidation cache issues.
> 
> [1] I applied this patch on the top of drm-tip:
> 
> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	index 68c2b0d8f187..0aefcd7be5e9 100644
> 	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> 	+	// HACK: don't do TLB invalidations!!!
> 	+	return;
> 	+
> 
> Regards,
> Mauro
> 
> Chris Wilson (4):
>   drm/i915/gt: Ignore TLB invalidations on idle engines
>   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
>   drm/i915/gt: Skip TLB invalidations once wedged
>   drm/i915/gt: Batch TLB invalidations
> 
> Mauro Carvalho Chehab (2):
>   drm/i915/gt: document with_intel_gt_pm_if_awake()
>   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
>  11 files changed, 163 insertions(+), 40 deletions(-)
> 
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-28 12:08   ` Andi Shyti
  0 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-28 12:08 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel,
	Christian König, linaro-mm-sig, Sumit Semwal, linux-media

Hi Mauro,

Pushed in drm-intel-gt-next.

Thanks,
Andi

On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> Doing TLB invalidation cause performance regressions, like:
> 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> 
> As reported at:
> 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> 
> as this is an expensive operation. So, reduce the need of it by:
>   - checking if the engine is awake;
>   - checking if the engine is not wedged;
>   - batching operations.
> 
> Additionally, add a workaround for a known hardware issue on some GPUs.
> 
> In order to double-check that this series won't be introducing any regressions,
> I used this new IGT test:
> 
> https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> 
> Checking the results for 3 different patchsets, on Broadwell:
> 
> 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
> invalidation and serialization patches:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.490s)
> 	Subtest madv-clear: SUCCESS (10.484s)
> 	Subtest u-unmap-clear: SUCCESS (10.527s)
> 	Subtest u-shrink-clear: SUCCESS (10.506s)
> 	Subtest close-dumb: SUCCESS (10.165s)
> 	Subtest madv-dumb: SUCCESS (10.177s)
> 	Subtest u-unmap-dumb: SUCCESS (10.172s)
> 	Subtest u-shrink-dumb: SUCCESS (10.172s)
> 
> 2) With the new version of the batch TLB invalidation patches from this series:
> 
> 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> 	Subtest close-clear: SUCCESS (10.483s)
> 	Subtest madv-clear: SUCCESS (10.495s)
> 	Subtest u-unmap-clear: SUCCESS (10.545s)
> 	Subtest u-shrink-clear: SUCCESS (10.508s)
> 	Subtest close-dumb: SUCCESS (10.172s)
> 	Subtest madv-dumb: SUCCESS (10.169s)
> 	Subtest u-unmap-dumb: SUCCESS (10.174s)
> 	Subtest u-shrink-dumb: SUCCESS (10.176s)
> 
> 3) Changing the TLB invalidation routine to do nothing[1]:
> 
> 	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> 	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> 	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> 	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> 	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> 	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> 	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> 	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> 	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> 	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> 	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> 	Dynamic subtest smem0 failed.
> 	**** DEBUG ****
> 	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> 	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> 	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> 	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> 	****  END  ****
> 	Subtest close-clear: FAIL (10.434s)
> 	Subtest madv-clear: SUCCESS (10.479s)
> 	Subtest u-unmap-clear: SUCCESS (10.512s)
> 
> In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> applied don't have TLB invalidation cache issues.
> 
> [1] I applied this patch on the top of drm-tip:
> 
> 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	index 68c2b0d8f187..0aefcd7be5e9 100644
> 	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
> 	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> 	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> 	+	// HACK: don't do TLB invalidations!!!
> 	+	return;
> 	+
> 
> Regards,
> Mauro
> 
> Chris Wilson (4):
>   drm/i915/gt: Ignore TLB invalidations on idle engines
>   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
>   drm/i915/gt: Skip TLB invalidations once wedged
>   drm/i915/gt: Batch TLB invalidations
> 
> Mauro Carvalho Chehab (2):
>   drm/i915/gt: document with_intel_gt_pm_if_awake()
>   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> 
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
>  drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
>  11 files changed, 163 insertions(+), 40 deletions(-)
> 
> -- 
> 2.36.1
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
  2022-07-28 12:08   ` Andi Shyti
@ 2022-07-28 12:34     ` Mauro Carvalho Chehab
  -1 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-28 12:34 UTC (permalink / raw)
  To: Andi Shyti
  Cc: Mauro Carvalho Chehab, David Airlie, intel-gfx, linux-kernel,
	dri-devel, Christian König, linaro-mm-sig, Sumit Semwal,
	linux-media

Hi Andi,

On Thu, 28 Jul 2022 14:08:11 +0200
Andi Shyti <andi.shyti@linux.intel.com> wrote:

> Hi Mauro,
> 
> Pushed in drm-intel-gt-next.

Thank you!

I submitted two additional patches moving the TLB code into its own file,
and adding the documentation for it, as agreed during patch 5/6 review:

	https://patchwork.freedesktop.org/series/106806/

That should make easier to maintain TLB-related code and have such
functions properly documented.

Regards,
Mauro


> 
> Thanks,
> Andi
> 
> On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> > Doing TLB invalidation cause performance regressions, like:
> > 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> > 
> > As reported at:
> > 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> > 
> > as this is an expensive operation. So, reduce the need of it by:
> >   - checking if the engine is awake;
> >   - checking if the engine is not wedged;
> >   - batching operations.
> > 
> > Additionally, add a workaround for a known hardware issue on some GPUs.
> > 
> > In order to double-check that this series won't be introducing any regressions,
> > I used this new IGT test:
> > 
> > https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> > 
> > Checking the results for 3 different patchsets, on Broadwell:
> > 
> > 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
> > invalidation and serialization patches:
> > 
> > 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> > 	Subtest close-clear: SUCCESS (10.490s)
> > 	Subtest madv-clear: SUCCESS (10.484s)
> > 	Subtest u-unmap-clear: SUCCESS (10.527s)
> > 	Subtest u-shrink-clear: SUCCESS (10.506s)
> > 	Subtest close-dumb: SUCCESS (10.165s)
> > 	Subtest madv-dumb: SUCCESS (10.177s)
> > 	Subtest u-unmap-dumb: SUCCESS (10.172s)
> > 	Subtest u-shrink-dumb: SUCCESS (10.172s)
> > 
> > 2) With the new version of the batch TLB invalidation patches from this series:
> > 
> > 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> > 	Subtest close-clear: SUCCESS (10.483s)
> > 	Subtest madv-clear: SUCCESS (10.495s)
> > 	Subtest u-unmap-clear: SUCCESS (10.545s)
> > 	Subtest u-shrink-clear: SUCCESS (10.508s)
> > 	Subtest close-dumb: SUCCESS (10.172s)
> > 	Subtest madv-dumb: SUCCESS (10.169s)
> > 	Subtest u-unmap-dumb: SUCCESS (10.174s)
> > 	Subtest u-shrink-dumb: SUCCESS (10.176s)
> > 
> > 3) Changing the TLB invalidation routine to do nothing[1]:
> > 
> > 	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> > 	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> > 	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> > 	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> > 	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> > 	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> > 	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> > 	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> > 	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> > 	Dynamic subtest smem0 failed.
> > 	**** DEBUG ****
> > 	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> > 	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> > 	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> > 	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> > 	****  END  ****
> > 	Subtest close-clear: FAIL (10.434s)
> > 	Subtest madv-clear: SUCCESS (10.479s)
> > 	Subtest u-unmap-clear: SUCCESS (10.512s)
> > 
> > In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> > as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> > applied don't have TLB invalidation cache issues.
> > 
> > [1] I applied this patch on the top of drm-tip:
> > 
> > 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	index 68c2b0d8f187..0aefcd7be5e9 100644
> > 	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> > 	+	// HACK: don't do TLB invalidations!!!
> > 	+	return;
> > 	+
> > 
> > Regards,
> > Mauro
> > 
> > Chris Wilson (4):
> >   drm/i915/gt: Ignore TLB invalidations on idle engines
> >   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
> >   drm/i915/gt: Skip TLB invalidations once wedged
> >   drm/i915/gt: Batch TLB invalidations
> > 
> > Mauro Carvalho Chehab (2):
> >   drm/i915/gt: document with_intel_gt_pm_if_awake()
> >   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> > 
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
> >  drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
> >  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
> >  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
> >  drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
> >  drivers/gpu/drm/i915/i915_vma.h               |  1 +
> >  drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
> >  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
> >  11 files changed, 163 insertions(+), 40 deletions(-)
> > 
> > -- 
> > 2.36.1
> >   

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-28 12:34     ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-28 12:34 UTC (permalink / raw)
  To: Andi Shyti
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal,
	linaro-mm-sig, Mauro Carvalho Chehab, Christian König,
	linux-media

Hi Andi,

On Thu, 28 Jul 2022 14:08:11 +0200
Andi Shyti <andi.shyti@linux.intel.com> wrote:

> Hi Mauro,
> 
> Pushed in drm-intel-gt-next.

Thank you!

I submitted two additional patches moving the TLB code into its own file,
and adding the documentation for it, as agreed during patch 5/6 review:

	https://patchwork.freedesktop.org/series/106806/

That should make easier to maintain TLB-related code and have such
functions properly documented.

Regards,
Mauro


> 
> Thanks,
> Andi
> 
> On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> > Doing TLB invalidation cause performance regressions, like:
> > 	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> > 
> > As reported at:
> > 	https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> > 
> > as this is an expensive operation. So, reduce the need of it by:
> >   - checking if the engine is awake;
> >   - checking if the engine is not wedged;
> >   - batching operations.
> > 
> > Additionally, add a workaround for a known hardware issue on some GPUs.
> > 
> > In order to double-check that this series won't be introducing any regressions,
> > I used this new IGT test:
> > 
> > https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> > 
> > Checking the results for 3 different patchsets, on Broadwell:
> > 
> > 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. with TLB
> > invalidation and serialization patches:
> > 
> > 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> > 	Subtest close-clear: SUCCESS (10.490s)
> > 	Subtest madv-clear: SUCCESS (10.484s)
> > 	Subtest u-unmap-clear: SUCCESS (10.527s)
> > 	Subtest u-shrink-clear: SUCCESS (10.506s)
> > 	Subtest close-dumb: SUCCESS (10.165s)
> > 	Subtest madv-dumb: SUCCESS (10.177s)
> > 	Subtest u-unmap-dumb: SUCCESS (10.172s)
> > 	Subtest u-shrink-dumb: SUCCESS (10.172s)
> > 
> > 2) With the new version of the batch TLB invalidation patches from this series:
> > 
> > 	$ sudo build/tests/gem_exec_tlb|grep Subtest
> > 	Subtest close-clear: SUCCESS (10.483s)
> > 	Subtest madv-clear: SUCCESS (10.495s)
> > 	Subtest u-unmap-clear: SUCCESS (10.545s)
> > 	Subtest u-shrink-clear: SUCCESS (10.508s)
> > 	Subtest close-dumb: SUCCESS (10.172s)
> > 	Subtest madv-dumb: SUCCESS (10.169s)
> > 	Subtest u-unmap-dumb: SUCCESS (10.174s)
> > 	Subtest u-shrink-dumb: SUCCESS (10.176s)
> > 
> > 3) Changing the TLB invalidation routine to do nothing[1]:
> > 
> > 	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> > 	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> > 	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> > 	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> > 	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> > 	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> > 	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> > 	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> > 	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> > 	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> > 	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> > 	Dynamic subtest smem0 failed.
> > 	**** DEBUG ****
> > 	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> > 	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> > 	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> > 	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> > 	****  END  ****
> > 	Subtest close-clear: FAIL (10.434s)
> > 	Subtest madv-clear: SUCCESS (10.479s)
> > 	Subtest u-unmap-clear: SUCCESS (10.512s)
> > 
> > In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> > as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> > applied don't have TLB invalidation cache issues.
> > 
> > [1] I applied this patch on the top of drm-tip:
> > 
> > 	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	index 68c2b0d8f187..0aefcd7be5e9 100644
> > 	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > 	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> > 	+	// HACK: don't do TLB invalidations!!!
> > 	+	return;
> > 	+
> > 
> > Regards,
> > Mauro
> > 
> > Chris Wilson (4):
> >   drm/i915/gt: Ignore TLB invalidations on idle engines
> >   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
> >   drm/i915/gt: Skip TLB invalidations once wedged
> >   drm/i915/gt: Batch TLB invalidations
> > 
> > Mauro Carvalho Chehab (2):
> >   drm/i915/gt: document with_intel_gt_pm_if_awake()
> >   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> > 
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
> >  drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
> >  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
> >  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
> >  drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
> >  drivers/gpu/drm/i915/i915_vma.h               |  1 +
> >  drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
> >  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
> >  11 files changed, 163 insertions(+), 40 deletions(-)
> > 
> > -- 
> > 2.36.1
> >   

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-07-28 12:34 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27 12:29 [PATCH v3 0/6] drm/i915: reduce TLB performance regressions Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 14:18   ` [Intel-gfx] " Andi Shyti
2022-07-27 14:18     ` Andi Shyti
2022-07-27 12:29 ` [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 14:25   ` Andi Shyti
2022-07-27 14:25     ` [Intel-gfx] " Andi Shyti
2022-07-27 14:25     ` Andi Shyti
2022-07-27 12:29 ` [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource Mauro Carvalho Chehab
2022-07-27 12:29   ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29   ` Mauro Carvalho Chehab
2022-07-27 14:20   ` [Intel-gfx] " Andi Shyti
2022-07-27 13:12 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: reduce TLB performance regressions Patchwork
2022-07-27 13:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-07-27 15:52 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-07-28 12:08 ` [Intel-gfx] [PATCH v3 0/6] " Andi Shyti
2022-07-28 12:08   ` Andi Shyti
2022-07-28 12:08   ` Andi Shyti
2022-07-28 12:34   ` Mauro Carvalho Chehab
2022-07-28 12:34     ` Mauro Carvalho Chehab

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.