* [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: Mauro Carvalho Chehab, Christian König, Daniel Vetter, David Airlie,
	Sumit Semwal, dri-devel, intel-gfx, linaro-mm-sig, linux-kernel,
	linux-media

Doing TLB invalidation causes performance regressions, like:

	[424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!

As reported at:
	https://gitlab.freedesktop.org/drm/intel/-/issues/6424

as this is an expensive operation. So, reduce the need for it by:

  - checking if the engine is awake;
  - checking if the engine is not wedged;
  - batching operations.

Additionally, add a workaround for a known hardware issue on some GPUs.

In order to double-check that this series won't be introducing any
regressions, I used this new IGT test:

	https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1

Checking the results for 3 different patchsets, on Broadwell:

1) On the top of drm-tip (2022y-07m-14d-08h-35m-36), i.e. with TLB
invalidation and serialization patches:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.490s)
	Subtest madv-clear: SUCCESS (10.484s)
	Subtest u-unmap-clear: SUCCESS (10.527s)
	Subtest u-shrink-clear: SUCCESS (10.506s)
	Subtest close-dumb: SUCCESS (10.165s)
	Subtest madv-dumb: SUCCESS (10.177s)
	Subtest u-unmap-dumb: SUCCESS (10.172s)
	Subtest u-shrink-dumb: SUCCESS (10.172s)

2) With the new version of the batch TLB invalidation patches from this
series:

	$ sudo build/tests/gem_exec_tlb|grep Subtest
	Subtest close-clear: SUCCESS (10.483s)
	Subtest madv-clear: SUCCESS (10.495s)
	Subtest u-unmap-clear: SUCCESS (10.545s)
	Subtest u-shrink-clear: SUCCESS (10.508s)
	Subtest close-dumb: SUCCESS (10.172s)
	Subtest madv-dumb: SUCCESS (10.169s)
	Subtest u-unmap-dumb: SUCCESS (10.174s)
	Subtest u-shrink-dumb: SUCCESS (10.176s)

3) Changing the TLB invalidation routine to do nothing[1]:

	$ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
	(gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
	(gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
	(gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
	(gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
	(gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
	(gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
	(gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
	(gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
	(gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
	(gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
	Dynamic subtest smem0 failed.
	**** DEBUG ****
	(gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
	(gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
	(gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
	(gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
	**** END ****
	Subtest close-clear: FAIL (10.434s)
	Subtest madv-clear: SUCCESS (10.479s)
	Subtest u-unmap-clear: SUCCESS (10.512s)

In summary, the test properly detects failures when TLB cache
invalidation doesn't happen, as shown in result (3). It also shows that
both current drm-tip and drm-tip with this series applied don't have TLB
cache invalidation issues.

[1] I applied this patch on the top of drm-tip:

	diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
	index 68c2b0d8f187..0aefcd7be5e9 100644
	--- a/drivers/gpu/drm/i915/gt/intel_gt.c
	+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
	@@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
	+	// HACK: don't do TLB invalidations!!!
	+	return;
	+

Regards,
Mauro

Chris Wilson (4):
  drm/i915/gt: Ignore TLB invalidations on idle engines
  drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
  drm/i915/gt: Skip TLB invalidations once wedged
  drm/i915/gt: Batch TLB invalidations

Mauro Carvalho Chehab (2):
  drm/i915/gt: document with_intel_gt_pm_if_awake()
  drm/i915/gt: describe the new tlb parameter at i915_vma_resource

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 25 +++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 77 +++++++++++++++----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++-
 drivers/gpu/drm/i915/gt/intel_gt_pm.h         | 11 +++
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 +-
 drivers/gpu/drm/i915/i915_vma.c               | 33 ++++++--
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  9 ++-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 +-
 11 files changed, 163 insertions(+), 40 deletions(-)

-- 
2.36.1
* [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: Chris Wilson, Thomas Hellström, Andi Shyti, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	John Harrison, Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst,
	Matt Roper, Matthew Auld, Matthew Brost, Mauro Carvalho Chehab,
	Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel,
	stable, Fei Yang, Tvrtko Ursulin

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity,
as, in such cases, all the TLBs were already invalidated, so an
explicit TLB invalidation is not needed, thus reducing the
performance regression impact due to it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
-- 
2.36.1
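
For readers following the intel_gt_invalidate_tlbs() change in patch 1/6, the two-pass idea can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the driver code: struct fake_engine, request_invalidation() and wait_for_invalidation() are hypothetical names, the awake and has_tlb_reg fields stand in for intel_engine_pm_is_awake() and the non-zero register offset check, and the counters replace the real MMIO write and register poll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an engine; this is NOT the i915 structure. */
struct fake_engine {
	uint32_t mask;     /* one bit identifying this engine */
	bool awake;        /* stand-in for intel_engine_pm_is_awake() */
	bool has_tlb_reg;  /* stand-in for a non-zero invalidation register */
	int writes;        /* invalidation requests issued to this engine */
	int waits;         /* completion waits performed on this engine */
};

/*
 * Pass 1: request an invalidation only on engines that are awake and
 * have an invalidation register, and record them in a bitmask.
 */
uint32_t request_invalidation(struct fake_engine *engines, int n)
{
	uint32_t awake = 0;

	assert(n >= 0 && n <= 32);
	for (int i = 0; i < n; i++) {
		struct fake_engine *e = &engines[i];

		if (!e->awake || !e->has_tlb_reg)
			continue;

		e->writes++;       /* the driver writes the register here */
		awake |= e->mask;
	}
	return awake;
}

/*
 * Pass 2: wait only on the engines recorded in the mask, so idle
 * engines are never polled. Returns how many engines were waited on.
 */
int wait_for_invalidation(struct fake_engine *engines, int n, uint32_t awake)
{
	int waited = 0;

	for (int i = 0; i < n; i++) {
		if (!(engines[i].mask & awake))
			continue;

		engines[i].waits++;  /* the driver polls the register here */
		waited++;
	}
	return waited;
}
```

Because the mask is built in the first pass, the second pass does not need to repeat the register-offset check, which matches the lines the patch removes from the wait loop.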
* [Intel-gfx] [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines @ 2022-07-27 12:29 ` Mauro Carvalho Chehab 0 siblings, 0 replies; 35+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw) Cc: David Airlie, dri-devel, Chris Wilson, Matthew Auld, Dave Airlie, Thomas Hellström, Lucas De Marchi, intel-gfx, Rodrigo Vivi, Mauro Carvalho Chehab, linux-kernel, stable From: Chris Wilson <chris.p.wilson@intel.com> Check if the device is powered down prior to any engine activity, as, on such cases, all the TLBs were already invalidated, so an explicit TLB invalidation is not needed, thus reducing the performance regression impact due to it. This becomes more significant with GuC, as it can only do so when the connection to the GuC is awake. Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. 
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/ drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++---- drivers/gpu/drm/i915/gt/intel_gt.c | 17 ++++++++++------- drivers/gpu/drm/i915/gt/intel_gt_pm.h | 3 +++ 3 files changed, 19 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 97c820eee115..6835279943df 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -6,14 +6,15 @@ #include <drm/drm_cache.h> +#include "gt/intel_gt.h" +#include "gt/intel_gt_pm.h" + #include "i915_drv.h" #include "i915_gem_object.h" #include "i915_scatterlist.h" #include "i915_gem_lmem.h" #include "i915_gem_mman.h" -#include "gt/intel_gt.h" - void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sg_page_sizes) @@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) { struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_gt *gt = to_gt(i915); intel_wakeref_t wakeref; - with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref) - intel_gt_invalidate_tlbs(to_gt(i915)); + with_intel_gt_pm_if_awake(gt, wakeref) + intel_gt_invalidate_tlbs(gt); } return pages; diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 68c2b0d8f187..c4d43da84d8e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -12,6 +12,7 @@ #include "i915_drv.h" #include "intel_context.h" +#include "intel_engine_pm.h" #include "intel_engine_regs.h" #include "intel_ggtt_gmch.h" #include "intel_gt.h" @@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) struct drm_i915_private *i915 = gt->i915; struct intel_uncore *uncore = gt->uncore; struct intel_engine_cs *engine; + intel_engine_mask_t awake, tmp; enum 
intel_engine_id id; const i915_reg_t *regs; unsigned int num = 0; @@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) GEM_TRACE("\n"); - assert_rpm_wakelock_held(&i915->runtime_pm); - mutex_lock(>->tlb_invalidate_lock); intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ + awake = 0; for_each_engine(engine, gt, id) { struct reg_and_bit rb; + if (!intel_engine_pm_is_awake(engine)) + continue; + rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); if (!i915_mmio_reg_offset(rb.reg)) continue; intel_uncore_write_fw(uncore, rb.reg, rb.bit); + awake |= engine->mask; } spin_unlock_irq(&uncore->lock); - for_each_engine(engine, gt, id) { + for_each_engine_masked(engine, gt, awake, tmp) { + struct reg_and_bit rb; + /* * HW architecture suggest typical invalidation time at 40us, * with pessimistic cases up to 100us and a recommendation to @@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) */ const unsigned int timeout_us = 100; const unsigned int timeout_ms = 4; - struct reg_and_bit rb; rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num); - if (!i915_mmio_reg_offset(rb.reg)) - continue; - if (__intel_wait_for_register_fw(uncore, rb.reg, rb.bit, 0, timeout_us, timeout_ms, diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h index bc898df7a48c..a334787a4939 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h @@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt) for (tmp = 1, intel_gt_pm_get(gt); tmp; \ intel_gt_pm_put(gt), tmp = 0) +#define with_intel_gt_pm_if_awake(gt, wf) \ + for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0) + static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt) { return intel_wakeref_wait_for_idle(>->wakeref); -- 2.36.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines @ 2022-07-27 12:29 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: David Airlie, dri-devel, Daniele Ceraolo Spurio, Fei Yang, Matthew Brost, Chris Wilson, Matthew Auld, Andi Shyti, Dave Airlie, Thomas Hellström, Lucas De Marchi, intel-gfx, Rodrigo Vivi, Mauro Carvalho Chehab, Tvrtko Ursulin, Tvrtko Ursulin, linux-kernel, stable, John Harrison

From: Chris Wilson <chris.p.wilson@intel.com>

Check if the device is powered down prior to any engine activity. In
such cases, all the TLBs were already invalidated, so an explicit TLB
invalidation is not needed, which reduces the performance regression
caused by it.

This becomes more significant with GuC, as it can only do so when
the connection to the GuC is awake.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'ed on the cover letter.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ++++++----
 drivers/gpu/drm/i915/gt/intel_gt.c        | 17 ++++++++++-------
 drivers/gpu/drm/i915/gt/intel_gt_pm.h     |  3 +++
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 97c820eee115..6835279943df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -6,14 +6,15 @@
 
 #include <drm/drm_cache.h>
 
+#include "gt/intel_gt.h"
+#include "gt/intel_gt_pm.h"
+
 #include "i915_drv.h"
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 #include "i915_gem_lmem.h"
 #include "i915_gem_mman.h"
 
-#include "gt/intel_gt.h"
-
 void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 				 struct sg_table *pages,
 				 unsigned int sg_page_sizes)
@@ -217,10 +218,11 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 
 	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
 		struct drm_i915_private *i915 = to_i915(obj->base.dev);
+		struct intel_gt *gt = to_gt(i915);
 		intel_wakeref_t wakeref;
 
-		with_intel_runtime_pm_if_active(&i915->runtime_pm, wakeref)
-			intel_gt_invalidate_tlbs(to_gt(i915));
+		with_intel_gt_pm_if_awake(gt, wakeref)
+			intel_gt_invalidate_tlbs(gt);
 	}
 
 	return pages;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 68c2b0d8f187..c4d43da84d8e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -12,6 +12,7 @@
 
 #include "i915_drv.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
 #include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
@@ -924,6 +925,7 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	struct drm_i915_private *i915 = gt->i915;
 	struct intel_uncore *uncore = gt->uncore;
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t awake, tmp;
 	enum intel_engine_id id;
 	const i915_reg_t *regs;
 	unsigned int num = 0;
@@ -947,26 +949,31 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 
 	GEM_TRACE("\n");
 
-	assert_rpm_wakelock_held(&i915->runtime_pm);
-
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
 
+	awake = 0;
 	for_each_engine(engine, gt, id) {
 		struct reg_and_bit rb;
 
+		if (!intel_engine_pm_is_awake(engine))
+			continue;
+
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
 		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+		awake |= engine->mask;
 	}
 
 	spin_unlock_irq(&uncore->lock);
 
-	for_each_engine(engine, gt, id) {
+	for_each_engine_masked(engine, gt, awake, tmp) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -974,12 +981,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
-		if (!i915_mmio_reg_offset(rb.reg))
-			continue;
-
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index bc898df7a48c..a334787a4939 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,9 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+#define with_intel_gt_pm_if_awake(gt, wf) \
+	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
+
 static inline int intel_gt_pm_wait_for_idle(struct intel_gt *gt)
 {
 	return intel_wakeref_wait_for_idle(&gt->wakeref);
--
2.36.1
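
The two-pass structure of the intel_gt.c change in patch 1/6 can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the driver code: struct model_engine and model_invalidate_awake() are hypothetical names, and plain counters stand in for the MMIO write and the register poll. It only shows how the mask collected in the first loop restricts the second loop to engines that were actually kicked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct intel_engine_cs. */
struct model_engine {
	uint32_t mask;		/* one bit per engine, like engine->mask */
	bool awake;		/* models intel_engine_pm_is_awake() */
	bool has_reg;		/* models a non-zero invalidation register offset */
	unsigned int writes;	/* how many invalidation writes were issued */
	unsigned int waits;	/* how many completion polls were issued */
};

/* Returns the mask of engines that received an invalidation request. */
static uint32_t model_invalidate_awake(struct model_engine *engines, size_t n)
{
	uint32_t awake = 0;
	size_t i;

	/* Pass 1: request an invalidation only on awake engines. */
	for (i = 0; i < n; i++) {
		struct model_engine *e = &engines[i];

		if (!e->awake || !e->has_reg)
			continue;

		e->writes++;
		awake |= e->mask;
	}

	/* Pass 2: poll only the engines recorded in the mask. */
	for (i = 0; i < n; i++) {
		if (engines[i].mask & awake)
			engines[i].waits++;
	}

	return awake;
}
```

Because the second loop is driven by the mask rather than by re-checking each engine, an engine that was skipped in the first pass is never polled, so it cannot contribute a timeout in this model.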
* [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake() @ 2022-07-27 12:29 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: Mauro Carvalho Chehab, Chris Wilson, Daniel Vetter, David Airlie, Jani Nikula, John Harrison, Joonas Lahtinen, Matthew Brost, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, Tvrtko Ursulin

Add a kernel-doc markup to document this new macro.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'ed on the cover letter.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt_pm.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index a334787a4939..6c9a46452364 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -55,6 +55,14 @@ static inline void intel_gt_pm_might_put(struct intel_gt *gt)
 	for (tmp = 1, intel_gt_pm_get(gt); tmp; \
 	     intel_gt_pm_put(gt), tmp = 0)
 
+/**
+ * with_intel_gt_pm_if_awake - if GT is PM awake, get a reference to prevent
+ *	it from sleeping, run some code and then asynchronously put the
+ *	reference away.
+ *
+ * @gt: pointer to the gt
+ * @wf: temporary wakeref variable.
+ */
 #define with_intel_gt_pm_if_awake(gt, wf) \
 	for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0)
 
--
2.36.1
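
The macro documented in patch 2/6 relies on a for-loop that runs its body at most once. A minimal user-space sketch of that idiom follows. All names in it (struct model_gt, model_get_if_awake(), model_put(), with_model_gt_if_awake(), model_run()) are hypothetical stand-ins, and the reference counting is deliberately simplified; it is not the i915 wakeref implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct intel_gt and its wakeref. */
struct model_gt {
	bool awake;	/* is the GT currently powered up? */
	int wakerefs;	/* references currently held */
	int puts;	/* how many times a reference was released */
};

/* Take a reference only if already awake; non-zero means "held". */
static int model_get_if_awake(struct model_gt *gt)
{
	if (!gt->awake)
		return 0;

	gt->wakerefs++;
	return 1;
}

static void model_put(struct model_gt *gt)
{
	gt->wakerefs--;
	gt->puts++;
}

/*
 * The body runs once when the get succeeds. The third clause releases
 * the reference and clears wf so the loop terminates.
 */
#define with_model_gt_if_awake(gt, wf) \
	for ((wf) = model_get_if_awake(gt); (wf); model_put(gt), (wf) = 0)

/* Returns how many times the guarded body executed (0 or 1). */
static int model_run(struct model_gt *gt)
{
	int wf, ran = 0;

	with_model_gt_if_awake(gt, wf)
		ran++;

	return ran;
}
```

One property of any for-based scope macro of this shape is that leaving the body with break or return skips the third clause, so the reference would not be released; the guarded body should fall through normally.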
* Re: [Intel-gfx] [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake() @ 2022-07-27 14:18 ` Andi Shyti
From: Andi Shyti @ 2022-07-27 14:18 UTC
To: Mauro Carvalho Chehab
Cc: David Airlie, dri-devel, linux-kernel, Chris Wilson, Rodrigo Vivi, intel-gfx

Hi Mauro,

> Add a kernel-doc markup to document this new macro.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Andi
* [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations @ 2022-07-27 12:29 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen, Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable, Fei Yang, Tvrtko Ursulin, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Ensure that the TLB of the OA unit is also invalidated
on gen12 HW, as just invalidating the TLB of an engine is not
enough.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index c4d43da84d8e..1d84418e8676 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -11,6 +11,7 @@
 #include "pxp/intel_pxp.h"
 
 #include "i915_drv.h"
+#include "i915_perf_oa_regs.h"
 #include "intel_context.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_regs.h"
@@ -969,6 +970,15 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
+	if (awake &&
+	    (IS_TIGERLAKE(i915) ||
+	     IS_DG1(i915) ||
+	     IS_ROCKETLAKE(i915) ||
+	     IS_ALDERLAKE_S(i915) ||
+	     IS_ALDERLAKE_P(i915)))
+		intel_uncore_write_fw(uncore, GEN12_OA_TLB_INV_CR, 1);
+
 	spin_unlock_irq(&uncore->lock);
 
 	for_each_engine_masked(engine, gt, awake, tmp) {
--
2.36.1
* [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged @ 2022-07-27 12:29 ` Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC
Cc: Chris Wilson, Andi Shyti, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen, Lucas De Marchi, Matt Roper, Mauro Carvalho Chehab, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel, stable, Fei Yang, Tvrtko Ursulin, Thomas Hellström

From: Chris Wilson <chris.p.wilson@intel.com>

Skip all further TLB invalidations once the device is wedged and
has been reset. In that state the GPU can no longer process
instructions and the user no longer has access to the TLBs in each
engine, so an attempt to do a TLB cache invalidation will produce a
timeout.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing a large number of people, only mailing lists were Cc'ed on the cover letter.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_gt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 1d84418e8676..5c55a90672f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -934,6 +934,9 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
 		return;
 
+	if (intel_gt_is_wedged(gt))
+		return;
+
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
--
2.36.1
* [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations 2022-07-27 12:29 ` Mauro Carvalho Chehab (?) @ 2022-07-27 12:29 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 35+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw) Cc: Chris Wilson, Christian König, Michał Winiarski, Thomas Hellström, Andi Shyti, Andrzej Hajda, Ashutosh Dixit, Ayaz A Siddiqui, Casey Bowman, Daniel Vetter, Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula, Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst, Matt Roper, Matthew Auld, Mauro Carvalho Chehab, Michael Cheng, Nirmoy Das, Ramalingam C, Rodrigo Vivi, Sumit Semwal, Tomas Winkler, Tvrtko Ursulin, dri-devel, intel-gfx, linaro-mm-sig, linux-kernel, linux-media, stable, Tvrtko Ursulin, Fei Yang From: Chris Wilson <chris.p.wilson@intel.com> Invalidate TLB in batches, in order to reduce performance regressions. Currently, every caller performs a full barrier around a TLB invalidation, ignoring all other invalidations that may have already removed their PTEs from the cache. As this is a synchronous operation and can be quite slow, we cause multiple threads to contend on the TLB invalidate mutex blocking userspace. We only need to invalidate the TLB once after replacing our PTE to ensure that there is no possible continued access to the physical address before releasing our pages. By tracking a seqno for each full TLB invalidate we can quickly determine if one has been performed since rewriting the PTE, and only if necessary trigger one for ourselves. That helps to reduce the performance regression introduced by TLB invalidate logic. 
[mchehab: rebased to not require moving the code to a separate file] Cc: stable@vger.kernel.org Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> --- To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/ .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 21 +++++--- drivers/gpu/drm/i915/gt/intel_gt.c | 53 ++++++++++++++----- drivers/gpu/drm/i915/gt/intel_gt.h | 12 ++++- drivers/gpu/drm/i915/gt/intel_gt_types.h | 18 ++++++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 ++- drivers/gpu/drm/i915/i915_vma.c | 33 +++++++++--- drivers/gpu/drm/i915/i915_vma.h | 1 + drivers/gpu/drm/i915/i915_vma_resource.c | 5 +- drivers/gpu/drm/i915/i915_vma_resource.h | 6 ++- 10 files changed, 125 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 5cf36a130061..9f6b14ec189a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -335,7 +335,6 @@ struct drm_i915_gem_object { #define I915_BO_READONLY BIT(7) #define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */ #define I915_BO_PROTECTED BIT(9) -#define I915_BO_WAS_BOUND_BIT 10 /** * @mem_flags - Mutable placement-related flags * @@ -616,6 +615,8 @@ struct drm_i915_gem_object { * pages were last acquired. 
*/ bool dirty:1; + + u32 tlb; } mm; struct { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 6835279943df..8357dbdcab5c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr) vunmap(ptr); } +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_gt *gt = to_gt(i915); + + if (!obj->mm.tlb) + return; + + intel_gt_invalidate_tlb(gt, obj->mm.tlb); + obj->mm.tlb = 0; +} + struct sg_table * __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) { @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) __i915_gem_object_reset_page_iter(obj); obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0; - if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) { - struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct intel_gt *gt = to_gt(i915); - intel_wakeref_t wakeref; - - with_intel_gt_pm_if_awake(gt, wakeref) - intel_gt_invalidate_tlbs(gt); - } + flush_tlb_invalidate(obj); return pages; } diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index 5c55a90672f4..f435e06125aa 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt) { spin_lock_init(>->irq_lock); - mutex_init(>->tlb_invalidate_lock); - INIT_LIST_HEAD(>->closed_vma); spin_lock_init(>->closed_lock); @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt) intel_gt_init_reset(gt); intel_gt_init_requests(gt); intel_gt_init_timelines(gt); + mutex_init(>->tlb.invalidate_lock); + seqcount_mutex_init(>->tlb.seqno, >->tlb.invalidate_lock); intel_gt_pm_init_early(gt); intel_uc_init_early(>->uc); @@ -770,6 +770,7 @@ void 
intel_gt_driver_late_release_all(struct drm_i915_private *i915) intel_gt_fini_requests(gt); intel_gt_fini_reset(gt); intel_gt_fini_timelines(gt); + mutex_destroy(>->tlb.invalidate_lock); intel_engines_free(gt); } } @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, return rb; } -void intel_gt_invalidate_tlbs(struct intel_gt *gt) +static void mmio_invalidate_full(struct intel_gt *gt) { static const i915_reg_t gen8_regs[] = { [RENDER_CLASS] = GEN8_RTCR, @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) const i915_reg_t *regs; unsigned int num = 0; - if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) - return; - - if (intel_gt_is_wedged(gt)) - return; - if (GRAPHICS_VER(i915) == 12) { regs = gen12_regs; num = ARRAY_SIZE(gen12_regs); @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) "Platform does not implement TLB invalidation!")) return; - GEM_TRACE("\n"); - - mutex_lock(>->tlb_invalidate_lock); intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) awake |= engine->mask; } + GT_TRACE(gt, "invalidated engines %08x\n", awake); + /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ if (awake && (IS_TIGERLAKE(i915) || @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) * transitions. 
*/ intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); - mutex_unlock(>->tlb_invalidate_lock); +} + +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) +{ + u32 cur = intel_gt_tlb_seqno(gt); + + /* Only skip if a *full* TLB invalidate barrier has passed */ + return (s32)(cur - ALIGN(seqno, 2)) > 0; +} + +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno) +{ + intel_wakeref_t wakeref; + + if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) + return; + + if (intel_gt_is_wedged(gt)) + return; + + if (tlb_seqno_passed(gt, seqno)) + return; + + with_intel_gt_pm_if_awake(gt, wakeref) { + mutex_lock(>->tlb.invalidate_lock); + if (tlb_seqno_passed(gt, seqno)) + goto unlock; + + mmio_invalidate_full(gt); + + write_seqcount_invalidate(>->tlb.seqno); +unlock: + mutex_unlock(>->tlb.invalidate_lock); + } } diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h index 82d6f248d876..40b06adf509a 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.h +++ b/drivers/gpu/drm/i915/gt/intel_gt.h @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info, void intel_gt_watchdog_work(struct work_struct *work); -void intel_gt_invalidate_tlbs(struct intel_gt *gt); +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) +{ + return seqprop_sequence(>->tlb.seqno); +} + +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) +{ + return intel_gt_tlb_seqno(gt) | 1; +} + +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno); #endif /* __INTEL_GT_H__ */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index df708802889d..3804a583382b 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -11,6 +11,7 @@ #include <linux/llist.h> #include <linux/mutex.h> #include <linux/notifier.h> +#include <linux/seqlock.h> #include <linux/spinlock.h> #include <linux/types.h> #include <linux/workqueue.h> @@ 
-83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbind the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, *vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread
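The seqno comparison used by tlb_seqno_passed() in the patch above can be tried out in userspace. What follows is a minimal sketch, not the i915 code: struct tlb_model, align2(), model_next_invalidate_full(), model_seqno_passed() and model_invalidate() are hypothetical stand-ins, and the model assumes that every completed full invalidation advances the counter by 2, which is what write_seqcount_invalidate() does to a seqcount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the batching counter; not driver code. */
struct tlb_model {
	uint32_t seqno;       /* assumed to advance by 2 per full invalidation */
	unsigned int flushes; /* number of expensive flushes actually performed */
};

/* Round up to the next even value, mirroring ALIGN(x, 2). */
static uint32_t align2(uint32_t x)
{
	return (x + 1u) & ~1u;
}

/* Value an unbind would record: the current counter with bit 0 set. */
static uint32_t model_next_invalidate_full(const struct tlb_model *m)
{
	return m->seqno | 1u;
}

/* True once the counter is strictly past the rounded-up recorded value. */
static bool model_seqno_passed(const struct tlb_model *m, uint32_t seqno)
{
	/* Signed difference keeps the comparison valid across wraparound. */
	return (int32_t)(m->seqno - align2(seqno)) > 0;
}

/* Flush only if no sufficiently recent full invalidation has been seen. */
static void model_invalidate(struct tlb_model *m, uint32_t seqno)
{
	if (model_seqno_passed(m, seqno))
		return;

	assert((m->seqno & 1u) == 0); /* the model only ever holds even values */
	m->flushes++;                 /* stand-in for the MMIO invalidation */
	m->seqno += 2;
}
```

In this model, three objects that all recorded seqno 1 while the counter was 0 cost two flushes, and the third call is skipped. The check only passes once the counter is strictly greater than ALIGN(seqno, 2), so the comparison is conservative rather than minimal.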
* [Intel-gfx] [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
@ 2022-07-27 12:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw)
  Cc: David Airlie, dri-devel, Andrzej Hajda, Sumit Semwal,
	Michael Cheng, Chris Wilson, Dave Airlie, Tomas Winkler,
	Matthew Auld, Thomas Hellström, Lucas De Marchi, intel-gfx,
	linaro-mm-sig, Rodrigo Vivi, Mauro Carvalho Chehab,
	Michał Winiarski, linux-kernel, stable, linux-media,
	Christian König

From: Chris Wilson <chris.p.wilson@intel.com>

Invalidate TLB in batches, in order to reduce performance regressions.

Currently, every caller performs a full barrier around a TLB
invalidation, ignoring all other invalidations that may have already
removed their PTEs from the cache. As this is a synchronous operation
and can be quite slow, we cause multiple threads to contend on the TLB
invalidate mutex blocking userspace.

We only need to invalidate the TLB once after replacing our PTE to
ensure that there is no possible continued access to the physical
address before releasing our pages. By tracking a seqno for each full
TLB invalidate we can quickly determine if one has been performed since
rewriting the PTE, and only if necessary trigger one for ourselves.

That helps to reduce the performance regression introduced by TLB
invalidate logic.

[mchehab: rebased to not require moving the code to a separate file]

Cc: stable@vger.kernel.org
Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/

 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
 drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
 drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
 drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
 drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
 drivers/gpu/drm/i915/i915_vma.h               |  1 +
 drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
 drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
 10 files changed, 125 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 5cf36a130061..9f6b14ec189a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -335,7 +335,6 @@ struct drm_i915_gem_object {
 #define I915_BO_READONLY BIT(7)
 #define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
 #define I915_BO_PROTECTED BIT(9)
-#define I915_BO_WAS_BOUND_BIT 10
 	/**
 	 * @mem_flags - Mutable placement-related flags
 	 *
@@ -616,6 +615,8 @@ struct drm_i915_gem_object {
 		 * pages were last acquired.
 		 */
 		bool dirty:1;
+
+		u32 tlb;
 	} mm;
 
 	struct {
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 6835279943df..8357dbdcab5c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
 		vunmap(ptr);
 }
 
+static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct intel_gt *gt = to_gt(i915);
+
+	if (!obj->mm.tlb)
+		return;
+
+	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
+	obj->mm.tlb = 0;
+}
+
 struct sg_table *
 __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 {
@@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
 	__i915_gem_object_reset_page_iter(obj);
 	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
 
-	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
-		struct drm_i915_private *i915 = to_i915(obj->base.dev);
-		struct intel_gt *gt = to_gt(i915);
-		intel_wakeref_t wakeref;
-
-		with_intel_gt_pm_if_awake(gt, wakeref)
-			intel_gt_invalidate_tlbs(gt);
-	}
+	flush_tlb_invalidate(obj);
 
 	return pages;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 5c55a90672f4..f435e06125aa 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 {
 	spin_lock_init(&gt->irq_lock);
 
-	mutex_init(&gt->tlb_invalidate_lock);
-
 	INIT_LIST_HEAD(&gt->closed_vma);
 	spin_lock_init(&gt->closed_lock);
 
@@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
 	intel_gt_init_reset(gt);
 	intel_gt_init_requests(gt);
 	intel_gt_init_timelines(gt);
+	mutex_init(&gt->tlb.invalidate_lock);
+	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
 	intel_gt_pm_init_early(gt);
 
 	intel_uc_init_early(&gt->uc);
@@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
 		intel_gt_fini_requests(gt);
 		intel_gt_fini_reset(gt);
 		intel_gt_fini_timelines(gt);
+		mutex_destroy(&gt->tlb.invalidate_lock);
 		intel_engines_free(gt);
 	}
 }
@@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
 	return rb;
 }
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt)
+static void mmio_invalidate_full(struct intel_gt *gt)
 {
 	static const i915_reg_t gen8_regs[] = {
 		[RENDER_CLASS] = GEN8_RTCR,
@@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	const i915_reg_t *regs;
 	unsigned int num = 0;
 
-	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
-		return;
-
-	if (intel_gt_is_wedged(gt))
-		return;
-
 	if (GRAPHICS_VER(i915) == 12) {
 		regs = gen12_regs;
 		num = ARRAY_SIZE(gen12_regs);
@@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 			  "Platform does not implement TLB invalidation!"))
 		return;
 
-	GEM_TRACE("\n");
-
-	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
 	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
@@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		awake |= engine->mask;
 	}
 
+	GT_TRACE(gt, "invalidated engines %08x\n", awake);
+
 	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
 	if (awake &&
 	    (IS_TIGERLAKE(i915) ||
@@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	 * transitions.
 	 */
 	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
-	mutex_unlock(&gt->tlb_invalidate_lock);
+}
+
+static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
+{
+	u32 cur = intel_gt_tlb_seqno(gt);
+
+	/* Only skip if a *full* TLB invalidate barrier has passed */
+	return (s32)(cur - ALIGN(seqno, 2)) > 0;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
+{
+	intel_wakeref_t wakeref;
+
+	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
+		return;
+
+	if (intel_gt_is_wedged(gt))
+		return;
+
+	if (tlb_seqno_passed(gt, seqno))
+		return;
+
+	with_intel_gt_pm_if_awake(gt, wakeref) {
+		mutex_lock(&gt->tlb.invalidate_lock);
+		if (tlb_seqno_passed(gt, seqno))
+			goto unlock;
+
+		mmio_invalidate_full(gt);
+
+		write_seqcount_invalidate(&gt->tlb.seqno);
+unlock:
+		mutex_unlock(&gt->tlb.invalidate_lock);
+	}
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index 82d6f248d876..40b06adf509a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
 
 void intel_gt_watchdog_work(struct work_struct *work);
 
-void intel_gt_invalidate_tlbs(struct intel_gt *gt);
+static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
+{
+	return seqprop_sequence(&gt->tlb.seqno);
+}
+
+static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
+{
+	return intel_gt_tlb_seqno(gt) | 1;
+}
+
+void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
 
 #endif /* __INTEL_GT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index df708802889d..3804a583382b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -11,6 +11,7 @@
 #include <linux/llist.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
+#include <linux/seqlock.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -83,7 +84,22 @@ struct intel_gt {
 	struct intel_uc uc;
 	struct intel_gsc gsc;
 
-	struct mutex tlb_invalidate_lock;
+	struct {
+		/* Serialize global tlb invalidations */
+		struct mutex invalidate_lock;
+
+		/*
+		 * Batch TLB invalidations
+		 *
+		 * After unbinding the PTE, we need to ensure the TLB
+		 * are invalidated prior to releasing the physical pages.
+		 * But we only need one such invalidation for all unbinds,
+		 * so we track how many TLB invalidations have been
+		 * performed since unbind the PTE and only emit an extra
+		 * invalidate if no full barrier has been passed.
+		 */
+		seqcount_mutex_t seqno;
+	} tlb;
 
 	struct i915_wa_list wa_list;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index d8b94d638559..2da6c82a8bd2 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
 void ppgtt_unbind_vma(struct i915_address_space *vm,
 		      struct i915_vma_resource *vma_res)
 {
-	if (vma_res->allocated)
-		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (!vma_res->allocated)
+		return;
+
+	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
+	if (vma_res->tlb)
+		vma_invalidate_tlb(vm, *vma_res->tlb);
 }
 
 static unsigned long pd_count(u64 size, int shift)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ef3b04c7e153..84a9ccbc5fc5 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
 				   bind_flags);
 	}
 
-	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
-
 	atomic_or(bind_flags, &vma->flags);
 	return 0;
 }
@@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
 	return err;
 }
 
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
+{
+	/*
+	 * Before we release the pages that were bound by this vma, we
+	 * must invalidate all the TLBs that may still have a reference
+	 * back to our physical address. It only needs to be done once,
+	 * so after updating the PTE to point away from the pages, record
+	 * the most recent TLB invalidation seqno, and if we have not yet
+	 * flushed the TLBs upon release, perform a full invalidation.
+	 */
+	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
+}
+
 static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
 {
 	/* We allocate under vma_get_pages, so beware the shrinker */
@@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 		vma->vm->skip_pte_rewrite;
 	trace_i915_vma_unbind(vma);
 
-	unbind_fence = i915_vma_resource_unbind(vma_res);
+	if (async)
+		unbind_fence = i915_vma_resource_unbind(vma_res,
+							&vma->obj->mm.tlb);
+	else
+		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
+
 	vma->resource = NULL;
 
 	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
@@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
 
 	i915_vma_detach(vma);
 
-	if (!async && unbind_fence) {
-		dma_fence_wait(unbind_fence, false);
-		dma_fence_put(unbind_fence);
-		unbind_fence = NULL;
+	if (!async) {
+		if (unbind_fence) {
+			dma_fence_wait(unbind_fence, false);
+			dma_fence_put(unbind_fence);
+			unbind_fence = NULL;
+		}
+		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
 	}
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 88ca0bd9c900..5048eed536da 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
 			u64 size, u64 alignment, u64 flags);
 void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
 void i915_vma_revoke_mmap(struct i915_vma *vma);
+void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
 struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
 int __i915_vma_unbind(struct i915_vma *vma);
 int __must_check i915_vma_unbind(struct i915_vma *vma);
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
index 27c55027387a..5a67995ea5fe 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.c
+++ b/drivers/gpu/drm/i915/i915_vma_resource.c
@@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
  * Return: A refcounted pointer to a dma-fence that signals when unbinding is
  * complete.
  */
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb)
 {
 	struct i915_address_space *vm = vma_res->vm;
 
+	vma_res->tlb = tlb;
+
 	/* Reference for the sw fence */
 	i915_vma_resource_get(vma_res);
 
diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
index 5d8427caa2ba..06923d1816e7 100644
--- a/drivers/gpu/drm/i915/i915_vma_resource.h
+++ b/drivers/gpu/drm/i915/i915_vma_resource.h
@@ -67,6 +67,7 @@ struct i915_page_sizes {
  * taken when the unbind is scheduled.
  * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
  * needs to be skipped for unbind.
+ * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
  *
  * The lifetime of a struct i915_vma_resource is from a binding request to
  * the actual possible asynchronous unbind has completed.
@@ -119,6 +120,8 @@ struct i915_vma_resource {
 	bool immediate_unbind:1;
 	bool needs_wakeref:1;
 	bool skip_pte_rewrite:1;
+
+	u32 *tlb;
 };
 
 bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
@@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
 
 void i915_vma_resource_free(struct i915_vma_resource *vma_res);
 
-struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
+struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
+					   u32 *tlb);
 
 void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
 
-- 
2.36.1

^ permalink raw reply related	[flat|nested] 35+ messages in thread
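intel_gt_invalidate_tlb() in the patch above tests tlb_seqno_passed() twice: once without the lock, and again after taking tlb.invalidate_lock, so a caller that had to wait for the mutex can skip the flush if another thread has already done enough. The following is a rough userspace sketch of that check / lock / re-check shape. It assumes pthreads and C11 atomics in place of the kernel mutex and seqcount, and struct batcher with its batcher_*() helpers are made-up names, not driver API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the gt->tlb state; not the driver's layout. */
struct batcher {
	pthread_mutex_t lock;   /* serialises the expensive flush */
	_Atomic uint32_t seqno; /* bumped by 2 after each flush */
	unsigned int flushes;   /* protected by lock */
};

static void batcher_init(struct batcher *b)
{
	int rc = pthread_mutex_init(&b->lock, NULL);

	assert(rc == 0);
	(void)rc;
	atomic_init(&b->seqno, 0);
	b->flushes = 0;
}

/* Value to remember at unbind time: current counter with bit 0 set. */
static uint32_t batcher_record(struct batcher *b)
{
	return atomic_load(&b->seqno) | 1u;
}

static bool batcher_passed(struct batcher *b, uint32_t seqno)
{
	uint32_t cur = atomic_load(&b->seqno);
	uint32_t aligned = (seqno + 1u) & ~1u; /* same rounding as ALIGN(seqno, 2) */

	return (int32_t)(cur - aligned) > 0;
}

static void batcher_invalidate(struct batcher *b, uint32_t seqno)
{
	/* Fast path: no lock taken if a later flush already covers us. */
	if (batcher_passed(b, seqno))
		return;

	pthread_mutex_lock(&b->lock);
	/* Re-check: another thread may have flushed while we waited. */
	if (!batcher_passed(b, seqno)) {
		b->flushes++; /* stand-in for the slow hardware flush */
		atomic_fetch_add(&b->seqno, 2);
	}
	pthread_mutex_unlock(&b->lock);
}
```

The unlocked test is only an optimisation; the decision that matters is the one made under the lock, which is why the same predicate is evaluated again there.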
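One detail worth keeping in mind when reading the i915_vma.c hunk: as posted here, vma_invalidate_tlb() takes "u32 tlb" by value, so in plain C the WRITE_ONCE() store lands in the function's own copy and is not visible through obj->mm.tlb. The sketch below only illustrates that language rule with hypothetical names (struct object, record_by_value(), record_by_pointer()); it is not a proposed change and makes no claim about how the driver handles this elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object holding a recorded seqno, like obj->mm.tlb. */
struct object {
	uint32_t tlb;
};

static uint32_t next_seqno(uint32_t cur)
{
	return cur | 1u;
}

/* By value: the assignment only changes the callee's private copy. */
static void record_by_value(uint32_t tlb, uint32_t cur)
{
	tlb = next_seqno(cur);
	(void)tlb; /* nothing outside this function observes the store */
}

/* By pointer: the assignment reaches the caller's field. */
static void record_by_pointer(uint32_t *tlb, uint32_t cur)
{
	*tlb = next_seqno(cur);
}

/* Small self-check showing the difference between the two forms. */
static void demo(void)
{
	struct object obj = { 0 };

	record_by_value(obj.tlb, 4);
	assert(obj.tlb == 0); /* unchanged */

	record_by_pointer(&obj.tlb, 4);
	assert(obj.tlb == 5); /* 4 | 1 */
}
```

If the caller's field stays at 0, a later check of the form "if (!obj->mm.tlb) return;" would treat the object as never having been unbound.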
* Re: [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations
  2022-07-27 12:29   ` Mauro Carvalho Chehab
  (?)
@ 2022-07-27 14:25   ` Andi Shyti
  -1 siblings, 0 replies; 35+ messages in thread
From: Andi Shyti @ 2022-07-27 14:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Chris Wilson, Christian König, Michał Winiarski,
	Thomas Hellström, Andi Shyti, Andrzej Hajda, Ashutosh Dixit,
	Ayaz A Siddiqui, Casey Bowman, Daniel Vetter,
	Daniele Ceraolo Spurio, Dave Airlie, David Airlie, Jani Nikula,
	Joonas Lahtinen, Lucas De Marchi, Maarten Lankhorst, Matt Roper,
	Matthew Auld, Michael Cheng, Nirmoy Das, Ramalingam C,
	Rodrigo Vivi, Sumit Semwal, Tomas Winkler, Tvrtko Ursulin,
	dri-devel, intel-gfx, linaro-mm-sig, linux-kernel, linux-media,
	stable, Tvrtko Ursulin, Fei Yang

Hi Mauro,

I think there are still some unanswered questions from Tvrtko on
this patch, am I right?

Andi

On Wed, Jul 27, 2022 at 02:29:55PM +0200, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
>
> Invalidate TLB in batches, in order to reduce performance regressions.
>
> Currently, every caller performs a full barrier around a TLB
> invalidation, ignoring all other invalidations that may have already
> removed their PTEs from the cache. As this is a synchronous operation
> and can be quite slow, we cause multiple threads to contend on the TLB
> invalidate mutex blocking userspace.
>
> We only need to invalidate the TLB once after replacing our PTE to
> ensure that there is no possible continued access to the physical
> address before releasing our pages. By tracking a seqno for each full
> TLB invalidate we can quickly determine if one has been performed since
> rewriting the PTE, and only if necessary trigger one for ourselves.
>
> That helps to reduce the performance regression introduced by TLB
> invalidate logic.
>
> [mchehab: rebased to not require moving the code to a separate file]
>
> Cc: stable@vger.kernel.org
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> ---
>
> To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover.
> See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/
>
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  3 +-
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c     | 21 +++++---
>  drivers/gpu/drm/i915/gt/intel_gt.c            | 53 ++++++++++++++-----
>  drivers/gpu/drm/i915/gt/intel_gt.h            | 12 ++++-
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      | 18 ++++++-
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c         |  8 ++-
>  drivers/gpu/drm/i915/i915_vma.c               | 33 +++++++++---
>  drivers/gpu/drm/i915/i915_vma.h               |  1 +
>  drivers/gpu/drm/i915/i915_vma_resource.c      |  5 +-
>  drivers/gpu/drm/i915/i915_vma_resource.h      |  6 ++-
>  10 files changed, 125 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 5cf36a130061..9f6b14ec189a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -335,7 +335,6 @@ struct drm_i915_gem_object {
>  #define I915_BO_READONLY BIT(7)
>  #define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */
>  #define I915_BO_PROTECTED BIT(9)
> -#define I915_BO_WAS_BOUND_BIT 10
>  	/**
>  	 * @mem_flags - Mutable placement-related flags
>  	 *
> @@ -616,6 +615,8 @@ struct drm_i915_gem_object {
>  		 * pages were last acquired.
>  		 */
>  		bool dirty:1;
> +
> +		u32 tlb;
>  	} mm;
>
>  	struct {
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index 6835279943df..8357dbdcab5c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr)
>  		vunmap(ptr);
>  }
>
> +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj)
> +{
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct intel_gt *gt = to_gt(i915);
> +
> +	if (!obj->mm.tlb)
> +		return;
> +
> +	intel_gt_invalidate_tlb(gt, obj->mm.tlb);
> +	obj->mm.tlb = 0;
> +}
> +
>  struct sg_table *
>  __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  {
> @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj)
>  	__i915_gem_object_reset_page_iter(obj);
>  	obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0;
>
> -	if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) {
> -		struct drm_i915_private *i915 = to_i915(obj->base.dev);
> -		struct intel_gt *gt = to_gt(i915);
> -		intel_wakeref_t wakeref;
> -
> -		with_intel_gt_pm_if_awake(gt, wakeref)
> -			intel_gt_invalidate_tlbs(gt);
> -	}
> +	flush_tlb_invalidate(obj);
>
>  	return pages;
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 5c55a90672f4..f435e06125aa 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  {
>  	spin_lock_init(&gt->irq_lock);
>
> -	mutex_init(&gt->tlb_invalidate_lock);
> -
>  	INIT_LIST_HEAD(&gt->closed_vma);
>  	spin_lock_init(&gt->closed_lock);
>
> @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt)
>  	intel_gt_init_reset(gt);
>  	intel_gt_init_requests(gt);
>  	intel_gt_init_timelines(gt);
> +	mutex_init(&gt->tlb.invalidate_lock);
> +	seqcount_mutex_init(&gt->tlb.seqno, &gt->tlb.invalidate_lock);
>  	intel_gt_pm_init_early(gt);
>
>  	intel_uc_init_early(&gt->uc);
> @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
>  		intel_gt_fini_requests(gt);
>  		intel_gt_fini_reset(gt);
>  		intel_gt_fini_timelines(gt);
> +		mutex_destroy(&gt->tlb.invalidate_lock);
>  		intel_engines_free(gt);
>  	}
>  }
> @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8,
>  	return rb;
>  }
>
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> +static void mmio_invalidate_full(struct intel_gt *gt)
>  {
>  	static const i915_reg_t gen8_regs[] = {
>  		[RENDER_CLASS] = GEN8_RTCR,
> @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	const i915_reg_t *regs;
>  	unsigned int num = 0;
>
> -	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> -		return;
> -
> -	if (intel_gt_is_wedged(gt))
> -		return;
> -
>  	if (GRAPHICS_VER(i915) == 12) {
>  		regs = gen12_regs;
>  		num = ARRAY_SIZE(gen12_regs);
> @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  			  "Platform does not implement TLB invalidation!"))
>  		return;
>
> -	GEM_TRACE("\n");
> -
> -	mutex_lock(&gt->tlb_invalidate_lock);
>  	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
>
>  	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
> @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  		awake |= engine->mask;
>  	}
>
> +	GT_TRACE(gt, "invalidated engines %08x\n", awake);
> +
>  	/* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */
>  	if (awake &&
>  	    (IS_TIGERLAKE(i915) ||
> @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
>  	 * transitions.
>  	 */
>  	intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL);
> -	mutex_unlock(&gt->tlb_invalidate_lock);
> +}
> +
> +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno)
> +{
> +	u32 cur = intel_gt_tlb_seqno(gt);
> +
> +	/* Only skip if a *full* TLB invalidate barrier has passed */
> +	return (s32)(cur - ALIGN(seqno, 2)) > 0;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno)
> +{
> +	intel_wakeref_t wakeref;
> +
> +	if (I915_SELFTEST_ONLY(gt->awake == -ENODEV))
> +		return;
> +
> +	if (intel_gt_is_wedged(gt))
> +		return;
> +
> +	if (tlb_seqno_passed(gt, seqno))
> +		return;
> +
> +	with_intel_gt_pm_if_awake(gt, wakeref) {
> +		mutex_lock(&gt->tlb.invalidate_lock);
> +		if (tlb_seqno_passed(gt, seqno))
> +			goto unlock;
> +
> +		mmio_invalidate_full(gt);
> +
> +		write_seqcount_invalidate(&gt->tlb.seqno);
> +unlock:
> +		mutex_unlock(&gt->tlb.invalidate_lock);
> +	}
>  }
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index 82d6f248d876..40b06adf509a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info,
>
>  void intel_gt_watchdog_work(struct work_struct *work);
>
> -void intel_gt_invalidate_tlbs(struct intel_gt *gt);
> +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt)
> +{
> +	return seqprop_sequence(&gt->tlb.seqno);
> +}
> +
> +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt)
> +{
> +	return intel_gt_tlb_seqno(gt) | 1;
> +}
> +
> +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno);
>
>  #endif /* __INTEL_GT_H__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index df708802889d..3804a583382b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -11,6 +11,7 @@
>  #include <linux/llist.h>
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
> +#include <linux/seqlock.h>
>  #include <linux/spinlock.h>
>  #include <linux/types.h>
>  #include <linux/workqueue.h>
> @@ -83,7 +84,22 @@ struct intel_gt {
>  	struct intel_uc uc;
>  	struct intel_gsc gsc;
>
> -	struct mutex tlb_invalidate_lock;
> +	struct {
> +		/* Serialize global tlb invalidations */
> +		struct mutex invalidate_lock;
> +
> +		/*
> +		 * Batch TLB invalidations
> +		 *
> +		 * After unbinding the PTE, we need to ensure the TLB
> +		 * are invalidated prior to releasing the physical pages.
> +		 * But we only need one such invalidation for all unbinds,
> +		 * so we track how many TLB invalidations have been
> +		 * performed since unbind the PTE and only emit an extra
> +		 * invalidate if no full barrier has been passed.
> +		 */
> +		seqcount_mutex_t seqno;
> +	} tlb;
>
>  	struct i915_wa_list wa_list;
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> index d8b94d638559..2da6c82a8bd2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>  void ppgtt_unbind_vma(struct i915_address_space *vm,
>  		      struct i915_vma_resource *vma_res)
>  {
> -	if (vma_res->allocated)
> -		vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (!vma_res->allocated)
> +		return;
> +
> +	vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> +	if (vma_res->tlb)
> +		vma_invalidate_tlb(vm, *vma_res->tlb);
>  }
>
>  static unsigned long pd_count(u64 size, int shift)
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index ef3b04c7e153..84a9ccbc5fc5 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma,
>  				   bind_flags);
>  	}
>
> -	set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags);
> -
>  	atomic_or(bind_flags, &vma->flags);
>  	return 0;
>  }
> @@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT int i915_vma_get_pages(struct i915_vma *vma)
>  	return err;
>  }
>
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb)
> +{
> +	/*
> +	 * Before we release the pages that were bound by this vma, we
> +	 * must invalidate all the TLBs that may still have a reference
> +	 * back to our physical address. It only needs to be done once,
> +	 * so after updating the PTE to point away from the pages, record
> +	 * the most recent TLB invalidation seqno, and if we have not yet
> +	 * flushed the TLBs upon release, perform a full invalidation.
> +	 */
> +	WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt));
> +}
> +
>  static void __vma_put_pages(struct i915_vma *vma, unsigned int count)
>  {
>  	/* We allocate under vma_get_pages, so beware the shrinker */
> @@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>  		vma->vm->skip_pte_rewrite;
>  	trace_i915_vma_unbind(vma);
>
> -	unbind_fence = i915_vma_resource_unbind(vma_res);
> +	if (async)
> +		unbind_fence = i915_vma_resource_unbind(vma_res,
> +							&vma->obj->mm.tlb);
> +	else
> +		unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
> +
>  	vma->resource = NULL;
>
>  	atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> @@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
>
>  	i915_vma_detach(vma);
>
> -	if (!async && unbind_fence) {
> -		dma_fence_wait(unbind_fence, false);
> -		dma_fence_put(unbind_fence);
> -		unbind_fence = NULL;
> +	if (!async) {
> +		if (unbind_fence) {
> +			dma_fence_wait(unbind_fence, false);
> +			dma_fence_put(unbind_fence);
> +			unbind_fence = NULL;
> +		}
> +		vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
>  	}
>
>  	/*
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index 88ca0bd9c900..5048eed536da 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma,
>  			u64 size, u64 alignment, u64 flags);
>  void __i915_vma_set_map_and_fenceable(struct i915_vma *vma);
>  void i915_vma_revoke_mmap(struct i915_vma *vma);
> +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb);
>  struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async);
>  int __i915_vma_unbind(struct i915_vma *vma);
>  int __must_check i915_vma_unbind(struct i915_vma *vma);
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c
> index 27c55027387a..5a67995ea5fe 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.c
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.c
> @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence,
>   * Return: A refcounted pointer to a dma-fence that signals when unbinding is
>   * complete.
>   */
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res)
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb)
>  {
>  	struct i915_address_space *vm = vma_res->vm;
>
> +	vma_res->tlb = tlb;
> +
>  	/* Reference for the sw fence */
>  	i915_vma_resource_get(vma_res);
>
> diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h
> index 5d8427caa2ba..06923d1816e7 100644
> --- a/drivers/gpu/drm/i915/i915_vma_resource.h
> +++ b/drivers/gpu/drm/i915/i915_vma_resource.h
> @@ -67,6 +67,7 @@ struct i915_page_sizes {
>   * taken when the unbind is scheduled.
>   * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting
>   * needs to be skipped for unbind.
> + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL
>   *
>   * The lifetime of a struct i915_vma_resource is from a binding request to
>   * the actual possible asynchronous unbind has completed.
> @@ -119,6 +120,8 @@ struct i915_vma_resource {
>  	bool immediate_unbind:1;
>  	bool needs_wakeref:1;
>  	bool skip_pte_rewrite:1;
> +
> +	u32 *tlb;
>  };
>
>  bool i915_vma_resource_hold(struct i915_vma_resource *vma_res,
> @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void);
>
>  void i915_vma_resource_free(struct i915_vma_resource *vma_res);
>
> -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res);
> +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res,
> +					   u32 *tlb);
>
>  void __i915_vma_resource_init(struct i915_vma_resource *vma_res);
>
> --
> 2.36.1

^ permalink raw reply	[flat|nested] 35+ messages in thread
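One C detail in the quoted vma_invalidate_tlb() hunk can be shown in isolation. The helper takes "u32 tlb" by value and its callers pass vma->obj->mm.tlb or *vma_res->tlb, so under plain C semantics a store to the parameter stays local to the callee. The sketch below is not part of the patch and uses hypothetical names throughout (struct obj_model, next_full(), record_by_value(), record_by_pointer()); it only contrasts the by-value shape with a by-pointer one, and makes no claim about which of the two the series finally settled on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the obj->mm.tlb field; not an i915 type. */
struct obj_model {
	uint32_t tlb;
};

/* Same tagging as intel_gt_next_invalidate_tlb_full(): seqno | 1. */
static uint32_t next_full(uint32_t cur)
{
	return cur | 1u;
}

/* Shape used in the quoted hunk: the u32 arrives by value. */
static void record_by_value(uint32_t cur, uint32_t tlb)
{
	tlb = next_full(cur);	/* updates only the callee's local copy */
	(void)tlb;
}

/* Alternative shape (assumption): the caller passes the field's address. */
static void record_by_pointer(uint32_t cur, uint32_t *tlb)
{
	*tlb = next_full(cur);	/* the caller's field is updated */
}

/* Small self-check; call it from a test harness or main(). */
static void record_demo(void)
{
	struct obj_model obj = { 0 };

	record_by_value(4, obj.tlb);
	assert(obj.tlb == 0);	/* nothing was recorded in the object */

	record_by_pointer(4, &obj.tlb);
	assert(obj.tlb == 5);	/* 4 | 1 */
}
```

With the by-value shape the object's field stays 0, which flush_tlb_invalidate() in the same patch treats as "nothing to flush".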
* Re: [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations @ 2022-07-27 14:25 ` Andi Shyti 0 siblings, 0 replies; 35+ messages in thread From: Andi Shyti @ 2022-07-27 14:25 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Fei Yang, David Airlie, dri-devel, Daniele Ceraolo Spurio, Andrzej Hajda, Sumit Semwal, Michael Cheng, Chris Wilson, Ayaz A Siddiqui, Andi Shyti, Dave Airlie, Tomas Winkler, Matthew Auld, Thomas Hellström, Casey Bowman, Lucas De Marchi, intel-gfx, linaro-mm-sig, Nirmoy Das, Rodrigo Vivi, Tvrtko Ursulin, Michał Winiarski, Tvrtko Ursulin, linux-kernel, stable, Ashutosh Dixit, linux-media, Christian König Hi Mauro, I think there are still some unanswered questions from Tvrtko on this patch, am I right? Andi On Wed, Jul 27, 2022 at 02:29:55PM +0200, Mauro Carvalho Chehab wrote: > From: Chris Wilson <chris.p.wilson@intel.com> > > Invalidate TLB in batches, in order to reduce performance regressions. > > Currently, every caller performs a full barrier around a TLB > invalidation, ignoring all other invalidations that may have already > removed their PTEs from the cache. As this is a synchronous operation > and can be quite slow, we cause multiple threads to contend on the TLB > invalidate mutex blocking userspace. > > We only need to invalidate the TLB once after replacing our PTE to > ensure that there is no possible continued access to the physical > address before releasing our pages. By tracking a seqno for each full > TLB invalidate we can quickly determine if one has been performed since > rewriting the PTE, and only if necessary trigger one for ourselves. > > That helps to reduce the performance regression introduced by TLB > invalidate logic. 
> > [mchehab: rebased to not require moving the code to a separate file] > > Cc: stable@vger.kernel.org > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") > Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > Signed-off-by: Chris Wilson <chris.p.wilson@intel.com> > Cc: Fei Yang <fei.yang@intel.com> > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> > --- > > To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. > See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/ > > .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- > drivers/gpu/drm/i915/gem/i915_gem_pages.c | 21 +++++--- > drivers/gpu/drm/i915/gt/intel_gt.c | 53 ++++++++++++++----- > drivers/gpu/drm/i915/gt/intel_gt.h | 12 ++++- > drivers/gpu/drm/i915/gt/intel_gt_types.h | 18 ++++++- > drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 ++- > drivers/gpu/drm/i915/i915_vma.c | 33 +++++++++--- > drivers/gpu/drm/i915/i915_vma.h | 1 + > drivers/gpu/drm/i915/i915_vma_resource.c | 5 +- > drivers/gpu/drm/i915/i915_vma_resource.h | 6 ++- > 10 files changed, 125 insertions(+), 35 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > index 5cf36a130061..9f6b14ec189a 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > @@ -335,7 +335,6 @@ struct drm_i915_gem_object { > #define I915_BO_READONLY BIT(7) > #define I915_TILING_QUIRK_BIT 8 /* unknown swizzling; do not release! */ > #define I915_BO_PROTECTED BIT(9) > -#define I915_BO_WAS_BOUND_BIT 10 > /** > * @mem_flags - Mutable placement-related flags > * > @@ -616,6 +615,8 @@ struct drm_i915_gem_object { > * pages were last acquired. 
> */ > bool dirty:1; > + > + u32 tlb; > } mm; > > struct { > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > index 6835279943df..8357dbdcab5c 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > @@ -191,6 +191,18 @@ static void unmap_object(struct drm_i915_gem_object *obj, void *ptr) > vunmap(ptr); > } > > +static void flush_tlb_invalidate(struct drm_i915_gem_object *obj) > +{ > + struct drm_i915_private *i915 = to_i915(obj->base.dev); > + struct intel_gt *gt = to_gt(i915); > + > + if (!obj->mm.tlb) > + return; > + > + intel_gt_invalidate_tlb(gt, obj->mm.tlb); > + obj->mm.tlb = 0; > +} > + > struct sg_table * > __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) > { > @@ -216,14 +228,7 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) > __i915_gem_object_reset_page_iter(obj); > obj->mm.page_sizes.phys = obj->mm.page_sizes.sg = 0; > > - if (test_and_clear_bit(I915_BO_WAS_BOUND_BIT, &obj->flags)) { > - struct drm_i915_private *i915 = to_i915(obj->base.dev); > - struct intel_gt *gt = to_gt(i915); > - intel_wakeref_t wakeref; > - > - with_intel_gt_pm_if_awake(gt, wakeref) > - intel_gt_invalidate_tlbs(gt); > - } > + flush_tlb_invalidate(obj); > > return pages; > } > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c > index 5c55a90672f4..f435e06125aa 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > @@ -38,8 +38,6 @@ static void __intel_gt_init_early(struct intel_gt *gt) > { > spin_lock_init(&gt->irq_lock); > > - mutex_init(&gt->tlb_invalidate_lock); > - > INIT_LIST_HEAD(&gt->closed_vma); > spin_lock_init(&gt->closed_lock); > > @@ -50,6 +48,8 @@ static void __intel_gt_init_early(struct intel_gt *gt) > intel_gt_init_reset(gt); > intel_gt_init_requests(gt); > intel_gt_init_timelines(gt); > + mutex_init(&gt->tlb.invalidate_lock); > + seqcount_mutex_init(&gt->tlb.seqno,
&gt->tlb.invalidate_lock); > intel_gt_pm_init_early(gt); > > intel_uc_init_early(&gt->uc); > @@ -770,6 +770,7 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915) > intel_gt_fini_requests(gt); > intel_gt_fini_reset(gt); > intel_gt_fini_timelines(gt); > + mutex_destroy(&gt->tlb.invalidate_lock); > intel_engines_free(gt); > } > } > @@ -908,7 +909,7 @@ get_reg_and_bit(const struct intel_engine_cs *engine, const bool gen8, > return rb; > } > > -void intel_gt_invalidate_tlbs(struct intel_gt *gt) > +static void mmio_invalidate_full(struct intel_gt *gt) > { > static const i915_reg_t gen8_regs[] = { > [RENDER_CLASS] = GEN8_RTCR, > @@ -931,12 +932,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) > const i915_reg_t *regs; > unsigned int num = 0; > > - if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) > - return; > - > - if (intel_gt_is_wedged(gt)) > - return; > - > if (GRAPHICS_VER(i915) == 12) { > regs = gen12_regs; > num = ARRAY_SIZE(gen12_regs); > @@ -951,9 +946,6 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) > "Platform does not implement TLB invalidation!")) > return; > > - GEM_TRACE("\n"); > - > - mutex_lock(&gt->tlb_invalidate_lock); > intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL); > > spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */ > @@ -973,6 +965,8 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) > awake |= engine->mask; > } > > + GT_TRACE(gt, "invalidated engines %08x\n", awake); > + > /* Wa_2207587034:tgl,dg1,rkl,adl-s,adl-p */ > if (awake && > (IS_TIGERLAKE(i915) || > @@ -1012,5 +1006,38 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) > * transitions.
> */ > intel_uncore_forcewake_put_delayed(uncore, FORCEWAKE_ALL); > - mutex_unlock(&gt->tlb_invalidate_lock); > +} > + > +static bool tlb_seqno_passed(const struct intel_gt *gt, u32 seqno) > +{ > + u32 cur = intel_gt_tlb_seqno(gt); > + > + /* Only skip if a *full* TLB invalidate barrier has passed */ > + return (s32)(cur - ALIGN(seqno, 2)) > 0; > +} > + > +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno) > +{ > + intel_wakeref_t wakeref; > + > + if (I915_SELFTEST_ONLY(gt->awake == -ENODEV)) > + return; > + > + if (intel_gt_is_wedged(gt)) > + return; > + > + if (tlb_seqno_passed(gt, seqno)) > + return; > + > + with_intel_gt_pm_if_awake(gt, wakeref) { > + mutex_lock(&gt->tlb.invalidate_lock); > + if (tlb_seqno_passed(gt, seqno)) > + goto unlock; > + > + mmio_invalidate_full(gt); > + > + write_seqcount_invalidate(&gt->tlb.seqno); > +unlock: > + mutex_unlock(&gt->tlb.invalidate_lock); > + } > } > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h > index 82d6f248d876..40b06adf509a 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.h > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h > @@ -101,6 +101,16 @@ void intel_gt_info_print(const struct intel_gt_info *info, > > void intel_gt_watchdog_work(struct work_struct *work); > > -void intel_gt_invalidate_tlbs(struct intel_gt *gt); > +static inline u32 intel_gt_tlb_seqno(const struct intel_gt *gt) > +{ > + return seqprop_sequence(&gt->tlb.seqno); > +} > + > +static inline u32 intel_gt_next_invalidate_tlb_full(const struct intel_gt *gt) > +{ > + return intel_gt_tlb_seqno(gt) | 1; > +} > + > +void intel_gt_invalidate_tlb(struct intel_gt *gt, u32 seqno); > > #endif /* __INTEL_GT_H__ */ > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h > index df708802889d..3804a583382b 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h > +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h > @@ -11,6 +11,7 @@ > #include <linux/llist.h> > #include <linux/mutex.h> >
#include <linux/notifier.h> > +#include <linux/seqlock.h> > #include <linux/spinlock.h> > #include <linux/types.h> > #include <linux/workqueue.h> > @@ -83,7 +84,22 @@ struct intel_gt { > struct intel_uc uc; > struct intel_gsc gsc; > > - struct mutex tlb_invalidate_lock; > + struct { > + /* Serialize global tlb invalidations */ > + struct mutex invalidate_lock; > + > + /* > + * Batch TLB invalidations > + * > + * After unbinding the PTE, we need to ensure the TLB > + * are invalidated prior to releasing the physical pages. > + * But we only need one such invalidation for all unbinds, > + * so we track how many TLB invalidations have been > + * performed since unbind the PTE and only emit an extra > + * invalidate if no full barrier has been passed. > + */ > + seqcount_mutex_t seqno; > + } tlb; > > struct i915_wa_list wa_list; > > diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c > index d8b94d638559..2da6c82a8bd2 100644 > --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c > +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c > @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm, > void ppgtt_unbind_vma(struct i915_address_space *vm, > struct i915_vma_resource *vma_res) > { > - if (vma_res->allocated) > - vm->clear_range(vm, vma_res->start, vma_res->vma_size); > + if (!vma_res->allocated) > + return; > + > + vm->clear_range(vm, vma_res->start, vma_res->vma_size); > + if (vma_res->tlb) > + vma_invalidate_tlb(vm, *vma_res->tlb); > } > > static unsigned long pd_count(u64 size, int shift) > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c > index ef3b04c7e153..84a9ccbc5fc5 100644 > --- a/drivers/gpu/drm/i915/i915_vma.c > +++ b/drivers/gpu/drm/i915/i915_vma.c > @@ -538,8 +538,6 @@ int i915_vma_bind(struct i915_vma *vma, > bind_flags); > } > > - set_bit(I915_BO_WAS_BOUND_BIT, &vma->obj->flags); > - > atomic_or(bind_flags, &vma->flags); > return 0; > } > @@ -1310,6 +1308,19 @@ I915_SELFTEST_EXPORT 
int i915_vma_get_pages(struct i915_vma *vma) > return err; > } > > +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb) > +{ > + /* > + * Before we release the pages that were bound by this vma, we > + * must invalidate all the TLBs that may still have a reference > + * back to our physical address. It only needs to be done once, > + * so after updating the PTE to point away from the pages, record > + * the most recent TLB invalidation seqno, and if we have not yet > + * flushed the TLBs upon release, perform a full invalidation. > + */ > + WRITE_ONCE(tlb, intel_gt_next_invalidate_tlb_full(vm->gt)); > +} > + > static void __vma_put_pages(struct i915_vma *vma, unsigned int count) > { > /* We allocate under vma_get_pages, so beware the shrinker */ > @@ -1941,7 +1952,12 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) > vma->vm->skip_pte_rewrite; > trace_i915_vma_unbind(vma); > > - unbind_fence = i915_vma_resource_unbind(vma_res); > + if (async) > + unbind_fence = i915_vma_resource_unbind(vma_res, > + &vma->obj->mm.tlb); > + else > + unbind_fence = i915_vma_resource_unbind(vma_res, NULL); > + > vma->resource = NULL; > > atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE), > @@ -1949,10 +1965,13 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) > > i915_vma_detach(vma); > > - if (!async && unbind_fence) { > - dma_fence_wait(unbind_fence, false); > - dma_fence_put(unbind_fence); > - unbind_fence = NULL; > + if (!async) { > + if (unbind_fence) { > + dma_fence_wait(unbind_fence, false); > + dma_fence_put(unbind_fence); > + unbind_fence = NULL; > + } > + vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb); > } > > /* > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 88ca0bd9c900..5048eed536da 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -213,6 +213,7 @@ bool i915_vma_misplaced(const struct i915_vma *vma, > u64 size, 
u64 alignment, u64 flags); > void __i915_vma_set_map_and_fenceable(struct i915_vma *vma); > void i915_vma_revoke_mmap(struct i915_vma *vma); > +void vma_invalidate_tlb(struct i915_address_space *vm, u32 tlb); > struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async); > int __i915_vma_unbind(struct i915_vma *vma); > int __must_check i915_vma_unbind(struct i915_vma *vma); > diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c > index 27c55027387a..5a67995ea5fe 100644 > --- a/drivers/gpu/drm/i915/i915_vma_resource.c > +++ b/drivers/gpu/drm/i915/i915_vma_resource.c > @@ -223,10 +223,13 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence, > * Return: A refcounted pointer to a dma-fence that signals when unbinding is > * complete. > */ > -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res) > +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res, > + u32 *tlb) > { > struct i915_address_space *vm = vma_res->vm; > > + vma_res->tlb = tlb; > + > /* Reference for the sw fence */ > i915_vma_resource_get(vma_res); > > diff --git a/drivers/gpu/drm/i915/i915_vma_resource.h b/drivers/gpu/drm/i915/i915_vma_resource.h > index 5d8427caa2ba..06923d1816e7 100644 > --- a/drivers/gpu/drm/i915/i915_vma_resource.h > +++ b/drivers/gpu/drm/i915/i915_vma_resource.h > @@ -67,6 +67,7 @@ struct i915_page_sizes { > * taken when the unbind is scheduled. > * @skip_pte_rewrite: During ggtt suspend and vm takedown pte rewriting > * needs to be skipped for unbind. > + * @tlb: pointer for obj->mm.tlb, if async unbind. Otherwise, NULL > * > * The lifetime of a struct i915_vma_resource is from a binding request to > * the actual possible asynchronous unbind has completed. 
> @@ -119,6 +120,8 @@ struct i915_vma_resource { > bool immediate_unbind:1; > bool needs_wakeref:1; > bool skip_pte_rewrite:1; > + > + u32 *tlb; > }; > > bool i915_vma_resource_hold(struct i915_vma_resource *vma_res, > @@ -131,7 +134,8 @@ struct i915_vma_resource *i915_vma_resource_alloc(void); > > void i915_vma_resource_free(struct i915_vma_resource *vma_res); > > -struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res); > +struct dma_fence *i915_vma_resource_unbind(struct i915_vma_resource *vma_res, > + u32 *tlb); > > void __i915_vma_resource_init(struct i915_vma_resource *vma_res); > > -- > 2.36.1 ^ permalink raw reply [flat|nested] 35+ messages in thread
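The seqno test quoted above can be tried outside the kernel. What follows is only a userspace sketch of the arithmetic, not the driver code: struct model_gt and the model_* helpers are hypothetical names, and the sketch assumes that one completed full invalidation advances the counter by 2, which is what write_seqcount_invalidate() is understood to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round up to the next even value, like the kernel's ALIGN(x, 2). */
#define ALIGN2(x) (((x) + 1u) & ~1u)

/* Hypothetical stand-in for struct intel_gt. */
struct model_gt {
	uint32_t seq;         /* even; advanced by 2 after each full flush */
	unsigned int flushes; /* number of expensive full flushes issued */
};

/* Value recorded in the object at unbind time: always odd, so never 0. */
static uint32_t model_next_invalidate_full(const struct model_gt *gt)
{
	return gt->seq | 1;
}

/* True only once the counter is strictly past the even value above seqno. */
static bool model_seqno_passed(const struct model_gt *gt, uint32_t seqno)
{
	return (int32_t)(gt->seq - ALIGN2(seqno)) > 0;
}

/* Flush unless the counter shows a full barrier has already passed. */
static void model_invalidate_tlb(struct model_gt *gt, uint32_t seqno)
{
	assert(gt);
	if (model_seqno_passed(gt, seqno))
		return;
	gt->flushes++; /* stands in for mmio_invalidate_full() */
	gt->seq += 2;  /* stands in for write_seqcount_invalidate() */
}

/* Three objects unbound at the same counter value, then released in turn. */
static unsigned int model_release_three(void)
{
	struct model_gt gt = { .seq = 0, .flushes = 0 };
	uint32_t a = model_next_invalidate_full(&gt);
	uint32_t b = model_next_invalidate_full(&gt);
	uint32_t c = model_next_invalidate_full(&gt);

	model_invalidate_tlb(&gt, a);
	model_invalidate_tlb(&gt, b);
	model_invalidate_tlb(&gt, c);
	return gt.flushes;
}
```

In this model, model_release_three() returns 2: the first two releases each flush, and the third is skipped because the counter has moved from 0 to 4. A recorded seqno is therefore only considered covered after the counter has advanced by more than one step, which matches the "full barrier" wording in the comment on tlb_seqno_passed().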
* [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource 2022-07-27 12:29 ` Mauro Carvalho Chehab (?) @ 2022-07-27 12:29 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 35+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-27 12:29 UTC (permalink / raw) Cc: Mauro Carvalho Chehab, Daniel Vetter, David Airlie, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin, dri-devel, intel-gfx, linux-kernel TLB cache invalidation can happen in two different situations: 1. synchronously, at __vma_put_pages(); 2. asynchronously. In the first case, TLB cache invalidation happens inside __vma_put_pages(), so there is no need to do it later on. In the second case, however, the pages stay in memory until __i915_vma_evict() is called, so the TLB data must be stored in struct i915_vma_resource in order to invalidate the TLB cache before userspace is allowed to reuse the same memory. For that reason, i915_vma_resource_unbind() has gained a new parameter that stores the TLB data in the second case. Document it. Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> --- To avoid mailbombing a large number of people, only mailing lists were Cc'd on the cover. See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/ drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c index 5a67995ea5fe..4fe09ea0a825 100644 --- a/drivers/gpu/drm/i915/i915_vma_resource.c +++ b/drivers/gpu/drm/i915/i915_vma_resource.c @@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence, /** * i915_vma_resource_unbind - Unbind a vma resource * @vma_res: The vma resource to unbind. + * @tlb: pointer to vma->obj->mm.tlb associated with the resource + * to be stored at vma_res->tlb. When not-NULL, it will be used + * to do TLB cache invalidation before freeing a VMA resource.
+ * used only for async unbind. * * At this point this function does little more than publish a fence that * signals immediately unless signaling is held back. -- 2.36.1 ^ permalink raw reply related [flat|nested] 35+ messages in thread
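The hand-off this kernel-doc describes, a pointer stashed in the resource at unbind time and used later, can be modelled in a few lines of userspace C. This is a sketch under assumptions rather than the patch's code: struct model_obj, struct model_vma_res and the model_* functions are hypothetical, and the sketch writes the recorded seqno through the stored pointer, which is a choice made here for illustration. It models only the deferred recording; the synchronous path in the series records the seqno separately, after waiting on the fence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-object state (obj->mm.tlb). */
struct model_obj {
	uint32_t tlb;  /* 0 means "no invalidation pending" */
};

/* Hypothetical stand-in for struct i915_vma_resource. */
struct model_vma_res {
	uint32_t *tlb; /* points at model_obj.tlb for async unbind, else NULL */
};

/* Mirrors the new parameter: remember where to record the seqno later. */
static void model_resource_unbind(struct model_vma_res *res, uint32_t *tlb)
{
	assert(res);
	res->tlb = tlb;
}

/* Runs once the PTEs are cleared; cur_seq is the current flush counter. */
static void model_unbind_done(struct model_vma_res *res, uint32_t cur_seq)
{
	if (res->tlb)
		*res->tlb = cur_seq | 1; /* write through the pointer (sketch assumption) */
}

/* Returns the value left in obj.tlb after the deferred step has run. */
static uint32_t model_unbind(int async, uint32_t cur_seq)
{
	struct model_obj obj = { .tlb = 0 };
	struct model_vma_res res = { .tlb = NULL };

	model_resource_unbind(&res, async ? &obj.tlb : NULL);
	model_unbind_done(&res, cur_seq);
	return obj.tlb;
}
```

With a counter value of 4, model_unbind(1, 4) leaves 5 in the object, while model_unbind(0, 4) leaves it at 0 because no pointer was handed over. Passing the seqno by value instead of through the pointer would leave the object's field untouched in both cases.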
* Re: [Intel-gfx] [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource 2022-07-27 12:29 ` Mauro Carvalho Chehab (?) (?) @ 2022-07-27 14:20 ` Andi Shyti -1 siblings, 0 replies; 35+ messages in thread From: Andi Shyti @ 2022-07-27 14:20 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Rodrigo Vivi Hi Mauro, > TLB cache invalidation can happen on two different situations: > > 1. synchronously, at __vma_put_pages(); > 2. asynchronously. > > On the first case, TLB cache invalidation happens inside > __vma_put_pages(). So, no need to do it later on. > > However, on the second case, the pages will keep in memory > until __i915_vma_evict() is called. > > So, we need to store the TLB data at struct i915_vma_resource, > in order to do a TLB cache invalidation before allowing > userspace to re-use the same memory. > > So, i915_vma_resource_unbind() has gained a new parameter > in order to store the TLB data at the second case. > > Document it. > > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> > --- > > To avoid mailbombing on a large number of people, only mailing lists were C/C on the cover. > See [PATCH v3 0/6] at: https://lore.kernel.org/all/cover.1658924372.git.mchehab@kernel.org/ > > drivers/gpu/drm/i915/i915_vma_resource.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_vma_resource.c b/drivers/gpu/drm/i915/i915_vma_resource.c > index 5a67995ea5fe..4fe09ea0a825 100644 > --- a/drivers/gpu/drm/i915/i915_vma_resource.c > +++ b/drivers/gpu/drm/i915/i915_vma_resource.c > @@ -216,6 +216,10 @@ i915_vma_resource_fence_notify(struct i915_sw_fence *fence, > /** > * i915_vma_resource_unbind - Unbind a vma resource > * @vma_res: The vma resource to unbind. > + * @tlb: pointer to vma->obj->mm.tlb associated with the resource > + * to be stored at vma_res->tlb. When not-NULL, it will be used > + * to do TLB cache invalidation before freeing a VMA resource. 
> + * used only for async unbind. /used/Used/ With that: Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Thanks, Andi ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: reduce TLB performance regressions 2022-07-27 12:29 ` Mauro Carvalho Chehab ` (7 preceding siblings ...) (?) @ 2022-07-27 13:12 ` Patchwork -1 siblings, 0 replies; 35+ messages in thread From: Patchwork @ 2022-07-27 13:12 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: intel-gfx == Series Details == Series: drm/i915: reduce TLB performance regressions URL : https://patchwork.freedesktop.org/series/106758/ State : warning == Summary == Error: dim checkpatch failed 735755e9d5d5 drm/i915/gt: Ignore TLB invalidations on idle engines -:138: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects? #138: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58: +#define with_intel_gt_pm_if_awake(gt, wf) \ + for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0) -:138: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'wf' - possible side-effects? #138: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:58: +#define with_intel_gt_pm_if_awake(gt, wf) \ + for (wf = intel_gt_pm_get_if_awake(gt); wf; intel_gt_pm_put_async(gt), wf = 0) total: 0 errors, 0 warnings, 2 checks, 99 lines checked 225d8336f971 drm/i915/gt: document with_intel_gt_pm_if_awake() 14c4636e0625 drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations 2bbb43b7f32b drm/i915/gt: Skip TLB invalidations once wedged 0cb17ababb41 drm/i915/gt: Batch TLB invalidations 051e0bf95aa1 drm/i915/gt: describe the new tlb parameter at i915_vma_resource ^ permalink raw reply [flat|nested] 35+ messages in thread
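The two MACRO_ARG_REUSE checks above come from the for-loop shape of the macro, which names each argument more than once. A minimal userspace sketch of the same pattern follows; struct model_dev, model_get_if_awake(), model_put() and with_model_if_awake() are hypothetical names used only for illustration, not i915 API.

```c
#include <assert.h>

/* Hypothetical stand-in for a device with a wakeref count. */
struct model_dev {
	int awake; /* non-zero when the device is awake */
	int refs;  /* outstanding references */
};

/* Take a reference only if the device is already awake. */
static int model_get_if_awake(struct model_dev *dev)
{
	if (!dev->awake)
		return 0;
	dev->refs++;
	return 1; /* non-zero token means "reference taken" */
}

/* Drop a reference taken by model_get_if_awake(). */
static void model_put(struct model_dev *dev)
{
	assert(dev->refs > 0);
	dev->refs--;
}

/* Same shape as the flagged macro: dev and tok each appear more than once. */
#define with_model_if_awake(dev, tok) \
	for ((tok) = model_get_if_awake(dev); (tok); model_put(dev), (tok) = 0)

/* Returns how many times the guarded body ran (0 or 1). */
static int model_run(struct model_dev *dev)
{
	int tok, ran = 0;

	with_model_if_awake(dev, tok)
		ran++;
	return ran;
}
```

The body runs exactly once when the device is awake and not at all otherwise, and the reference is dropped by the loop's increment expression. Because dev is expanded in both the initialiser and the increment, an argument with side effects (for example a pointer post-increment) would be evaluated twice, which is the situation the checkpatch check warns about.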
* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: reduce TLB performance regressions 2022-07-27 12:29 ` Mauro Carvalho Chehab ` (8 preceding siblings ...) (?) @ 2022-07-27 13:37 ` Patchwork -1 siblings, 0 replies; 35+ messages in thread From: Patchwork @ 2022-07-27 13:37 UTC (permalink / raw) To: Mauro Carvalho Chehab; +Cc: intel-gfx [-- Attachment #1: Type: text/plain, Size: 8435 bytes --] == Series Details == Series: drm/i915: reduce TLB performance regressions URL : https://patchwork.freedesktop.org/series/106758/ State : success == Summary == CI Bug Log - changes from CI_DRM_11946 -> Patchwork_106758v1 ==================================================== Summary ------- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html Participating hosts (38 -> 39) ------------------------------ Additional (2): fi-hsw-4770 bat-jsl-1 Missing (1): fi-bdw-samus Known issues ------------ Here are the changes found in Patchwork_106758v1 that come from known issues: ### IGT changes ### #### Issues hit #### * igt@gem_exec_suspend@basic-s3@smem: - fi-rkl-11600: NOTRUN -> [FAIL][1] ([fdo#103375]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@gem_exec_suspend@basic-s3@smem.html * igt@i915_pm_backlight@basic-brightness: - fi-hsw-4770: NOTRUN -> [SKIP][2] ([fdo#109271] / [i915#3012]) [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@i915_pm_backlight@basic-brightness.html * igt@i915_selftest@live@gem: - fi-pnv-d510: NOTRUN -> [DMESG-FAIL][3] ([i915#4528]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-pnv-d510/igt@i915_selftest@live@gem.html * igt@i915_selftest@live@requests: - fi-blb-e6850: [PASS][4] -> [DMESG-FAIL][5] ([i915#4528]) [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-blb-e6850/igt@i915_selftest@live@requests.html [5]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-blb-e6850/igt@i915_selftest@live@requests.html * igt@kms_addfb_basic@addfb25-y-tiled-small-legacy: - fi-hsw-4770: NOTRUN -> [SKIP][6] ([fdo#109271]) +9 similar issues [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_addfb_basic@addfb25-y-tiled-small-legacy.html * igt@kms_chamelium@common-hpd-after-suspend: - fi-rkl-11600: NOTRUN -> [SKIP][7] ([fdo#111827]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@kms_chamelium@common-hpd-after-suspend.html * igt@kms_chamelium@dp-crc-fast: - fi-hsw-4770: NOTRUN -> [SKIP][8] ([fdo#109271] / [fdo#111827]) +8 similar issues [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_chamelium@dp-crc-fast.html * igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions: - fi-bsw-kefka: [PASS][9] -> [FAIL][10] ([i915#6298]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor@atomic-transitions.html * igt@kms_psr@sprite_plane_onoff: - fi-hsw-4770: NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#1072]) +3 similar issues [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-hsw-4770/igt@kms_psr@sprite_plane_onoff.html * igt@runner@aborted: - fi-blb-e6850: NOTRUN -> [FAIL][12] ([fdo#109271] / [i915#2403] / [i915#4312]) [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-blb-e6850/igt@runner@aborted.html #### Possible fixes #### * igt@debugfs_test@read_all_entries: - fi-kbl-guc: [FAIL][13] ([i915#6253]) -> [PASS][14] [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-kbl-guc/igt@debugfs_test@read_all_entries.html [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-kbl-guc/igt@debugfs_test@read_all_entries.html * igt@i915_selftest@live@gtt: - {bat-dg2-9}: [DMESG-WARN][15] ([i915#5763]) -> [PASS][16] +3 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-dg2-9/igt@i915_selftest@live@gtt.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-dg2-9/igt@i915_selftest@live@gtt.html * igt@i915_selftest@live@requests: - fi-pnv-d510: [DMESG-FAIL][17] ([i915#4528]) -> [PASS][18] [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-pnv-d510/igt@i915_selftest@live@requests.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-pnv-d510/igt@i915_selftest@live@requests.html * igt@i915_suspend@basic-s3-without-i915: - fi-rkl-11600: [INCOMPLETE][19] ([i915#5982]) -> [PASS][20] [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/fi-rkl-11600/igt@i915_suspend@basic-s3-without-i915.html * igt@kms_frontbuffer_tracking@basic: - {bat-rpls-2}: [SKIP][21] ([i915#1849]) -> [PASS][22] [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-rpls-2/igt@kms_frontbuffer_tracking@basic.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-rpls-2/igt@kms_frontbuffer_tracking@basic.html * igt@prime_vgem@basic-fence-flip: - {bat-rpls-2}: [SKIP][23] ([fdo#109295] / [i915#1845] / [i915#3708]) -> [PASS][24] [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/bat-rpls-2/igt@prime_vgem@basic-fence-flip.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/bat-rpls-2/igt@prime_vgem@basic-fence-flip.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). 
[fdo#103375]: https://bugs.freedesktop.org/show_bug.cgi?id=103375 [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271 [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285 [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295 [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827 [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072 [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845 [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849 [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982 [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190 [i915#2403]: https://gitlab.freedesktop.org/drm/intel/issues/2403 [i915#3003]: https://gitlab.freedesktop.org/drm/intel/issues/3003 [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012 [i915#3301]: https://gitlab.freedesktop.org/drm/intel/issues/3301 [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555 [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708 [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103 [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312 [i915#4528]: https://gitlab.freedesktop.org/drm/intel/issues/4528 [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613 [i915#5270]: https://gitlab.freedesktop.org/drm/intel/issues/5270 [i915#5763]: https://gitlab.freedesktop.org/drm/intel/issues/5763 [i915#5903]: https://gitlab.freedesktop.org/drm/intel/issues/5903 [i915#5950]: https://gitlab.freedesktop.org/drm/intel/issues/5950 [i915#5982]: https://gitlab.freedesktop.org/drm/intel/issues/5982 [i915#6253]: https://gitlab.freedesktop.org/drm/intel/issues/6253 [i915#6298]: https://gitlab.freedesktop.org/drm/intel/issues/6298 Build changes ------------- * Linux: CI_DRM_11946 -> Patchwork_106758v1 CI-20190529: 20190529 CI_DRM_11946: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6598: 
97e103419021d0863db527e3f2cf39ccdd132db5 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_106758v1: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux ### Linux commits dc4693d1d1bc drm/i915/gt: describe the new tlb parameter at i915_vma_resource 23abc0dfdc25 drm/i915/gt: Batch TLB invalidations 7131366111a9 drm/i915/gt: Skip TLB invalidations once wedged 7a9475aabf18 drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations 96ce5bc9b090 drm/i915/gt: document with_intel_gt_pm_if_awake() d5f349d2fc2a drm/i915/gt: Ignore TLB invalidations on idle engines == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html [-- Attachment #2: Type: text/html, Size: 9186 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: reduce TLB performance regressions
  2022-07-27 12:29 ` Mauro Carvalho Chehab
@ 2022-07-27 15:52 ` Patchwork
From: Patchwork @ 2022-07-27 15:52 UTC
To: Mauro Carvalho Chehab; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: reduce TLB performance regressions
URL   : https://patchwork.freedesktop.org/series/106758/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_11946_full -> Patchwork_106758v1_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_106758v1_full absolutely need to be
  verified manually.

  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_106758v1_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.
Participating hosts (13 -> 13)
------------------------------

  No changes in participating hosts

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_106758v1_full:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - shard-tglb: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb2/igt@i915_pm_rpm@basic-pci-d3-state.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb3/igt@i915_pm_rpm@basic-pci-d3-state.html

New tests
---------

  New tests have been introduced between CI_DRM_11946_full and Patchwork_106758v1_full:

### New IGT tests (4) ###

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-a:
    - Statuses : 1 pass(s)
    - Exec time: [2.34] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-b:
    - Statuses : 1 pass(s)
    - Exec time: [2.25] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-c:
    - Statuses : 1 pass(s)
    - Exec time: [2.22] s

  * igt@kms_sequence@get-forked@hdmi-a-4-pipe-d:
    - Statuses : 1 pass(s)
    - Exec time: [2.23] s

Known issues
------------

  Here are the changes found in Patchwork_106758v1_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@psr2:
    - shard-iclb: [PASS][3] -> [SKIP][4] ([i915#658])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@feature_discovery@psr2.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@feature_discovery@psr2.html

  * igt@gem_create@create-massive:
    - shard-glk: NOTRUN -> [DMESG-WARN][5] ([i915#4991])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk2/igt@gem_create@create-massive.html
    - shard-kbl: NOTRUN -> [DMESG-WARN][6] ([i915#4991])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_create@create-massive.html

  * igt@gem_eio@unwedge-stress:
    - shard-tglb: [PASS][7] -> [FAIL][8] ([i915#5784])
   [7]:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb7/igt@gem_eio@unwedge-stress.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb1/igt@gem_eio@unwedge-stress.html * igt@gem_exec_balancer@parallel-keep-submit-fence: - shard-iclb: [PASS][9] -> [SKIP][10] ([i915#4525]) +2 similar issues [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb1/igt@gem_exec_balancer@parallel-keep-submit-fence.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@gem_exec_balancer@parallel-keep-submit-fence.html * igt@gem_exec_fair@basic-none@rcs0: - shard-kbl: [PASS][11] -> [FAIL][12] ([i915#2842]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl7/igt@gem_exec_fair@basic-none@rcs0.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@gem_exec_fair@basic-none@rcs0.html * igt@gem_exec_fair@basic-none@vcs1: - shard-iclb: NOTRUN -> [FAIL][13] ([i915#2842]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb4/igt@gem_exec_fair@basic-none@vcs1.html * igt@gem_exec_fair@basic-pace-share@rcs0: - shard-glk: [PASS][14] -> [FAIL][15] ([i915#2842]) [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk7/igt@gem_exec_fair@basic-pace-share@rcs0.html [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk3/igt@gem_exec_fair@basic-pace-share@rcs0.html * igt@gem_exec_fair@basic-pace@vcs0: - shard-kbl: NOTRUN -> [FAIL][16] ([i915#2842]) [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_exec_fair@basic-pace@vcs0.html * igt@gem_lmem_swapping@verify-ccs: - shard-kbl: NOTRUN -> [SKIP][17] ([fdo#109271] / [i915#4613]) +3 similar issues [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@gem_lmem_swapping@verify-ccs.html * igt@gem_userptr_blits@input-checking: - shard-apl: NOTRUN -> [DMESG-WARN][18] ([i915#4991]) [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl3/igt@gem_userptr_blits@input-checking.html * igt@gem_workarounds@suspend-resume-context: - shard-kbl: [PASS][19] -> [DMESG-WARN][20] ([i915#180]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@gem_workarounds@suspend-resume-context.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@gem_workarounds@suspend-resume-context.html * igt@i915_module_load@reload: - shard-skl: [PASS][21] -> [DMESG-WARN][22] ([i915#1982]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl1/igt@i915_module_load@reload.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl4/igt@i915_module_load@reload.html * igt@i915_pm_dc@dc6-psr: - shard-iclb: [PASS][23] -> [FAIL][24] ([i915#454]) [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@i915_pm_dc@dc6-psr.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@i915_pm_dc@dc6-psr.html * igt@kms_ccs@pipe-b-crc-primary-basic-y_tiled_gen12_mc_ccs: - shard-kbl: NOTRUN -> [SKIP][25] ([fdo#109271] / [i915#3886]) +6 similar issues [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@kms_ccs@pipe-b-crc-primary-basic-y_tiled_gen12_mc_ccs.html * igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs: - shard-apl: NOTRUN -> [SKIP][26] ([fdo#109271] / [i915#3886]) [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_ccs@pipe-c-bad-pixel-format-y_tiled_gen12_mc_ccs.html * igt@kms_color_chamelium@pipe-a-gamma: - shard-kbl: NOTRUN -> [SKIP][27] ([fdo#109271] / [fdo#111827]) +11 similar issues [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_color_chamelium@pipe-a-gamma.html * igt@kms_color_chamelium@pipe-b-ctm-limited-range: - shard-skl: NOTRUN -> [SKIP][28] ([fdo#109271] / [fdo#111827]) [28]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl1/igt@kms_color_chamelium@pipe-b-ctm-limited-range.html * igt@kms_color_chamelium@pipe-d-ctm-red-to-blue: - shard-apl: NOTRUN -> [SKIP][29] ([fdo#109271] / [fdo#111827]) +2 similar issues [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_color_chamelium@pipe-d-ctm-red-to-blue.html * igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1: - shard-apl: [PASS][30] -> [DMESG-WARN][31] ([i915#180]) +3 similar issues [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl1/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl1/igt@kms_cursor_crc@cursor-suspend@pipe-b-dp-1.html * igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-1: - shard-glk: NOTRUN -> [SKIP][32] ([fdo#109271]) [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk1/igt@kms_dither@fb-8bpc-vs-panel-6bpc@pipe-a-hdmi-a-1.html * igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1: - shard-skl: [PASS][33] -> [FAIL][34] ([i915#2122]) [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1.html [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip@plain-flip-fb-recreate-interruptible@b-edp1.html * igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-upscaling@pipe-a-default-mode: - shard-skl: NOTRUN -> [SKIP][35] ([fdo#109271]) +7 similar issues [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl1/igt@kms_flip_scaled_crc@flip-64bpp-4tile-to-32bpp-4tile-upscaling@pipe-a-default-mode.html * igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-32bpp-xtile-downscaling@pipe-a-default-mode: - shard-iclb: NOTRUN -> [SKIP][36] ([i915#3555]) +1 similar issue [36]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_flip_scaled_crc@flip-64bpp-xtile-to-32bpp-xtile-downscaling@pipe-a-default-mode.html * igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-valid-mode: - shard-iclb: NOTRUN -> [SKIP][37] ([i915#2672]) +8 similar issues [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_flip_scaled_crc@flip-64bpp-yftile-to-16bpp-yftile-upscaling@pipe-a-valid-mode.html * igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode: - shard-iclb: NOTRUN -> [SKIP][38] ([i915#2672] / [i915#3555]) [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb3/igt@kms_flip_scaled_crc@flip-64bpp-ytile-to-32bpp-ytilercccs-downscaling@pipe-a-default-mode.html * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-render: - shard-apl: NOTRUN -> [SKIP][39] ([fdo#109271]) +24 similar issues [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl3/igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-render.html * igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max: - shard-kbl: NOTRUN -> [FAIL][40] ([fdo#108145] / [i915#265]) [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_plane_alpha_blend@pipe-a-constant-alpha-max.html * igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1: - shard-iclb: [PASS][41] -> [SKIP][42] ([i915#5235]) +2 similar issues [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_plane_scaling@planes-unity-scaling-downscale-factor-0-5@pipe-a-edp-1.html * igt@kms_psr2_su@page_flip-p010: - shard-kbl: NOTRUN -> [SKIP][43] ([fdo#109271] / [i915#658]) [43]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_psr2_su@page_flip-p010.html * igt@kms_psr@psr2_sprite_blt: - shard-iclb: [PASS][44] -> [SKIP][45] ([fdo#109441]) +3 similar issues [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@kms_psr@psr2_sprite_blt.html [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_psr@psr2_sprite_blt.html * igt@kms_psr_stress_test@invalidate-primary-flip-overlay: - shard-iclb: [PASS][46] -> [SKIP][47] ([i915#5519]) [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb3/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb8/igt@kms_psr_stress_test@invalidate-primary-flip-overlay.html * igt@perf@blocking: - shard-skl: [PASS][48] -> [FAIL][49] ([i915#1542]) [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@perf@blocking.html [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@perf@blocking.html * igt@prime_nv_pcopy@test2: - shard-kbl: NOTRUN -> [SKIP][50] ([fdo#109271]) +150 similar issues [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@prime_nv_pcopy@test2.html #### Possible fixes #### * igt@fbdev@nullptr: - {shard-rkl}: [SKIP][51] ([i915#2582]) -> [PASS][52] [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@fbdev@nullptr.html [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@fbdev@nullptr.html * igt@gem_ctx_exec@basic-nohangcheck: - shard-tglb: [FAIL][53] ([i915#6268]) -> [PASS][54] [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb5/igt@gem_ctx_exec@basic-nohangcheck.html [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb1/igt@gem_ctx_exec@basic-nohangcheck.html * igt@gem_ctx_persistence@legacy-engines-hostile@vebox: - {shard-dg1}: [FAIL][55] ([i915#4883]) -> 
[PASS][56] +3 similar issues [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-dg1-15/igt@gem_ctx_persistence@legacy-engines-hostile@vebox.html [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-dg1-19/igt@gem_ctx_persistence@legacy-engines-hostile@vebox.html * igt@gem_eio@unwedge-stress: - {shard-tglu}: [TIMEOUT][57] ([i915#3063]) -> [PASS][58] [57]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-6/igt@gem_eio@unwedge-stress.html [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-2/igt@gem_eio@unwedge-stress.html * igt@gem_exec_balancer@fairslice: - {shard-rkl}: [SKIP][59] ([i915#6259]) -> [PASS][60] [59]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@gem_exec_balancer@fairslice.html [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@gem_exec_balancer@fairslice.html * igt@gem_exec_balancer@parallel-bb-first: - shard-iclb: [SKIP][61] ([i915#4525]) -> [PASS][62] [61]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@gem_exec_balancer@parallel-bb-first.html [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@gem_exec_balancer@parallel-bb-first.html * igt@gem_exec_fair@basic-deadline: - shard-kbl: [FAIL][63] ([i915#2846]) -> [PASS][64] [63]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl7/igt@gem_exec_fair@basic-deadline.html [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl1/igt@gem_exec_fair@basic-deadline.html * igt@gem_exec_fair@basic-none-share@rcs0: - {shard-tglu}: [FAIL][65] ([i915#2842]) -> [PASS][66] [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-8/igt@gem_exec_fair@basic-none-share@rcs0.html [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-5/igt@gem_exec_fair@basic-none-share@rcs0.html * igt@gem_exec_fair@basic-pace-solo@rcs0: - shard-kbl: [FAIL][67] ([i915#2842]) -> [PASS][68] +2 
similar issues [67]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@gem_exec_fair@basic-pace-solo@rcs0.html [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl1/igt@gem_exec_fair@basic-pace-solo@rcs0.html * igt@gem_exec_reloc@basic-gtt-cpu: - {shard-rkl}: [SKIP][69] ([i915#3281]) -> [PASS][70] +2 similar issues [69]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gem_exec_reloc@basic-gtt-cpu.html [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gem_exec_reloc@basic-gtt-cpu.html * igt@gem_partial_pwrite_pread@writes-after-reads: - {shard-rkl}: [SKIP][71] ([i915#3282]) -> [PASS][72] +1 similar issue [71]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gem_partial_pwrite_pread@writes-after-reads.html [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gem_partial_pwrite_pread@writes-after-reads.html * igt@gen9_exec_parse@allowed-single: - shard-glk: [DMESG-WARN][73] ([i915#5566] / [i915#716]) -> [PASS][74] [73]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk3/igt@gen9_exec_parse@allowed-single.html [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk2/igt@gen9_exec_parse@allowed-single.html * igt@gen9_exec_parse@batch-zero-length: - {shard-rkl}: [SKIP][75] ([i915#2527]) -> [PASS][76] [75]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-2/igt@gen9_exec_parse@batch-zero-length.html [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-5/igt@gen9_exec_parse@batch-zero-length.html * igt@i915_pm_backlight@fade: - {shard-rkl}: [SKIP][77] ([i915#3012]) -> [PASS][78] [77]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@i915_pm_backlight@fade.html [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@i915_pm_backlight@fade.html * igt@i915_pm_dc@dc6-dpms: - {shard-tglu}: [FAIL][79] ([i915#3989] / 
[i915#454]) -> [PASS][80] [79]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-5/igt@i915_pm_dc@dc6-dpms.html [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-2/igt@i915_pm_dc@dc6-dpms.html * igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a: - {shard-tglu}: [FAIL][81] ([i915#3825]) -> [PASS][82] [81]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglu-2/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglu-1/igt@i915_pm_lpsp@kms-lpsp@kms-lpsp-hdmi-a.html * igt@i915_pm_rc6_residency@rc6-idle@vcs0: - shard-kbl: [WARN][83] ([i915#6405]) -> [PASS][84] [83]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html - {shard-rkl}: [WARN][85] ([i915#6405]) -> [PASS][86] [85]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-1/igt@i915_pm_rc6_residency@rc6-idle@vcs0.html * igt@i915_pm_rpm@modeset-non-lpsp-stress: - {shard-dg1}: [SKIP][87] ([i915#1397]) -> [PASS][88] +3 similar issues [87]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-dg1-17/igt@i915_pm_rpm@modeset-non-lpsp-stress.html [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-dg1-14/igt@i915_pm_rpm@modeset-non-lpsp-stress.html * igt@i915_suspend@debugfs-reader: - shard-kbl: [INCOMPLETE][89] ([i915#3614] / [i915#4939]) -> [PASS][90] [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@i915_suspend@debugfs-reader.html [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@i915_suspend@debugfs-reader.html * igt@kms_big_fb@y-tiled-32bpp-rotate-90: - shard-skl: [TIMEOUT][91] ([i915#6371]) -> [PASS][92] [91]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl9/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl9/igt@kms_big_fb@y-tiled-32bpp-rotate-90.html * igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size: - shard-glk: [FAIL][93] ([i915#2346]) -> [PASS][94] [93]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-glk5/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-glk8/igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size.html * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled: - {shard-rkl}: [SKIP][95] ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][96] +8 similar issues [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-cpu-untiled.html * igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1: - shard-apl: [FAIL][97] ([i915#79]) -> [PASS][98] [97]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-dp1.html * igt@kms_flip@flip-vs-suspend-interruptible@c-dp1: - shard-kbl: [DMESG-WARN][99] ([i915#180]) -> [PASS][100] +3 similar issues [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@kms_flip@flip-vs-suspend-interruptible@c-dp1.html * igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1: - shard-skl: [FAIL][101] ([i915#2122]) -> [PASS][102] +2 similar issues [101]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html [102]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt: - {shard-rkl}: [SKIP][103] ([i915#1849] / [i915#4098]) -> [PASS][104] +20 similar issues [103]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt.html [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt.html * igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary: - shard-skl: [DMESG-WARN][105] ([i915#1982]) -> [PASS][106] +1 similar issue [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl9/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl9/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc: - {shard-rkl}: [SKIP][107] ([i915#1849] / [i915#3546] / [i915#4098]) -> [PASS][108] [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc: - {shard-rkl}: [SKIP][109] ([i915#1849] / [i915#3546] / [i915#4070] / [i915#4098]) -> [PASS][110] [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_plane_alpha_blend@pipe-a-coverage-7efc.html * igt@kms_psr@cursor_render: - {shard-rkl}: [SKIP][111] ([i915#1072]) -> [PASS][112] +1 
similar issue [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_psr@cursor_render.html [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_psr@cursor_render.html * igt@kms_psr@psr2_primary_mmap_cpu: - shard-iclb: [SKIP][113] ([fdo#109441]) -> [PASS][114] +2 similar issues [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr@psr2_primary_mmap_cpu.html [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html * igt@kms_universal_plane@disable-primary-vs-flip-pipe-b: - {shard-rkl}: [SKIP][115] ([i915#1845] / [i915#4070] / [i915#4098]) -> [PASS][116] [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html * igt@kms_vblank@pipe-b-query-idle: - {shard-rkl}: [SKIP][117] ([i915#1845] / [i915#4098]) -> [PASS][118] +33 similar issues [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@kms_vblank@pipe-b-query-idle.html [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@kms_vblank@pipe-b-query-idle.html * igt@perf@gen12-unprivileged-single-ctx-counters: - {shard-rkl}: [SKIP][119] ([fdo#109289]) -> [PASS][120] [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-5/igt@perf@gen12-unprivileged-single-ctx-counters.html [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-1/igt@perf@gen12-unprivileged-single-ctx-counters.html * igt@perf@polling-parameterized: - shard-tglb: [FAIL][121] ([i915#5639]) -> [PASS][122] [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-tglb1/igt@perf@polling-parameterized.html [122]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-tglb7/igt@perf@polling-parameterized.html * igt@perf@polling-small-buf: - {shard-rkl}: [FAIL][123] ([i915#1722]) -> [PASS][124] [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-rkl-1/igt@perf@polling-small-buf.html [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-rkl-6/igt@perf@polling-small-buf.html * igt@perf@short-reads: - shard-skl: [FAIL][125] ([i915#51]) -> [PASS][126] [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl6/igt@perf@short-reads.html [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl6/igt@perf@short-reads.html * igt@perf_pmu@rc6-suspend: - shard-apl: [DMESG-WARN][127] ([i915#180]) -> [PASS][128] +1 similar issue [127]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-apl1/igt@perf_pmu@rc6-suspend.html [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-apl4/igt@perf_pmu@rc6-suspend.html #### Warnings #### * igt@gem_exec_balancer@parallel-ordering: - shard-iclb: [FAIL][129] ([i915#6117]) -> [SKIP][130] ([i915#4525]) [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@gem_exec_balancer@parallel-ordering.html [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@gem_exec_balancer@parallel-ordering.html * igt@kms_fbcon_fbt@fbc-suspend: - shard-kbl: [INCOMPLETE][131] ([i915#180] / [i915#4939]) -> [FAIL][132] ([i915#4767]) [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@kms_fbcon_fbt@fbc-suspend.html [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl7/igt@kms_fbcon_fbt@fbc-suspend.html * igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode: - shard-skl: [SKIP][133] ([fdo#109271] / [i915#1888]) -> [SKIP][134] ([fdo#109271]) [133]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-skl4/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode.html [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-skl7/igt@kms_flip_scaled_crc@flip-64bpp-linear-to-32bpp-linear-downscaling@pipe-a-default-mode.html * igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf: - shard-iclb: [SKIP][135] ([i915#658]) -> [SKIP][136] ([i915#2920]) [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr2_sf@cursor-plane-move-continuous-exceed-fully-sf.html * igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf: - shard-iclb: [SKIP][137] ([i915#2920]) -> [SKIP][138] ([i915#658]) [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb2/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb6/igt@kms_psr2_sf@overlay-plane-move-continuous-exceed-fully-sf.html * igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area: - shard-iclb: [SKIP][139] ([fdo#111068] / [i915#658]) -> [SKIP][140] ([i915#2920]) [139]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-iclb5/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-iclb2/igt@kms_psr2_sf@overlay-primary-update-sf-dmg-area.html * igt@runner@aborted: - shard-kbl: ([FAIL][141], [FAIL][142], [FAIL][143], [FAIL][144], [FAIL][145]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257] / [i915#92]) -> ([FAIL][146], [FAIL][147], [FAIL][148]) ([i915#180] / [i915#3002] / [i915#4312] / [i915#5257]) [141]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@runner@aborted.html [142]: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@runner@aborted.html [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl6/igt@runner@aborted.html [144]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl4/igt@runner@aborted.html [145]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11946/shard-kbl1/igt@runner@aborted.html [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl4/igt@runner@aborted.html [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@runner@aborted.html [148]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/shard-kbl6/igt@runner@aborted.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145 [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271 [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274 [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280 [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285 [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289 [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291 [fdo#109295]: https://bugs.freedesktop.org/show_bug.cgi?id=109295 [fdo#109300]: https://bugs.freedesktop.org/show_bug.cgi?id=109300 [fdo#109308]: https://bugs.freedesktop.org/show_bug.cgi?id=109308 [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309 [fdo#109314]: https://bugs.freedesktop.org/show_bug.cgi?id=109314 [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441 [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506 [fdo#110189]: https://bugs.freedesktop.org/show_bug.cgi?id=110189 [fdo#110723]: https://bugs.freedesktop.org/show_bug.cgi?id=110723 [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068 [fdo#111314]: 
https://bugs.freedesktop.org/show_bug.cgi?id=111314 [fdo#111614]: https://bugs.freedesktop.org/show_bug.cgi?id=111614 [fdo#111615]: https://bugs.freedesktop.org/show_bug.cgi?id=111615 [fdo#111656]: https://bugs.freedesktop.org/show_bug.cgi?id=111656 [fdo#111825]: https://bugs.freedesktop.org/show_bug.cgi?id=111825 [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827 [i915#1072]: https://gitlab.freedesktop.org/drm/intel/issues/1072 [i915#132]: https://gitlab.freedesktop.org/drm/intel/issues/132 [i915#1397]: https://gitlab.freedesktop.org/drm/intel/issues/1397 [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542 [i915#160]: https://gitlab.freedesktop.org/drm/intel/issues/160 [i915#1722]: https://gitlab.freedesktop.org/drm/intel/issues/1722 [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180 [i915#1825]: https://gitlab.freedesktop.org/drm/intel/issues/1825 [i915#1839]: https://gitlab.freedesktop.org/drm/intel/issues/1839 [i915#1845]: https://gitlab.freedesktop.org/drm/intel/issues/1845 [i915#1849]: https://gitlab.freedesktop.org/drm/intel/issues/1849 [i915#1850]: https://gitlab.freedesktop.org/drm/intel/issues/1850 [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888 [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982 [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122 [i915#2232]: https://gitlab.freedesktop.org/drm/intel/issues/2232 [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346 [i915#2436]: https://gitlab.freedesktop.org/drm/intel/issues/2436 [i915#2437]: https://gitlab.freedesktop.org/drm/intel/issues/2437 [i915#2527]: https://gitlab.freedesktop.org/drm/intel/issues/2527 [i915#2530]: https://gitlab.freedesktop.org/drm/intel/issues/2530 [i915#2582]: https://gitlab.freedesktop.org/drm/intel/issues/2582 [i915#265]: https://gitlab.freedesktop.org/drm/intel/issues/265 [i915#2658]: https://gitlab.freedesktop.org/drm/intel/issues/2658 [i915#2672]: 
https://gitlab.freedesktop.org/drm/intel/issues/2672 [i915#280]: https://gitlab.freedesktop.org/drm/intel/issues/280 [i915#2842]: https://gitlab.freedesktop.org/drm/intel/issues/2842 [i915#2846]: https://gitlab.freedesktop.org/drm/intel/issues/2846 [i915#2856]: https://gitlab.freedesktop.org/drm/intel/issues/2856 [i915#2920]: https://gitlab.freedesktop.org/drm/intel/issues/2920 [i915#2994]: https://gitlab.freedesktop.org/drm/intel/issues/2994 [i915#3002]: https://gitlab.freedesktop.org/drm/intel/issues/3002 [i915#3012]: https://gitlab.freedesktop.org/drm/intel/issues/3012 [i915#3063]: https://gitlab.freedesktop.org/drm/intel/issues/3063 [i915#3116]: https://gitlab.freedesktop.org/drm/intel/issues/3116 [i915#3281]: https://gitlab.freedesktop.org/drm/intel/issues/3281 [i915#3282]: https://gitlab.freedesktop.org/drm/intel/issues/3282 [i915#3297]: https://gitlab.freedesktop.org/drm/intel/issues/3297 [i915#3359]: https://gitlab.freedesktop.org/drm/intel/issues/3359 [i915#3361]: https://gitlab.freedesktop.org/drm/intel/issues/3361 [i915#3376]: https://gitlab.freedesktop.org/drm/intel/issues/3376 [i915#3458]: https://gitlab.freedesktop.org/drm/intel/issues/3458 [i915#3536]: https://gitlab.freedesktop.org/drm/intel/issues/3536 [i915#3539]: https://gitlab.freedesktop.org/drm/intel/issues/3539 [i915#3546]: https://gitlab.freedesktop.org/drm/intel/issues/3546 [i915#3555]: https://gitlab.freedesktop.org/drm/intel/issues/3555 [i915#3558]: https://gitlab.freedesktop.org/drm/intel/issues/3558 [i915#3614]: https://gitlab.freedesktop.org/drm/intel/issues/3614 [i915#3637]: https://gitlab.freedesktop.org/drm/intel/issues/3637 [i915#3638]: https://gitlab.freedesktop.org/drm/intel/issues/3638 [i915#3689]: https://gitlab.freedesktop.org/drm/intel/issues/3689 [i915#3708]: https://gitlab.freedesktop.org/drm/intel/issues/3708 [i915#3734]: https://gitlab.freedesktop.org/drm/intel/issues/3734 [i915#3778]: https://gitlab.freedesktop.org/drm/intel/issues/3778 [i915#3825]: 
https://gitlab.freedesktop.org/drm/intel/issues/3825 [i915#3828]: https://gitlab.freedesktop.org/drm/intel/issues/3828 [i915#3886]: https://gitlab.freedesktop.org/drm/intel/issues/3886 [i915#3955]: https://gitlab.freedesktop.org/drm/intel/issues/3955 [i915#3989]: https://gitlab.freedesktop.org/drm/intel/issues/3989 [i915#4016]: https://gitlab.freedesktop.org/drm/intel/issues/4016 [i915#404]: https://gitlab.freedesktop.org/drm/intel/issues/404 [i915#4070]: https://gitlab.freedesktop.org/drm/intel/issues/4070 [i915#4077]: https://gitlab.freedesktop.org/drm/intel/issues/4077 [i915#4078]: https://gitlab.freedesktop.org/drm/intel/issues/4078 [i915#4079]: https://gitlab.freedesktop.org/drm/intel/issues/4079 [i915#4083]: https://gitlab.freedesktop.org/drm/intel/issues/4083 [i915#4098]: https://gitlab.freedesktop.org/drm/intel/issues/4098 [i915#4103]: https://gitlab.freedesktop.org/drm/intel/issues/4103 [i915#4270]: https://gitlab.freedesktop.org/drm/intel/issues/4270 [i915#4312]: https://gitlab.freedesktop.org/drm/intel/issues/4312 [i915#4369]: https://gitlab.freedesktop.org/drm/intel/issues/4369 [i915#4387]: https://gitlab.freedesktop.org/drm/intel/issues/4387 [i915#4462]: https://gitlab.freedesktop.org/drm/intel/issues/4462 [i915#4525]: https://gitlab.freedesktop.org/drm/intel/issues/4525 [i915#4538]: https://gitlab.freedesktop.org/drm/intel/issues/4538 [i915#454]: https://gitlab.freedesktop.org/drm/intel/issues/454 [i915#4613]: https://gitlab.freedesktop.org/drm/intel/issues/4613 [i915#4767]: https://gitlab.freedesktop.org/drm/intel/issues/4767 [i915#4812]: https://gitlab.freedesktop.org/drm/intel/issues/4812 [i915#4833]: https://gitlab.freedesktop.org/drm/intel/issues/4833 [i915#4852]: https://gitlab.freedesktop.org/drm/intel/issues/4852 [i915#4853]: https://gitlab.freedesktop.org/drm/intel/issues/4853 [i915#4860]: https://gitlab.freedesktop.org/drm/intel/issues/4860 [i915#4883]: https://gitlab.freedesktop.org/drm/intel/issues/4883 [i915#4885]: 
https://gitlab.freedesktop.org/drm/intel/issues/4885 [i915#4893]: https://gitlab.freedesktop.org/drm/intel/issues/4893 [i915#4939]: https://gitlab.freedesktop.org/drm/intel/issues/4939 [i915#4941]: https://gitlab.freedesktop.org/drm/intel/issues/4941 [i915#4991]: https://gitlab.freedesktop.org/drm/intel/issues/4991 [i915#51]: https://gitlab.freedesktop.org/drm/intel/issues/51 [i915#5176]: https://gitlab.freedesktop.org/drm/intel/issues/5176 [i915#5235]: https://gitlab.freedesktop.org/drm/intel/issues/5235 [i915#5257]: https://gitlab.freedesktop.org/drm/intel/issues/5257 [i915#5286]: https://gitlab.freedesktop.org/drm/intel/issues/5286 [i915#5287]: https://gitlab.freedesktop.org/drm/intel/issues/5287 [i915#5288]: https://gitlab.freedesktop.org/drm/intel/issues/5288 [i915#5289]: https://gitlab.freedesktop.org/drm/intel/issues/5289 [i915#5325]: https://gitlab.freedesktop.org/drm/intel/issues/5325 [i915#5327]: https://gitlab.freedesktop.org/drm/intel/issues/5327 [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533 [i915#5461]: https://gitlab.freedesktop.org/drm/intel/issues/5461 [i915#5519]: https://gitlab.freedesktop.org/drm/intel/issues/5519 [i915#5563]: https://gitlab.freedesktop.org/drm/intel/issues/5563 [i915#5566]: https://gitlab.freedesktop.org/drm/intel/issues/5566 [i915#5639]: https://gitlab.freedesktop.org/drm/intel/issues/5639 [i915#5784]: https://gitlab.freedesktop.org/drm/intel/issues/5784 [i915#6095]: https://gitlab.freedesktop.org/drm/intel/issues/6095 [i915#6117]: https://gitlab.freedesktop.org/drm/intel/issues/6117 [i915#6248]: https://gitlab.freedesktop.org/drm/intel/issues/6248 [i915#6252]: https://gitlab.freedesktop.org/drm/intel/issues/6252 [i915#6259]: https://gitlab.freedesktop.org/drm/intel/issues/6259 [i915#6268]: https://gitlab.freedesktop.org/drm/intel/issues/6268 [i915#6371]: https://gitlab.freedesktop.org/drm/intel/issues/6371 [i915#6405]: https://gitlab.freedesktop.org/drm/intel/issues/6405 [i915#6433]: 
https://gitlab.freedesktop.org/drm/intel/issues/6433 [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658 [i915#716]: https://gitlab.freedesktop.org/drm/intel/issues/716 [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79 [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92 Build changes ------------- * Linux: CI_DRM_11946 -> Patchwork_106758v1 CI-20190529: 20190529 CI_DRM_11946: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6598: 97e103419021d0863db527e3f2cf39ccdd132db5 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_106758v1: 0e9c43d76a145712da46e935d429ce2a3eea80e8 @ git://anongit.freedesktop.org/gfx-ci/linux piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_106758v1/index.html [-- Attachment #2: Type: text/html, Size: 42552 bytes --] ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions 2022-07-27 12:29 ` Mauro Carvalho Chehab (?) @ 2022-07-28 12:08 ` Andi Shyti -1 siblings, 0 replies; 35+ messages in thread From: Andi Shyti @ 2022-07-28 12:08 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal, linaro-mm-sig, Christian König, linux-media, Tvrtko Ursulin Hi Mauro, Pushed in drm-intel-gt-next. Thanks, Andi On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote: > Doing TLB invalidation cause performance regressions, like: > [424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms! > > As reported at: > https://gitlab.freedesktop.org/drm/intel/-/issues/6424 > > as this is an expensive operation. So, reduce the need of it by: > - checking if the engine is awake; > - checking if the engine is not wedged; > - batching operations. > > Additionally, add a workaround for a known hardware issue on some GPUs. > > In order to double-check that this series won't be introducing any regressions, > I used this new IGT test: > > https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1 > > Checking the results for 3 different patchsets, on Broadwell: > > 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. 
with TLB > invalidation and serialization patches: > > $ sudo build/tests/gem_exec_tlb|grep Subtest > Subtest close-clear: SUCCESS (10.490s) > Subtest madv-clear: SUCCESS (10.484s) > Subtest u-unmap-clear: SUCCESS (10.527s) > Subtest u-shrink-clear: SUCCESS (10.506s) > Subtest close-dumb: SUCCESS (10.165s) > Subtest madv-dumb: SUCCESS (10.177s) > Subtest u-unmap-dumb: SUCCESS (10.172s) > Subtest u-shrink-dumb: SUCCESS (10.172s) > > 2) With the new version of the batch TLB invalidation patches from this series: > > $ sudo build/tests/gem_exec_tlb|grep Subtest > Subtest close-clear: SUCCESS (10.483s) > Subtest madv-clear: SUCCESS (10.495s) > Subtest u-unmap-clear: SUCCESS (10.545s) > Subtest u-shrink-clear: SUCCESS (10.508s) > Subtest close-dumb: SUCCESS (10.172s) > Subtest madv-dumb: SUCCESS (10.169s) > Subtest u-unmap-dumb: SUCCESS (10.174s) > Subtest u-shrink-dumb: SUCCESS (10.176s) > > 3) Changing the TLB invalidation routine to do nothing[1]: > > $ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest > (gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries! > (gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries! > (gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries! 
> (gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries! > (gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries! > (gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries! > (gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries! > (gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384: > (gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq > (gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries! > Dynamic subtest smem0 failed. > **** DEBUG **** > (gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b > (gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0 > (gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef > (gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b > **** END **** > Subtest close-clear: FAIL (10.434s) > Subtest madv-clear: SUCCESS (10.479s) > Subtest u-unmap-clear: SUCCESS (10.512s) > > In summary, the test does properly detect fail when TLB cache invalidation doesn't happen, > as shown at result (3). It also shows that both current drm-tip and drm-tip with this series > applied don't have TLB invalidation cache issues. 
> > [1] I applied this patch on the top of drm-tip: > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c > index 68c2b0d8f187..0aefcd7be5e9 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > @@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) > + // HACK: don't do TLB invalidations!!! > + return; > + > > Regards, > Mauro > > Chris Wilson (4): > drm/i915/gt: Ignore TLB invalidations on idle engines > drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations > drm/i915/gt: Skip TLB invalidations once wedged > drm/i915/gt: Batch TLB invalidations > > Mauro Carvalho Chehab (2): > drm/i915/gt: document with_intel_gt_pm_if_awake() > drm/i915/gt: describe the new tlb parameter at i915_vma_resource > > .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- > drivers/gpu/drm/i915/gem/i915_gem_pages.c | 25 +++--- > drivers/gpu/drm/i915/gt/intel_gt.c | 77 +++++++++++++++---- > drivers/gpu/drm/i915/gt/intel_gt.h | 12 ++- > drivers/gpu/drm/i915/gt/intel_gt_pm.h | 11 +++ > drivers/gpu/drm/i915/gt/intel_gt_types.h | 18 ++++- > drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 +- > drivers/gpu/drm/i915/i915_vma.c | 33 ++++++-- > drivers/gpu/drm/i915/i915_vma.h | 1 + > drivers/gpu/drm/i915/i915_vma_resource.c | 9 ++- > drivers/gpu/drm/i915/i915_vma_resource.h | 6 +- > 11 files changed, 163 insertions(+), 40 deletions(-) > > -- > 2.36.1 > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions 2022-07-28 12:08 ` Andi Shyti @ 2022-07-28 12:34 ` Mauro Carvalho Chehab -1 siblings, 0 replies; 35+ messages in thread From: Mauro Carvalho Chehab @ 2022-07-28 12:34 UTC (permalink / raw) To: Andi Shyti Cc: Mauro Carvalho Chehab, David Airlie, intel-gfx, linux-kernel, dri-devel, Christian König, linaro-mm-sig, Sumit Semwal, linux-media Hi Andi, On Thu, 28 Jul 2022 14:08:11 +0200 Andi Shyti <andi.shyti@linux.intel.com> wrote: > Hi Mauro, > > Pushed in drm-intel-gt-next. Thank you! I submitted two additional patches moving the TLB code into its own file, and adding the documentation for it, as agreed during patch 5/6 review: https://patchwork.freedesktop.org/series/106806/ That should make it easier to maintain TLB-related code and have such functions properly documented. Regards, Mauro > > Thanks, > Andi > > On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote: > > Doing TLB invalidation cause performance regressions, like: > > [424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms! > > > > As reported at: > > https://gitlab.freedesktop.org/drm/intel/-/issues/6424 > > > > as this is an expensive operation. So, reduce the need of it by: > > - checking if the engine is awake; > > - checking if the engine is not wedged; > > - batching operations. > > > > Additionally, add a workaround for a known hardware issue on some GPUs. > > > > In order to double-check that this series won't be introducing any regressions, > > I used this new IGT test: > > > > https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1 > > > > Checking the results for 3 different patchsets, on Broadwell: > > > > 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g. 
with TLB
> > invalidation and serialization patches:
> >
> >     $ sudo build/tests/gem_exec_tlb|grep Subtest
> >     Subtest close-clear: SUCCESS (10.490s)
> >     Subtest madv-clear: SUCCESS (10.484s)
> >     Subtest u-unmap-clear: SUCCESS (10.527s)
> >     Subtest u-shrink-clear: SUCCESS (10.506s)
> >     Subtest close-dumb: SUCCESS (10.165s)
> >     Subtest madv-dumb: SUCCESS (10.177s)
> >     Subtest u-unmap-dumb: SUCCESS (10.172s)
> >     Subtest u-shrink-dumb: SUCCESS (10.172s)
> >
> > 2) With the new version of the batch TLB invalidation patches from this series:
> >
> >     $ sudo build/tests/gem_exec_tlb|grep Subtest
> >     Subtest close-clear: SUCCESS (10.483s)
> >     Subtest madv-clear: SUCCESS (10.495s)
> >     Subtest u-unmap-clear: SUCCESS (10.545s)
> >     Subtest u-shrink-clear: SUCCESS (10.508s)
> >     Subtest close-dumb: SUCCESS (10.172s)
> >     Subtest madv-dumb: SUCCESS (10.169s)
> >     Subtest u-unmap-dumb: SUCCESS (10.174s)
> >     Subtest u-shrink-dumb: SUCCESS (10.176s)
> >
> > 3) Changing the TLB invalidation routine to do nothing[1]:
> >
> >     $ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> >     (gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> >     (gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> >     (gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> >     (gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> >     (gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> >     (gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> >     (gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> >     (gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> >     Dynamic subtest smem0 failed.
> >     **** DEBUG ****
> >     (gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> >     (gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> >     (gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> >     (gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> >     **** END ****
> >     Subtest close-clear: FAIL (10.434s)
> >     Subtest madv-clear: SUCCESS (10.479s)
> >     Subtest u-unmap-clear: SUCCESS (10.512s)
> >
> > In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> > as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> > applied don't have TLB invalidation cache issues.
> >
> > [1] I applied this patch on the top of drm-tip:
> >
> >     diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> >     index 68c2b0d8f187..0aefcd7be5e9 100644
> >     --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >     +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >     @@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >     +       // HACK: don't do TLB invalidations!!!
> >     +       return;
> >     +
> >
> > Regards,
> > Mauro
> >
> > Chris Wilson (4):
> >   drm/i915/gt: Ignore TLB invalidations on idle engines
> >   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
> >   drm/i915/gt: Skip TLB invalidations once wedged
> >   drm/i915/gt: Batch TLB invalidations
> >
> > Mauro Carvalho Chehab (2):
> >   drm/i915/gt: document with_intel_gt_pm_if_awake()
> >   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> >
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h |  3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_pages.c    | 25 +++---
> >  drivers/gpu/drm/i915/gt/intel_gt.c           | 77 +++++++++++++++----
> >  drivers/gpu/drm/i915/gt/intel_gt.h           | 12 ++-
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.h        | 11 +++
> >  drivers/gpu/drm/i915/gt/intel_gt_types.h     | 18 ++++-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c        |  8 +-
> >  drivers/gpu/drm/i915/i915_vma.c              | 33 ++++++--
> >  drivers/gpu/drm/i915/i915_vma.h              |  1 +
> >  drivers/gpu/drm/i915/i915_vma_resource.c     |  9 ++-
> >  drivers/gpu/drm/i915/i915_vma_resource.h     |  6 +-
> >  11 files changed, 163 insertions(+), 40 deletions(-)
> >
> > --
> > 2.36.1
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread
* Re: [Intel-gfx] [PATCH v3 0/6] drm/i915: reduce TLB performance regressions
@ 2022-07-28 12:34 ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 35+ messages in thread
From: Mauro Carvalho Chehab @ 2022-07-28 12:34 UTC (permalink / raw)
  To: Andi Shyti
  Cc: David Airlie, intel-gfx, linux-kernel, dri-devel, Sumit Semwal,
      linaro-mm-sig, Mauro Carvalho Chehab, Christian König, linux-media

Hi Andi,

On Thu, 28 Jul 2022 14:08:11 +0200
Andi Shyti <andi.shyti@linux.intel.com> wrote:

> Hi Mauro,
>
> Pushed in drm-intel-gt-next.

Thank you!

I submitted two additional patches moving the TLB code into its own file,
and adding the documentation for it, as agreed during patch 5/6 review:

    https://patchwork.freedesktop.org/series/106806/

That should make it easier to maintain TLB-related code and have such
functions properly documented.

Regards,
Mauro

>
> Thanks,
> Andi
>
> On Wed, Jul 27, 2022 at 02:29:50PM +0200, Mauro Carvalho Chehab wrote:
> > Doing TLB invalidation cause performance regressions, like:
> >     [424.370996] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
> >
> > As reported at:
> >     https://gitlab.freedesktop.org/drm/intel/-/issues/6424
> >
> > as this is an expensive operation. So, reduce the need of it by:
> >     - checking if the engine is awake;
> >     - checking if the engine is not wedged;
> >     - batching operations.
> >
> > Additionally, add a workaround for a known hardware issue on some GPUs.
> >
> > In order to double-check that this series won't be introducing any regressions,
> > I used this new IGT test:
> >
> > https://patchwork.freedesktop.org/patch/495684/?series=106757&rev=1
> >
> > Checking the results for 3 different patchsets, on Broadwell:
> >
> > 1) On the top of drm-tip (2022y-07m-14d-08h-35m-36) - e. g.
with TLB
> > invalidation and serialization patches:
> >
> >     $ sudo build/tests/gem_exec_tlb|grep Subtest
> >     Subtest close-clear: SUCCESS (10.490s)
> >     Subtest madv-clear: SUCCESS (10.484s)
> >     Subtest u-unmap-clear: SUCCESS (10.527s)
> >     Subtest u-shrink-clear: SUCCESS (10.506s)
> >     Subtest close-dumb: SUCCESS (10.165s)
> >     Subtest madv-dumb: SUCCESS (10.177s)
> >     Subtest u-unmap-dumb: SUCCESS (10.172s)
> >     Subtest u-shrink-dumb: SUCCESS (10.172s)
> >
> > 2) With the new version of the batch TLB invalidation patches from this series:
> >
> >     $ sudo build/tests/gem_exec_tlb|grep Subtest
> >     Subtest close-clear: SUCCESS (10.483s)
> >     Subtest madv-clear: SUCCESS (10.495s)
> >     Subtest u-unmap-clear: SUCCESS (10.545s)
> >     Subtest u-shrink-clear: SUCCESS (10.508s)
> >     Subtest close-dumb: SUCCESS (10.172s)
> >     Subtest madv-dumb: SUCCESS (10.169s)
> >     Subtest u-unmap-dumb: SUCCESS (10.174s)
> >     Subtest u-shrink-dumb: SUCCESS (10.176s)
> >
> > 3) Changing the TLB invalidation routine to do nothing[1]:
> >
> >     $ sudo ~/freedesktop-igt/build/tests/gem_exec_tlb|grep Subtest
> >     (gem_exec_tlb:1958) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1958) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1958) CRITICAL: Found deadbeef in a new (clear) buffer after 3 tries!
> >     (gem_exec_tlb:1956) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1956) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1956) CRITICAL: Found deadbeef in a new (clear) buffer after 89 tries!
> >     (gem_exec_tlb:1957) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1957) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1957) CRITICAL: Found deadbeef in a new (clear) buffer after 256 tries!
> >     (gem_exec_tlb:1960) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1960) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1960) CRITICAL: Found deadbeef in a new (clear) buffer after 845 tries!
> >     (gem_exec_tlb:1961) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1961) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1961) CRITICAL: Found deadbeef in a new (clear) buffer after 1138 tries!
> >     (gem_exec_tlb:1954) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1954) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1954) CRITICAL: Found deadbeef in a new (clear) buffer after 1359 tries!
> >     (gem_exec_tlb:1955) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1955) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1955) CRITICAL: Found deadbeef in a new (clear) buffer after 1794 tries!
> >     (gem_exec_tlb:1959) CRITICAL: Test assertion failure function check_bo, file ../tests/i915/gem_exec_tlb.c:384:
> >     (gem_exec_tlb:1959) CRITICAL: Failed assertion: !sq
> >     (gem_exec_tlb:1959) CRITICAL: Found deadbeef in a new (clear) buffer after 2139 tries!
> >     Dynamic subtest smem0 failed.
> >     **** DEBUG ****
> >     (gem_exec_tlb:1944) DEBUG: 2M hole:200000 contains poison:6b6b6b6b
> >     (gem_exec_tlb:1944) DEBUG: Running writer for 200000 at 300000 on bcs0
> >     (gem_exec_tlb:1944) DEBUG: Closing hole:200000 on rcs0, sample:deadbeef
> >     (gem_exec_tlb:1944) DEBUG: Rechecking hole:200000, sample:6b6b6b6b
> >     **** END ****
> >     Subtest close-clear: FAIL (10.434s)
> >     Subtest madv-clear: SUCCESS (10.479s)
> >     Subtest u-unmap-clear: SUCCESS (10.512s)
> >
> > In summary, the test does properly detect fail when TLB cache invalidation doesn't happen,
> > as shown at result (3). It also shows that both current drm-tip and drm-tip with this series
> > applied don't have TLB invalidation cache issues.
> >
> > [1] I applied this patch on the top of drm-tip:
> >
> >     diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> >     index 68c2b0d8f187..0aefcd7be5e9 100644
> >     --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> >     +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> >     @@ -930,0 +931,3 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
> >     +       // HACK: don't do TLB invalidations!!!
> >     +       return;
> >     +
> >
> > Regards,
> > Mauro
> >
> > Chris Wilson (4):
> >   drm/i915/gt: Ignore TLB invalidations on idle engines
> >   drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
> >   drm/i915/gt: Skip TLB invalidations once wedged
> >   drm/i915/gt: Batch TLB invalidations
> >
> > Mauro Carvalho Chehab (2):
> >   drm/i915/gt: document with_intel_gt_pm_if_awake()
> >   drm/i915/gt: describe the new tlb parameter at i915_vma_resource
> >
> >  .../gpu/drm/i915/gem/i915_gem_object_types.h |  3 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_pages.c    | 25 +++---
> >  drivers/gpu/drm/i915/gt/intel_gt.c           | 77 +++++++++++++++----
> >  drivers/gpu/drm/i915/gt/intel_gt.h           | 12 ++-
> >  drivers/gpu/drm/i915/gt/intel_gt_pm.h        | 11 +++
> >  drivers/gpu/drm/i915/gt/intel_gt_types.h     | 18 ++++-
> >  drivers/gpu/drm/i915/gt/intel_ppgtt.c        |  8 +-
> >  drivers/gpu/drm/i915/i915_vma.c              | 33 ++++++--
> >  drivers/gpu/drm/i915/i915_vma.h              |  1 +
> >  drivers/gpu/drm/i915/i915_vma_resource.c     |  9 ++-
> >  drivers/gpu/drm/i915/i915_vma_resource.h     |  6 +-
> >  11 files changed, 163 insertions(+), 40 deletions(-)
> >
> > --
> > 2.36.1
> >

^ permalink raw reply	[flat|nested] 35+ messages in thread
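For readers who want a feel for the three measures listed in the cover letter (skip when the engine is not awake, skip when wedged, batch operations), here is a minimal userspace sketch in C. It is only a toy model under stated assumptions: the struct and function names (toy_gt, toy_obj, toy_unbind, toy_invalidate, toy_demo) are hypothetical and are not taken from the i915 driver, and the sequence-number scheme is one plausible way to batch, not necessarily how the series implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these types and names are hypothetical, not i915 code. */
struct toy_gt {
	bool awake;           /* engine is powered up */
	bool wedged;          /* GPU was declared unusable */
	uint32_t tlb_seqno;   /* advanced by every full invalidation */
	unsigned int flushes; /* number of expensive invalidations issued */
};

struct toy_obj {
	bool needs_flush; /* pages were unbound and not yet covered */
	uint32_t seqno;   /* value of tlb_seqno sampled at unbind time */
};

/* Unbinding is cheap: only remember which invalidation would cover it. */
void toy_unbind(const struct toy_gt *gt, struct toy_obj *obj)
{
	obj->seqno = gt->tlb_seqno;
	obj->needs_flush = true;
}

/* Returns true only when an expensive invalidation was actually issued. */
bool toy_invalidate(struct toy_gt *gt, struct toy_obj *obj)
{
	if (!obj->needs_flush)
		return false;
	obj->needs_flush = false;

	if (gt->wedged)                  /* wedged: nothing runs, skip */
		return false;
	if (!gt->awake)                  /* idle engine: skip */
		return false;
	if (gt->tlb_seqno != obj->seqno) /* batched: a later flush covered it */
		return false;

	gt->flushes++;
	gt->tlb_seqno++;
	return true;
}

/* Three objects unbound back to back end up sharing one invalidation. */
unsigned int toy_demo(void)
{
	struct toy_gt gt = { .awake = true };
	struct toy_obj a = { 0 }, b = { 0 }, c = { 0 };

	toy_unbind(&gt, &a);
	toy_unbind(&gt, &b);
	toy_unbind(&gt, &c);

	toy_invalidate(&gt, &a); /* issues the single flush */
	toy_invalidate(&gt, &b); /* skipped, already covered */
	toy_invalidate(&gt, &c); /* skipped, already covered */

	assert(gt.flushes == 1);
	return gt.flushes;
}
```

Calling toy_demo() from a main() of your own exercises the batching path. The point of the sketch is only that the expensive step is deferred to the moment the pages are released, so several unbinds can share a single flush.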
end of thread, other threads:[~2022-07-28 12:34 UTC | newest]

Thread overview: 35+ messages
2022-07-27 12:29 [PATCH v3 0/6] drm/i915: reduce TLB performance regressions Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 1/6] drm/i915/gt: Ignore TLB invalidations on idle engines Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 2/6] drm/i915/gt: document with_intel_gt_pm_if_awake() Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 14:18 ` [Intel-gfx] " Andi Shyti
2022-07-27 14:18 ` Andi Shyti
2022-07-27 12:29 ` [PATCH v3 3/6] drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 4/6] drm/i915/gt: Skip TLB invalidations once wedged Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 12:29 ` [PATCH v3 5/6] drm/i915/gt: Batch TLB invalidations Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 14:25 ` Andi Shyti
2022-07-27 14:25 ` [Intel-gfx] " Andi Shyti
2022-07-27 14:25 ` Andi Shyti
2022-07-27 12:29 ` [PATCH v3 6/6] drm/i915/gt: describe the new tlb parameter at i915_vma_resource Mauro Carvalho Chehab
2022-07-27 12:29 ` [Intel-gfx] " Mauro Carvalho Chehab
2022-07-27 12:29 ` Mauro Carvalho Chehab
2022-07-27 14:20 ` [Intel-gfx] " Andi Shyti
2022-07-27 13:12 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: reduce TLB performance regressions Patchwork
2022-07-27 13:37 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-07-27 15:52 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2022-07-28 12:08 ` [Intel-gfx] [PATCH v3 0/6] " Andi Shyti
2022-07-28 12:08 ` Andi Shyti
2022-07-28 12:08 ` Andi Shyti
2022-07-28 12:34 ` Mauro Carvalho Chehab
2022-07-28 12:34 ` Mauro Carvalho Chehab