* [Intel-xe] [PATCH v5 00/10] Try to handle TLB invalidations from CT fast-path
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

In various test cases and workloads that put the system under a heavy load, we
can sometimes see errors with missed TLB invalidations. If we handle the TLB
completions directly from IRQ we can see a big improvement.

v4:
  - Various tweaks. Big thing is to make sure we grab the fast_lock for the g2h
    reservation when doing the CT send, which closes one big race with the CT
    fast-path.
v5:
  - Various improvements. Fix the race with adding a fence after its seqno has
    already been signalled. (Matt Brost)

-- 
2.41.0



* [Intel-xe] [PATCH v5 01/10] drm/xe/tlb: drop unnecessary smp_wmb()
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

wake_up_all() and wait_event_timeout() already have the correct barriers
as per https://www.kernel.org/doc/Documentation/memory-barriers.txt.
This should ensure that the seqno_recv write can't be re-ordered wrt
the actual wake_up_all(), i.e. we can't be woken up without seeing the
write. The reader side with wait_event_timeout() also has the correct
barriers. With that, drop the hand-rolled smp_wmb(), which is anyway
missing any kind of matching barrier on the reader side.
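
As a minimal sketch of the pairing being relied on (hypothetical
done/wq names, not the driver's actual fields):

    #include <linux/jiffies.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(wq);
    static int done;

    /* Writer: a plain store suffices; wake_up_all() already orders it
     * before any waiter state is inspected. */
    static void writer(void)
    {
            done = 1;
            wake_up_all(&wq);
    }

    /* Reader: wait_event_timeout() supplies the matching barriers, so
     * the condition re-check after wake-up observes the store. */
    static void reader(void)
    {
            wait_event_timeout(wq, done, msecs_to_jiffies(250));
    }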

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 2fcb477604e2..08432f472e2d 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -333,8 +333,11 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 			expected_seqno, msg[0]);
 	}
 
+	/*
+	 * wake_up_all() and wait_event_timeout() already have the correct
+	 * barriers.
+	 */
 	gt->tlb_invalidation.seqno_recv = msg[0];
-	smp_wmb();
 	wake_up_all(&guc->ct.wq);
 
 	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
-- 
2.41.0



* [Intel-xe] [PATCH v5 02/10] drm/xe/tlb: ensure we access seqno_recv once
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

Ensure we load gt->tlb_invalidation.seqno_recv once, and use that for
our seqno checking. seqno_recv is shared state and can potentially
change at any point here, however the checks in
tlb_invalidation_seqno_past() need to operate on a stable snapshot of
it for this to make any sense.
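
A userspace sketch of the idea, with a C11 relaxed atomic load standing
in for READ_ONCE() (simplified, not the driver code):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define SEQNO_MAX 0x100000 /* placeholder wrap point */

    static bool seqno_past(const atomic_int *shared_recv, int seqno)
    {
            /* One snapshot; all three checks must agree on it. */
            int recv = atomic_load_explicit(shared_recv,
                                            memory_order_relaxed);

            if (seqno - recv < -(SEQNO_MAX / 2))
                    return false;
            if (seqno - recv > (SEQNO_MAX / 2))
                    return true;
            return recv >= seqno;
    }

Loading the shared value three separate times would let a concurrent
writer change it between the checks, so they could disagree.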

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 08432f472e2d..c67abc67d9fc 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -257,15 +257,15 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 
 static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
 {
-	if (seqno - gt->tlb_invalidation.seqno_recv <
-	    -(TLB_INVALIDATION_SEQNO_MAX / 2))
+	int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
+
+	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
 		return false;
 
-	if (seqno - gt->tlb_invalidation.seqno_recv >
-	    (TLB_INVALIDATION_SEQNO_MAX / 2))
+	if (seqno - seqno_recv > (TLB_INVALIDATION_SEQNO_MAX / 2))
 		return true;
 
-	return gt->tlb_invalidation.seqno_recv >= seqno;
+	return seqno_recv >= seqno;
 }
 
 /**
@@ -337,7 +337,7 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	 * wake_up_all() and wait_event_timeout() already have the correct
 	 * barriers.
 	 */
-	gt->tlb_invalidation.seqno_recv = msg[0];
+	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, msg[0]);
 	wake_up_all(&guc->ct.wq);
 
 	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
-- 
2.41.0



* [Intel-xe] [PATCH v5 03/10] drm/xe: hold mem_access.ref for CT fast-path
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

Just checking xe_device_mem_access_ongoing() is not enough; we also
need to hold the reference, otherwise the ref can transition from
1 -> 0 as we enter g2h_read(), leading to warnings. While we can't do
a full rpm sync in the IRQ, we can keep the device awake so long as
the ref is non-zero. Introduce a new helper for this and put it to use
in the CT fast-path.
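
The helper boils down to atomic_inc_not_zero(): take a reference only
if one is already held, never resurrecting a ref that has hit zero. A
standalone userspace sketch of those semantics (not the kernel
implementation):

    #include <stdatomic.h>
    #include <stdbool.h>

    static bool get_if_ongoing(atomic_int *ref)
    {
            int old = atomic_load(ref);

            /* On CAS failure old is reloaded; retry unless it hit 0. */
            while (old != 0) {
                    if (atomic_compare_exchange_weak(ref, &old, old + 1))
                            return true;
            }
            return false; /* ref already dropped to 0; caller must bail */
    }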

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c | 5 +++++
 drivers/gpu/drm/xe/xe_device.h | 1 +
 drivers/gpu/drm/xe/xe_guc_ct.c | 5 ++++-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 07ae208af809..94b0089b0dee 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -412,6 +412,11 @@ u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
 		DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
 }
 
+bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe)
+{
+	return atomic_inc_not_zero(&xe->mem_access.ref);
+}
+
 void xe_device_mem_access_get(struct xe_device *xe)
 {
 	bool resumed = xe_pm_runtime_resume_if_suspended(xe);
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 779f71d066e6..8e01bbadb149 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -138,6 +138,7 @@ static inline struct xe_force_wake * gt_to_fw(struct xe_gt *gt)
 }
 
 void xe_device_mem_access_get(struct xe_device *xe);
+bool xe_device_mem_access_get_if_ongoing(struct xe_device *xe);
 void xe_device_mem_access_put(struct xe_device *xe);
 
 static inline bool xe_device_mem_access_ongoing(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index e71d069158dc..dcce6ed34370 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1044,7 +1044,8 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
 	struct xe_device *xe = ct_to_xe(ct);
 	int len;
 
-	if (!xe_device_in_fault_mode(xe) || !xe_device_mem_access_ongoing(xe))
+	if (!xe_device_in_fault_mode(xe) ||
+	    !xe_device_mem_access_get_if_ongoing(xe))
 		return;
 
 	spin_lock(&ct->fast_lock);
@@ -1054,6 +1055,8 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
 			g2h_fast_path(ct, ct->fast_msg, len);
 	} while (len > 0);
 	spin_unlock(&ct->fast_lock);
+
+	xe_device_mem_access_put(xe);
 }
 
 /* Returns less than zero on error, 0 on done, 1 on more available */
-- 
2.41.0



* [Intel-xe] [PATCH v5 04/10] drm/xe/ct: hold fast_lock when reserving space for g2h
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

Reserving and checking for space on the g2h side relies on the
fast_lock, and not the CT lock, since we need to release space from
the CT fast-path. Make sure we hold it when checking for space and
when reserving it. The main concern is __g2h_release_space() being
called while we are reserving; since the info.space and
info.g2h_outstanding updates are not atomic, we can otherwise get
nonsense values back.
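
As a simplified sketch of the invariant (hypothetical ct_sketch layout,
not the real structures), both read-modify-write paths must run under
the same fast_lock:

    #include <linux/spinlock.h>
    #include <linux/types.h>

    struct ct_sketch {
            spinlock_t fast_lock;
            u32 space;
            u32 outstanding;
    };

    /* Send path: reserve under fast_lock, taken by the caller. */
    static void g2h_reserve(struct ct_sketch *ct, u32 len, u32 num)
    {
            lockdep_assert_held(&ct->fast_lock);
            ct->space -= len;
            ct->outstanding += num;
    }

    /* CT fast-path (IRQ): release under the same lock, so the
     * non-atomic updates can't interleave with a reservation. */
    static void g2h_release(struct ct_sketch *ct, u32 len)
    {
            lockdep_assert_held(&ct->fast_lock);
            ct->space += len;
            ct->outstanding--;
    }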

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index dcce6ed34370..ba89db1dcfdb 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -346,7 +346,10 @@ static bool h2g_has_room(struct xe_guc_ct *ct, u32 cmd_len)
 
 static bool g2h_has_room(struct xe_guc_ct *ct, u32 g2h_len)
 {
-	lockdep_assert_held(&ct->lock);
+	if (!g2h_len)
+		return true;
+
+	lockdep_assert_held(&ct->fast_lock);
 
 	return ct->ctbs.g2h.info.space > g2h_len;
 }
@@ -367,15 +370,15 @@ static void h2g_reserve_space(struct xe_guc_ct *ct, u32 cmd_len)
 	ct->ctbs.h2g.info.space -= cmd_len;
 }
 
-static void g2h_reserve_space(struct xe_guc_ct *ct, u32 g2h_len, u32 num_g2h)
+static void __g2h_reserve_space(struct xe_guc_ct *ct, u32 g2h_len, u32 num_g2h)
 {
 	XE_BUG_ON(g2h_len > ct->ctbs.g2h.info.space);
 
 	if (g2h_len) {
-		spin_lock_irq(&ct->fast_lock);
+		lockdep_assert_held(&ct->fast_lock);
+
 		ct->ctbs.g2h.info.space -= g2h_len;
 		ct->g2h_outstanding += num_g2h;
-		spin_unlock_irq(&ct->fast_lock);
 	}
 }
 
@@ -505,21 +508,26 @@ static int __guc_ct_send_locked(struct xe_guc_ct *ct, const u32 *action,
 		}
 	}
 
+	if (g2h_len)
+		spin_lock_irq(&ct->fast_lock);
 retry:
 	ret = has_room(ct, len + GUC_CTB_HDR_LEN, g2h_len);
 	if (unlikely(ret))
-		goto out;
+		goto out_unlock;
 
 	ret = h2g_write(ct, action, len, g2h_fence ? g2h_fence->seqno : 0,
 			!!g2h_fence);
 	if (unlikely(ret)) {
 		if (ret == -EAGAIN)
 			goto retry;
-		goto out;
+		goto out_unlock;
 	}
 
-	g2h_reserve_space(ct, g2h_len, num_g2h);
+	__g2h_reserve_space(ct, g2h_len, num_g2h);
 	xe_guc_notify(ct_to_guc(ct));
+out_unlock:
+	if (g2h_len)
+		spin_unlock_irq(&ct->fast_lock);
 out:
 	return ret;
 }
-- 
2.41.0



* [Intel-xe] [PATCH v5 05/10] drm/xe/tlb: increment next seqno after successful CT send
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

If we are in the middle of a GT reset or similar, the CT might be
disabled and the CT send will then fail. However we have already
incremented gt->tlb_invalidation.seqno, which can lead to warnings,
since we effectively just skipped a seqno:

    0000:00:02.0: drm_WARN_ON(expected_seqno != msg[0])
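
A worked example of the failure mode (hypothetical numbers):

    #include <stdio.h>

    int main(void)
    {
            int seqno = 5, seqno_recv = 4, sent;

            seqno++;        /* CT disabled: send of 5 fails, yet its
                             * seqno was consumed anyway */
            sent = seqno;   /* next send succeeds, carrying 6 */
            seqno++;

            /* done handler: expected_seqno(5) != msg(6) -> WARN */
            printf("expected %d, got %d\n", seqno_recv + 1, sent);
            return 0;
    }

Only incrementing after a successful send lets the failed attempt
re-use its seqno, so the receiver never sees a gap.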

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index c67abc67d9fc..de65b0b69679 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -124,10 +124,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 		trace_xe_gt_tlb_invalidation_fence_send(fence);
 	}
 	action[1] = seqno;
-	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
-		TLB_INVALIDATION_SEQNO_MAX;
-	if (!gt->tlb_invalidation.seqno)
-		gt->tlb_invalidation.seqno = 1;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
 	if (!ret && fence) {
@@ -137,8 +133,13 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 					   &gt->tlb_invalidation.fence_tdr,
 					   TLB_TIMEOUT);
 	}
-	if (!ret)
+	if (!ret) {
+		gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
+			TLB_INVALIDATION_SEQNO_MAX;
+		if (!gt->tlb_invalidation.seqno)
+			gt->tlb_invalidation.seqno = 1;
 		ret = seqno;
+	}
 	if (ret < 0 && fence)
 		invalidation_fence_signal(fence);
 	mutex_unlock(&guc->ct.lock);
-- 
2.41.0



* [Intel-xe] [PATCH v5 06/10] drm/xe/ct: serialise fast_lock during CT disable
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

The CT fast-path could be running as we enter a runtime suspend or
potentially a GT reset, however here we only use the ct->fast_lock and
not the full ct->lock. Before disabling the CT, also serialise against
the fast_lock to ensure any in-progress work finishes before we start
nuking the CT related stuff. Once we disable ct->enabled and drop the
lock, any new work should fail gracefully, and anything that was in
progress should have finished.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index ba89db1dcfdb..acf488b00b66 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -301,8 +301,10 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct)
 		goto err_out;
 
 	mutex_lock(&ct->lock);
+	spin_lock_irq(&ct->fast_lock);
 	ct->g2h_outstanding = 0;
 	ct->enabled = true;
+	spin_unlock_irq(&ct->fast_lock);
 	mutex_unlock(&ct->lock);
 
 	smp_mb();
@@ -319,8 +321,10 @@ int xe_guc_ct_enable(struct xe_guc_ct *ct)
 
 void xe_guc_ct_disable(struct xe_guc_ct *ct)
 {
-	mutex_lock(&ct->lock);
-	ct->enabled = false;
+	mutex_lock(&ct->lock); /* Serialise dequeue_one_g2h() */
+	spin_lock_irq(&ct->fast_lock); /* Serialise CT fast-path */
+	ct->enabled = false; /* Finally disable CT communication */
+	spin_unlock_irq(&ct->fast_lock);
 	mutex_unlock(&ct->lock);
 
 	xa_destroy(&ct->fence_lookup);
-- 
2.41.0



* [Intel-xe] [PATCH v5 07/10] drm/xe/gt: tweak placement for signalling TLB fences after GT reset
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

The assumption here is that submission is disabled along with the CT,
and that a full GT reset will also nuke the TLBs, so it should be safe
to signal all in-flight TLB fences. However, that is only true after
the actual reset, so move the call slightly later.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index bc76678a8276..a21d44bfe9e8 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -519,7 +519,6 @@ static int gt_reset(struct xe_gt *gt)
 
 	xe_uc_stop_prepare(&gt->uc);
 	xe_gt_pagefault_reset(gt);
-	xe_gt_tlb_invalidation_reset(gt);
 
 	err = xe_uc_stop(&gt->uc);
 	if (err)
@@ -529,6 +528,8 @@ static int gt_reset(struct xe_gt *gt)
 	if (err)
 		goto err_out;
 
+	xe_gt_tlb_invalidation_reset(gt);
+
 	err = do_gt_restart(gt);
 	if (err)
 		goto err_out;
-- 
2.41.0



* [Intel-xe] [PATCH v5 08/10] drm/xe/tlb: also update seqno_recv during reset
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

We might have various kworkers waiting for TLB flushes to complete
which are not tracked with an explicit TLB fence; however, at this
stage a completion will never arrive since the CT is already disabled.
Make sure we signal them here, under the assumption that we have just
completed a full GT reset.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index de65b0b69679..e1906ec7c8be 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -86,13 +86,28 @@ invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence)
  *
  * Signal any pending invalidation fences, should be called during a GT reset
  */
- void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
+void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
 {
 	struct xe_gt_tlb_invalidation_fence *fence, *next;
+	struct xe_guc *guc = &gt->uc.guc;
 
+	/*
+	 * CT channel is already disabled at this point. No new TLB requests can
+	 * appear.
+	 */
+
+	mutex_lock(&gt->uc.guc.ct.lock);
 	cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
+	/*
+	 * We might have various kworkers waiting for TLB flushes to complete
+	 * which are not tracked with an explicit TLB fence, however at this
+	 * stage that will never happen since the CT is already disabled, so
+	 * make sure we signal them here under the assumption that we have
+	 * completed a full GT reset.
+	 */
+	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, gt->tlb_invalidation.seqno);
+	wake_up_all(&guc->ct.wq);
 
-	mutex_lock(&gt->uc.guc.ct.lock);
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link)
 		invalidation_fence_signal(fence);
-- 
2.41.0



* [Intel-xe] [PATCH v5 09/10] drm/xe: handle TLB invalidations from CT fast-path
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

In various test cases that put the system under a heavy load, we can
sometimes see errors with missed TLB invalidations. In such cases we see
the interrupt arrive for the invalidation from the GuC, however the
actual processing of the completion is pushed onto a workqueue and
handled with all the other CT stuff, which might take longer than
expected. Since we expect TLB invalidations to complete within a
reasonable amount of time (at most ~250ms), and they do seem pretty
critical, allow handling directly from the CT fast-path.
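
Out-of-order processing is safe because of the wraparound comparison in
tlb_invalidation_seqno_past(). A standalone test of its semantics (the
max value here is a placeholder, not necessarily the driver's):

    #include <assert.h>

    #define SEQNO_MAX 0x100000

    static int seqno_past(int recv, int seqno)
    {
            if (seqno - recv < -(SEQNO_MAX / 2))
                    return 0;
            if (seqno - recv > (SEQNO_MAX / 2))
                    return 1;
            return recv >= seqno;
    }

    int main(void)
    {
            assert(seqno_past(5, 5));   /* exact match: signalled */
            assert(seqno_past(6, 5));   /* older seqno: signalled */
            assert(!seqno_past(5, 6));  /* still pending */
            /* recv wrapped to 1; a seqno near max is in the past */
            assert(seqno_past(1, SEQNO_MAX - 1));
            /* recv near max; seqno 1 is after the wrap: pending */
            assert(!seqno_past(SEQNO_MAX - 1, 1));
            return 0;
    }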

v2 (José):
  - Actually use the correct spinlock/unlock_irq, since pending_lock is
    grabbed from IRQ.
v3:
  - Don't publish the TLB fence on the list until after we fully
    initialize it and successfully do the CT send. The list is now only
    protected by the spin_lock pending_lock and we can't hold that
    across the entire TLB send operation.
v4 (Matt Brost):
  - Be careful with racing against fast CT path writing the seqno,
    before we have actually published the fence.

References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/297
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/320
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/449
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 135 ++++++++++++--------
 drivers/gpu/drm/xe/xe_gt_types.h            |   5 +
 drivers/gpu/drm/xe/xe_guc_ct.c              |  12 +-
 3 files changed, 92 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index e1906ec7c8be..94fa24f360f5 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -25,7 +25,7 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 					tlb_invalidation.fence_tdr.work);
 	struct xe_gt_tlb_invalidation_fence *fence, *next;
 
-	mutex_lock(&gt->uc.guc.ct.lock);
+	spin_lock_irq(&gt->tlb_invalidation.pending_lock);
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link) {
 		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
@@ -47,7 +47,7 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 		queue_delayed_work(system_wq,
 				   &gt->tlb_invalidation.fence_tdr,
 				   TLB_TIMEOUT);
-	mutex_unlock(&gt->uc.guc.ct.lock);
+	spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
 }
 
 /**
@@ -63,6 +63,7 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
 	gt->tlb_invalidation.seqno = 1;
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
+	spin_lock_init(&gt->tlb_invalidation.pending_lock);
 	spin_lock_init(&gt->tlb_invalidation.lock);
 	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
 	INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
@@ -72,14 +73,20 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 }
 
 static void
-invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence)
+__invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence)
 {
 	trace_xe_gt_tlb_invalidation_fence_signal(fence);
-	list_del(&fence->link);
 	dma_fence_signal(&fence->base);
 	dma_fence_put(&fence->base);
 }
 
+static void
+invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence)
+{
+	__invalidation_fence_signal(fence);
+	list_del(&fence->link);
+}
+
 /**
  * xe_gt_tlb_invalidation_reset - Initialize GT TLB invalidation reset
  * @gt: graphics tile
@@ -97,6 +104,7 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
 	 */
 
 	mutex_lock(&gt->uc.guc.ct.lock);
+	spin_lock_irq(&gt->tlb_invalidation.pending_lock);
 	cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
 	/*
 	 * We might have various kworkers waiting for TLB flushes to complete
@@ -111,9 +119,23 @@ void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link)
 		invalidation_fence_signal(fence);
+	spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
 	mutex_unlock(&gt->uc.guc.ct.lock);
 }
 
+static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
+{
+	int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
+
+	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
+		return false;
+
+	if (seqno - seqno_recv > (TLB_INVALIDATION_SEQNO_MAX / 2))
+		return true;
+
+	return seqno_recv >= seqno;
+}
+
 static int send_tlb_invalidation(struct xe_guc *guc,
 				 struct xe_gt_tlb_invalidation_fence *fence,
 				 u32 *action, int len)
@@ -121,7 +143,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	struct xe_gt *gt = guc_to_gt(guc);
 	int seqno;
 	int ret;
-	bool queue_work;
 
 	/*
 	 * XXX: The seqno algorithm relies on TLB invalidation being processed
@@ -132,21 +153,36 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	mutex_lock(&guc->ct.lock);
 	seqno = gt->tlb_invalidation.seqno;
 	if (fence) {
-		queue_work = list_empty(&gt->tlb_invalidation.pending_fences);
 		fence->seqno = seqno;
-		list_add_tail(&fence->link,
-			      &gt->tlb_invalidation.pending_fences);
 		trace_xe_gt_tlb_invalidation_fence_send(fence);
 	}
 	action[1] = seqno;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
 	if (!ret && fence) {
-		fence->invalidation_time = ktime_get();
-		if (queue_work)
-			queue_delayed_work(system_wq,
-					   &gt->tlb_invalidation.fence_tdr,
-					   TLB_TIMEOUT);
+		spin_lock_irq(&gt->tlb_invalidation.pending_lock);
+		/*
+		 * We haven't actually published the TLB fence as per
+		 * pending_fences, but in theory our seqno could have already
+		 * been written as we acquired the pending_lock. In such a case
+		 * we can just go ahead and signal the fence here.
+		 */
+		if (tlb_invalidation_seqno_past(gt, seqno)) {
+			__invalidation_fence_signal(fence);
+		} else {
+			fence->invalidation_time = ktime_get();
+			list_add_tail(&fence->link,
+				      &gt->tlb_invalidation.pending_fences);
+
+			if (list_is_singular(&gt->tlb_invalidation.pending_fences))
+				queue_delayed_work(system_wq,
+						   &gt->tlb_invalidation.fence_tdr,
+						   TLB_TIMEOUT);
+
+		}
+		spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
+	} else if (ret < 0 && fence) {
+		__invalidation_fence_signal(fence);
 	}
 	if (!ret) {
 		gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
@@ -155,8 +191,6 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 			gt->tlb_invalidation.seqno = 1;
 		ret = seqno;
 	}
-	if (ret < 0 && fence)
-		invalidation_fence_signal(fence);
 	mutex_unlock(&guc->ct.lock);
 
 	return ret;
@@ -271,19 +305,6 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 	return ret;
 }
 
-static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
-{
-	int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
-
-	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
-		return false;
-
-	if (seqno - seqno_recv > (TLB_INVALIDATION_SEQNO_MAX / 2))
-		return true;
-
-	return seqno_recv >= seqno;
-}
-
 /**
  * xe_gt_tlb_invalidation_wait - Wait for TLB to complete
  * @gt: graphics tile
@@ -331,22 +352,31 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
-	struct xe_gt_tlb_invalidation_fence *fence;
-	int expected_seqno;
-
-	lockdep_assert_held(&guc->ct.lock);
+	struct xe_gt_tlb_invalidation_fence *fence, *next;
+	unsigned long flags;
 
 	if (unlikely(len != 1))
 		return -EPROTO;
 
-	/* Sanity check on seqno */
-	expected_seqno = (gt->tlb_invalidation.seqno_recv + 1) %
-		TLB_INVALIDATION_SEQNO_MAX;
-	if (!expected_seqno)
-		expected_seqno = 1;
-	if (drm_WARN_ON(&gt_to_xe(gt)->drm, expected_seqno != msg[0])) {
-		drm_err(&gt_to_xe(gt)->drm, "TLB expected_seqno(%d) != msg(%u)\n",
-			expected_seqno, msg[0]);
+	/*
+	 * This can also be run both directly from the IRQ handler and also in
+	 * process_g2h_msg(). Only one may process any individual CT message,
+	 * however the order they are processed here could result in skipping a
+	 * seqno. To handle that we just process all the seqnos from the last
+	 * seqno_recv up to and including the one in msg[0]. The delta should be
+	 * very small so there shouldn't be much of pending_fences we actually
+	 * need to iterate over here.
+	 *
+	 * From GuC POV we expect the seqnos to always appear in-order, so if we
+	 * see something later in the timeline we can be sure that anything
+	 * appearing earlier has already signalled, just that we have yet to
+	 * officially process the CT message like if racing against
+	 * process_g2h_msg().
+	 */
+	spin_lock_irqsave(&gt->tlb_invalidation.pending_lock, flags);
+	if (tlb_invalidation_seqno_past(gt, msg[0])) {
+		spin_unlock_irqrestore(&gt->tlb_invalidation.pending_lock, flags);
+		return 0;
 	}
 
 	/*
@@ -356,19 +386,24 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, msg[0]);
 	wake_up_all(&guc->ct.wq);
 
-	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
-					 typeof(*fence), link);
-	if (fence)
+	list_for_each_entry_safe(fence, next,
+				 &gt->tlb_invalidation.pending_fences, link) {
 		trace_xe_gt_tlb_invalidation_fence_recv(fence);
-	if (fence && tlb_invalidation_seqno_past(gt, fence->seqno)) {
+
+		if (!tlb_invalidation_seqno_past(gt, fence->seqno))
+			break;
+
 		invalidation_fence_signal(fence);
-		if (!list_empty(&gt->tlb_invalidation.pending_fences))
-			mod_delayed_work(system_wq,
-					 &gt->tlb_invalidation.fence_tdr,
-					 TLB_TIMEOUT);
-		else
-			cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
 	}
 
+	if (!list_empty(&gt->tlb_invalidation.pending_fences))
+		mod_delayed_work(system_wq,
+				 &gt->tlb_invalidation.fence_tdr,
+				 TLB_TIMEOUT);
+	else
+		cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
+
+	spin_unlock_irqrestore(&gt->tlb_invalidation.pending_lock, flags);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 7d4de019f9a5..28b8e8a86fc9 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -163,6 +163,11 @@ struct xe_gt {
 		 * invaliations, protected by CT lock
 		 */
 		struct list_head pending_fences;
+		/**
+		 * @pending_lock: protects @pending_fences and updating
+		 * @seqno_recv.
+		 */
+		spinlock_t pending_lock;
 		/**
 		 * @fence_tdr: schedules a delayed call to
 		 * xe_gt_tlb_fence_timeout after the timeut interval is over.
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index acf488b00b66..e35971d84e06 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -994,15 +994,8 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path)
 			return 0;
 
 		switch (FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1])) {
-		/*
-		 * FIXME: We really should process
-		 * XE_GUC_ACTION_TLB_INVALIDATION_DONE here in the fast-path as
-		 * these critical for page fault performance. We currently can't
-		 * due to TLB invalidation done algorithm expecting the seqno
-		 * returned in-order. With some small changes to the algorithm
-		 * and locking we should be able to support out-of-order seqno.
-		 */
 		case XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC:
+		case XE_GUC_ACTION_TLB_INVALIDATION_DONE:
 			break;	/* Process these in fast-path */
 		default:
 			return 0;
@@ -1056,8 +1049,7 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
 	struct xe_device *xe = ct_to_xe(ct);
 	int len;
 
-	if (!xe_device_in_fault_mode(xe) ||
-	    !xe_device_mem_access_get_if_ongoing(xe))
+	if (!xe_device_mem_access_get_if_ongoing(xe))
 		return;
 
 	spin_lock(&ct->fast_lock);
-- 
2.41.0



* [Intel-xe] [PATCH v5 10/10] drm/xe/tlb: print seqno_recv on fence TLB timeout
From: Matthew Auld @ 2023-07-06 16:20 UTC
  To: intel-xe

To help debugging, sample the current seqno_recv and dump it out if we
encounter a TLB timeout for the fences path.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 94fa24f360f5..94040a4defb3 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -35,8 +35,8 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 			break;
 
 		trace_xe_gt_tlb_invalidation_fence_timeout(fence);
-		drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d",
-			gt->info.id, fence->seqno);
+		drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d recv=%d",
+			gt->info.id, fence->seqno, gt->tlb_invalidation.seqno_recv);
 
 		list_del(&fence->link);
 		fence->base.error = -ETIME;
-- 
2.41.0



* [Intel-xe] ✓ CI.Patch_applied: success for Try to handle TLB invalidations from CT fast-path (rev3)
From: Patchwork @ 2023-07-06 18:43 UTC
  To: Matthew Auld; +Cc: intel-xe

== Series Details ==

Series: Try to handle TLB invalidations from CT fast-path (rev3)
URL   : https://patchwork.freedesktop.org/series/120174/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: abcb88850 drm/xe: make kobject type struct as constant
=== git am output follows ===
Applying: drm/xe/tlb: drop unnecessary smp_wmb()
Applying: drm/xe/tlb: ensure we access seqno_recv once
Applying: drm/xe: hold mem_access.ref for CT fast-path
Applying: drm/xe/ct: hold fast_lock when reserving space for g2h
Applying: drm/xe/tlb: increment next seqno after successful CT send
Applying: drm/xe/ct: serialise fast_lock during CT disable
Applying: drm/xe/gt: tweak placement for signalling TLB fences after GT reset
Applying: drm/xe/tlb: also update seqno_recv during reset
Applying: drm/xe: handle TLB invalidations from CT fast-path
Applying: drm/xe/tlb: print seqno_recv on fence TLB timeout




* [Intel-xe] ✗ CI.checkpatch: warning for Try to handle TLB invalidations from CT fast-path (rev3)
From: Patchwork @ 2023-07-06 18:43 UTC
  To: Matthew Auld; +Cc: intel-xe

== Series Details ==

Series: Try to handle TLB invalidations from CT fast-path (rev3)
URL   : https://patchwork.freedesktop.org/series/120174/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
c7d32770e3cd31d9fc134ce41f329b10aa33ee15
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit acf7079ed87bae407c14c047401f2eab92c7681f
Author: Matthew Auld <matthew.auld@intel.com>
Date:   Thu Jul 6 17:20:14 2023 +0100

    drm/xe/tlb: print seqno_recv on fence TLB timeout
    
    To help debugging, sample the current seqno_recv and dump it out if we
    encounter a TLB timeout for the fences path.
    
    Signed-off-by: Matthew Auld <matthew.auld@intel.com>
    Cc: Matthew Brost <matthew.brost@intel.com>
    Cc: José Roberto de Souza <jose.souza@intel.com>
+ /mt/dim checkpatch abcb88850f885254cf7477e1f7a5956136fb547c drm-intel
506d38469 drm/xe/tlb: drop unnecessary smp_wmb()
48c1fe486 drm/xe/tlb: ensure we access seqno_recv once
f79cf87dc drm/xe: hold mem_access.ref for CT fast-path
0905214a5 drm/xe/ct: hold fast_lock when reserving space for g2h
bf675b585 drm/xe/tlb: increment next seqno after successful CT send
fd34513ed drm/xe/ct: serialise fast_lock during CT disable
7216e6fb2 drm/xe/gt: tweak placement for signalling TLB fences after GT reset
9054a087b drm/xe/tlb: also update seqno_recv during reset
1b74b633d drm/xe: handle TLB invalidations from CT fast-path
-:30: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#30: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/297

-:31: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#31: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/320

-:32: WARNING:COMMIT_LOG_USE_LINK: Unknown link reference 'References:', use 'Link:' or 'Closes:' instead
#32: 
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/449

-:169: CHECK:BRACES: Blank lines aren't necessary before a close brace '}'
#169: FILE: drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c:182:
+
+		}

total: 0 errors, 3 warnings, 1 checks, 266 lines checked
acf7079ed drm/xe/tlb: print seqno_recv on fence TLB timeout




* Re: [Intel-xe] [PATCH v5 01/10] drm/xe/tlb: drop unnecessary smp_wmb()
From: Matthew Brost @ 2023-07-06 18:44 UTC
  To: Matthew Auld; +Cc: intel-xe

On Thu, Jul 06, 2023 at 05:20:05PM +0100, Matthew Auld wrote:
> wake_up_all() and wait_event_timeout() already have the correct barriers
> as per https://www.kernel.org/doc/Documentation/memory-barriers.txt.
> This should ensure that the seqno_recv write can't be re-ordered wrt
> the actual wake_up_all(), i.e. we can't be woken up without seeing the
> write. The reader side with wait_event_timeout() also has the correct
> barriers. With that, drop the hand-rolled smp_wmb(), which is anyway
> missing any kind of matching barrier on the reader side.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 2fcb477604e2..08432f472e2d 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -333,8 +333,11 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>  			expected_seqno, msg[0]);
>  	}
>  
> +	/*
> +	 * wake_up_all() and wait_event_timeout() already have the correct
> +	 * barriers.
> +	 */
>  	gt->tlb_invalidation.seqno_recv = msg[0];
> -	smp_wmb();
>  	wake_up_all(&guc->ct.wq);
>  
>  	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
> -- 
> 2.41.0
> 


* Re: [Intel-xe] [PATCH v5 02/10] drm/xe/tlb: ensure we access seqno_recv once
From: Matthew Brost @ 2023-07-06 18:53 UTC
  To: Matthew Auld; +Cc: intel-xe

On Thu, Jul 06, 2023 at 05:20:06PM +0100, Matthew Auld wrote:
> Ensure we load gt->tlb_invalidation.seqno_recv once, and use that for
> our seqno checking. seqno_recv is shared state and can potentially
> change at any point here, however the checks in
> tlb_invalidation_seqno_past() need to operate on a stable snapshot of
> it for this to make any sense.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 08432f472e2d..c67abc67d9fc 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -257,15 +257,15 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
>  
>  static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
>  {
> -	if (seqno - gt->tlb_invalidation.seqno_recv <
> -	    -(TLB_INVALIDATION_SEQNO_MAX / 2))
> +	int seqno_recv = READ_ONCE(gt->tlb_invalidation.seqno_recv);
> +
> +	if (seqno - seqno_recv < -(TLB_INVALIDATION_SEQNO_MAX / 2))
>  		return false;
>  
> -	if (seqno - gt->tlb_invalidation.seqno_recv >
> -	    (TLB_INVALIDATION_SEQNO_MAX / 2))
> +	if (seqno - seqno_recv > (TLB_INVALIDATION_SEQNO_MAX / 2))
>  		return true;
>  
> -	return gt->tlb_invalidation.seqno_recv >= seqno;
> +	return seqno_recv >= seqno;
>  }
>  
>  /**
> @@ -337,7 +337,7 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>  	 * wake_up_all() and wait_event_timeout() already have the correct
>  	 * barriers.
>  	 */
> -	gt->tlb_invalidation.seqno_recv = msg[0];
> +	WRITE_ONCE(gt->tlb_invalidation.seqno_recv, msg[0]);
>  	wake_up_all(&guc->ct.wq);
>  
>  	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
> -- 
> 2.41.0
> 


* Re: [Intel-xe] [PATCH v5 10/10] drm/xe/tlb: print seqno_recv on fence TLB timeout
From: Matthew Brost @ 2023-07-06 18:54 UTC
  To: Matthew Auld; +Cc: intel-xe

On Thu, Jul 06, 2023 at 05:20:14PM +0100, Matthew Auld wrote:
> To help debugging, sample the current seqno_recv and dump it out if we
> encounter a TLB timeout for the fences path.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 94fa24f360f5..94040a4defb3 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -35,8 +35,8 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
>  			break;
>  
>  		trace_xe_gt_tlb_invalidation_fence_timeout(fence);
> -		drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d",
> -			gt->info.id, fence->seqno);
> +		drm_err(&gt_to_xe(gt)->drm, "gt%d: TLB invalidation fence timeout, seqno=%d recv=%d",
> +			gt->info.id, fence->seqno, gt->tlb_invalidation.seqno_recv);
>  
>  		list_del(&fence->link);
>  		fence->base.error = -ETIME;
> -- 
> 2.41.0
> 


* Re: [Intel-xe] [PATCH v5 00/10] Try to handle TLB invalidations from CT fast-path
From: Souza, Jose @ 2023-07-07 13:47 UTC
  To: intel-xe, Auld, Matthew

On Thu, 2023-07-06 at 17:20 +0100, Matthew Auld wrote:
> In various test cases and workloads that put the system under a heavy load, we
> can sometimes see errors with missed TLB invalidations. If we handle the TLB
> completions directly from IRQ we can see a big improvement.
> 
> v4:
>   - Various tweaks. Big thing is to make sure we grab the fast_lock for the g2h
>     reservation when doing the CT send, which closes one big race with the CT
>     fast-path.
> v5:
>   - Various improvements. Fix the race with adding a fence after its seqno has
>     already been signalled. (Matt Brost)
> 

Tested-by: José Roberto de Souza <jose.souza@intel.com>

Regressions fixed in this version.
Still getting some TLB timeouts like this one
(https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/289), but this
series reduced the number of TLB timeouts.
