* [Intel-xe] [PATCH 00/22] TLB Invalidation
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

Let's just confirm the reviews on these patches and get them
merged to drm-xe-next.

Matthew Brost (22):
  drm/xe: Don't process TLB invalidation done in CT fast-path
  drm/xe: Break of TLB invalidation into its own file
  drm/xe: Move TLB invalidation variable to own sub-structure in GT
  drm/xe: Add TLB invalidation fence
  drm/xe: Invalidate TLB after unbind is complete
  drm/xe: Kernel doc GT TLB invalidations
  drm/xe: Add TLB invalidation fence ftrace
  drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
  drm/xe: Add TDR for invalidation fence timeout cleanup
  drm/xe: Only set VM->asid for platforms that support a ASID
  drm/xe: Delete debugfs entry to issue TLB invalidation
  drm/xe: Add has_range_tlb_invalidation device attribute
  drm/xe: Add range based TLB invalidations
  drm/xe: Propagate error from bind operations to async fence
  drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
  drm/xe: Coalesce GGTT invalidations
  drm/xe: Lock GGTT on when restoring kernel BOs
  drm/xe: Propagate VM unbind error to invalidation fence
  drm/xe: Signal invalidation fence immediately if CT send fails
  drm/xe: Add has_asid to device info
  drm/xe: Add TLB invalidation fence after rebinds issued from execs
  drm/xe: Drop TLB invalidation from ring operations

 drivers/gpu/drm/xe/Makefile                   |   1 +
 drivers/gpu/drm/xe/xe_bo_evict.c              |   5 +-
 drivers/gpu/drm/xe/xe_device.c                |  14 +
 drivers/gpu/drm/xe/xe_device_types.h          |   4 +
 drivers/gpu/drm/xe/xe_ggtt.c                  |  23 +-
 drivers/gpu/drm/xe/xe_ggtt_types.h            |   2 +
 drivers/gpu/drm/xe/xe_gt.c                    |  19 +
 drivers/gpu/drm/xe/xe_gt.h                    |   1 +
 drivers/gpu/drm/xe/xe_gt_debugfs.c            |  21 --
 drivers/gpu/drm/xe/xe_gt_pagefault.c          | 104 +-----
 drivers/gpu/drm/xe/xe_gt_pagefault.h          |   3 -
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 342 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 ++
 .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  28 ++
 drivers/gpu/drm/xe/xe_gt_types.h              |  41 ++-
 drivers/gpu/drm/xe/xe_guc.c                   |   2 +
 drivers/gpu/drm/xe/xe_guc_ct.c                |  10 +-
 drivers/gpu/drm/xe/xe_guc_types.h             |   2 +
 drivers/gpu/drm/xe/xe_lrc.c                   |   4 +-
 drivers/gpu/drm/xe/xe_pci.c                   |   7 +
 drivers/gpu/drm/xe/xe_pt.c                    | 130 +++++++
 drivers/gpu/drm/xe/xe_ring_ops.c              |  40 +-
 drivers/gpu/drm/xe/xe_trace.h                 |  55 +++
 drivers/gpu/drm/xe/xe_uc.c                    |   9 +-
 drivers/gpu/drm/xe/xe_uc.h                    |   1 +
 drivers/gpu/drm/xe/xe_vm.c                    |  42 ++-
 26 files changed, 736 insertions(+), 200 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h

-- 
2.39.1



* [Intel-xe] [PATCH 01/22] drm/xe: Don't process TLB invalidation done in CT fast-path
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

We can't currently do this because the TLB invalidation done handler
expects seqnos to be received in order; with the fast-path, a TLB
invalidation done message could, in an extreme corner case, pass one
being processed in the slow-path. Remove TLB invalidation done from the
fast-path for now; a follow-up will re-enable it once the TLB
invalidation done handler can deal with out-of-order seqnos.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index f48eb01847ef..6e25c1d5d43e 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -966,7 +966,14 @@ static int g2h_read(struct xe_guc_ct *ct, u32 *msg, bool fast_path)
 			return 0;
 
 		switch (FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, msg[1])) {
-		case XE_GUC_ACTION_TLB_INVALIDATION_DONE:
+		/*
+		 * FIXME: We really should process
+		 * XE_GUC_ACTION_TLB_INVALIDATION_DONE here in the fast-path as
+		 * these critical for page fault performance. We currently can't
+		 * due to TLB invalidation done algorithm expecting the seqno
+		 * returned in-order. With some small changes to the algorithm
+		 * and locking we should be able to support out-of-order seqno.
+		 */
 		case XE_GUC_ACTION_REPORT_PAGE_FAULT_REQ_DESC:
 			break;	/* Process these in fast-path */
 		default:
-- 
2.39.1



* [Intel-xe] [PATCH 02/22] drm/xe: Break of TLB invalidation into its own file
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

TLB invalidation is used by more than just USM (page faults), so break
this code out into its own file.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/Makefile                 |   1 +
 drivers/gpu/drm/xe/xe_gt.c                  |   5 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |  99 +----------------
 drivers/gpu/drm/xe/xe_gt_pagefault.h        |   3 -
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 115 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  19 ++++
 drivers/gpu/drm/xe/xe_guc_ct.c              |   1 +
 drivers/gpu/drm/xe/xe_vm.c                  |   1 +
 8 files changed, 145 insertions(+), 99 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 74fa741b1937..d36d0b12f6ff 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -56,6 +56,7 @@ xe-y += xe_bb.o \
 	xe_gt_mcr.o \
 	xe_gt_pagefault.o \
 	xe_gt_sysfs.o \
+	xe_gt_tlb_invalidation.o \
 	xe_gt_topology.o \
 	xe_guc.o \
 	xe_guc_ads.o \
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 8a26b01e7d9b..1eb280c4f5f4 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -19,6 +19,7 @@
 #include "xe_gt_mcr.h"
 #include "xe_gt_pagefault.h"
 #include "xe_gt_sysfs.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_gt_topology.h"
 #include "xe_hw_fence.h"
 #include "xe_irq.h"
@@ -571,6 +572,10 @@ int xe_gt_init(struct xe_gt *gt)
 		xe_hw_fence_irq_init(&gt->fence_irq[i]);
 	}
 
+	err = xe_gt_tlb_invalidation_init(gt);
+	if (err)
+		return err;
+
 	err = xe_gt_pagefault_init(gt);
 	if (err)
 		return err;
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 7125113b7390..93a8efe5d0a0 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -10,9 +10,10 @@
 
 #include "xe_bo.h"
 #include "xe_gt.h"
+#include "xe_gt_pagefault.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
-#include "xe_gt_pagefault.h"
 #include "xe_migrate.h"
 #include "xe_pt.h"
 #include "xe_trace.h"
@@ -61,40 +62,6 @@ guc_to_gt(struct xe_guc *guc)
 	return container_of(guc, struct xe_gt, uc.guc);
 }
 
-static int send_tlb_invalidation(struct xe_guc *guc)
-{
-	struct xe_gt *gt = guc_to_gt(guc);
-	u32 action[] = {
-		XE_GUC_ACTION_TLB_INVALIDATION,
-		0,
-		XE_GUC_TLB_INVAL_FULL << XE_GUC_TLB_INVAL_TYPE_SHIFT |
-		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT |
-		XE_GUC_TLB_INVAL_FLUSH_CACHE,
-	};
-	int seqno;
-	int ret;
-
-	/*
-	 * XXX: The seqno algorithm relies on TLB invalidation being processed
-	 * in order which they currently are, if that changes the algorithm will
-	 * need to be updated.
-	 */
-	mutex_lock(&guc->ct.lock);
-	seqno = gt->usm.tlb_invalidation_seqno;
-	action[1] = seqno;
-	gt->usm.tlb_invalidation_seqno = (gt->usm.tlb_invalidation_seqno + 1) %
-		TLB_INVALIDATION_SEQNO_MAX;
-	if (!gt->usm.tlb_invalidation_seqno)
-		gt->usm.tlb_invalidation_seqno = 1;
-	ret = xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
-				    G2H_LEN_DW_TLB_INVALIDATE, 1);
-	if (!ret)
-		ret = seqno;
-	mutex_unlock(&guc->ct.lock);
-
-	return ret;
-}
-
 static bool access_is_atomic(enum access_type access_type)
 {
 	return access_type == ACCESS_TYPE_ATOMIC;
@@ -278,7 +245,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		 * defer TLB invalidate + fault response to a callback of fence
 		 * too
 		 */
-		ret = send_tlb_invalidation(&gt->uc.guc);
+		ret = xe_gt_tlb_invalidation(gt);
 		if (ret >= 0)
 			ret = 0;
 	}
@@ -433,7 +400,6 @@ int xe_gt_pagefault_init(struct xe_gt *gt)
 	if (!xe->info.supports_usm)
 		return 0;
 
-	gt->usm.tlb_invalidation_seqno = 1;
 	for (i = 0; i < NUM_PF_QUEUE; ++i) {
 		gt->usm.pf_queue[i].gt = gt;
 		spin_lock_init(&gt->usm.pf_queue[i].lock);
@@ -482,65 +448,6 @@ void xe_gt_pagefault_reset(struct xe_gt *gt)
 	}
 }
 
-int xe_gt_tlb_invalidation(struct xe_gt *gt)
-{
-	return send_tlb_invalidation(&gt->uc.guc);
-}
-
-static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
-{
-	if (gt->usm.tlb_invalidation_seqno_recv >= seqno)
-		return true;
-
-	if (seqno - gt->usm.tlb_invalidation_seqno_recv >
-	    (TLB_INVALIDATION_SEQNO_MAX / 2))
-		return true;
-
-	return false;
-}
-
-int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
-{
-	struct xe_device *xe = gt_to_xe(gt);
-	struct xe_guc *guc = &gt->uc.guc;
-	int ret;
-
-	/*
-	 * XXX: See above, this algorithm only works if seqno are always in
-	 * order
-	 */
-	ret = wait_event_timeout(guc->ct.wq,
-				 tlb_invalidation_seqno_past(gt, seqno),
-				 HZ / 5);
-	if (!ret) {
-		drm_err(&xe->drm, "TLB invalidation time'd out, seqno=%d, recv=%d\n",
-			seqno, gt->usm.tlb_invalidation_seqno_recv);
-		return -ETIME;
-	}
-
-	return 0;
-}
-
-int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
-{
-	struct xe_gt *gt = guc_to_gt(guc);
-	int expected_seqno;
-
-	if (unlikely(len != 1))
-		return -EPROTO;
-
-	/* Sanity check on seqno */
-	expected_seqno = (gt->usm.tlb_invalidation_seqno_recv + 1) %
-		TLB_INVALIDATION_SEQNO_MAX;
-	XE_WARN_ON(expected_seqno != msg[0]);
-
-	gt->usm.tlb_invalidation_seqno_recv = msg[0];
-	smp_wmb();
-	wake_up_all(&guc->ct.wq);
-
-	return 0;
-}
-
 static int granularity_in_byte(int val)
 {
 	switch (val) {
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
index 35f68027cc9c..839c065a5e4c 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
@@ -13,10 +13,7 @@ struct xe_guc;
 
 int xe_gt_pagefault_init(struct xe_gt *gt);
 void xe_gt_pagefault_reset(struct xe_gt *gt);
-int xe_gt_tlb_invalidation(struct xe_gt *gt);
-int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len);
-int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 int xe_guc_access_counter_notify_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
 #endif	/* _XE_GT_PAGEFAULT_ */
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
new file mode 100644
index 000000000000..fea7a557d213
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include "xe_gt.h"
+#include "xe_gt_tlb_invalidation.h"
+#include "xe_guc.h"
+#include "xe_guc_ct.h"
+
+static struct xe_gt *
+guc_to_gt(struct xe_guc *guc)
+{
+	return container_of(guc, struct xe_gt, uc.guc);
+}
+
+int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
+{
+	gt->usm.tlb_invalidation_seqno = 1;
+
+	return 0;
+}
+
+static int send_tlb_invalidation(struct xe_guc *guc)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+	u32 action[] = {
+		XE_GUC_ACTION_TLB_INVALIDATION,
+		0,
+		XE_GUC_TLB_INVAL_FULL << XE_GUC_TLB_INVAL_TYPE_SHIFT |
+		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT |
+		XE_GUC_TLB_INVAL_FLUSH_CACHE,
+	};
+	int seqno;
+	int ret;
+
+	/*
+	 * XXX: The seqno algorithm relies on TLB invalidation being processed
+	 * in order which they currently are, if that changes the algorithm will
+	 * need to be updated.
+	 */
+	mutex_lock(&guc->ct.lock);
+	seqno = gt->usm.tlb_invalidation_seqno;
+	action[1] = seqno;
+	gt->usm.tlb_invalidation_seqno = (gt->usm.tlb_invalidation_seqno + 1) %
+		TLB_INVALIDATION_SEQNO_MAX;
+	if (!gt->usm.tlb_invalidation_seqno)
+		gt->usm.tlb_invalidation_seqno = 1;
+	ret = xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
+				    G2H_LEN_DW_TLB_INVALIDATE, 1);
+	if (!ret)
+		ret = seqno;
+	mutex_unlock(&guc->ct.lock);
+
+	return ret;
+}
+
+int xe_gt_tlb_invalidation(struct xe_gt *gt)
+{
+	return send_tlb_invalidation(&gt->uc.guc);
+}
+
+static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
+{
+	if (gt->usm.tlb_invalidation_seqno_recv >= seqno)
+		return true;
+
+	if (seqno - gt->usm.tlb_invalidation_seqno_recv >
+	    (TLB_INVALIDATION_SEQNO_MAX / 2))
+		return true;
+
+	return false;
+}
+
+int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+	struct xe_guc *guc = &gt->uc.guc;
+	int ret;
+
+	/*
+	 * XXX: See above, this algorithm only works if seqno are always in
+	 * order
+	 */
+	ret = wait_event_timeout(guc->ct.wq,
+				 tlb_invalidation_seqno_past(gt, seqno),
+				 HZ / 5);
+	if (!ret) {
+		drm_err(&xe->drm, "TLB invalidation time'd out, seqno=%d, recv=%d\n",
+			seqno, gt->usm.tlb_invalidation_seqno_recv);
+		return -ETIME;
+	}
+
+	return 0;
+}
+
+int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
+{
+	struct xe_gt *gt = guc_to_gt(guc);
+	int expected_seqno;
+
+	if (unlikely(len != 1))
+		return -EPROTO;
+
+	/* Sanity check on seqno */
+	expected_seqno = (gt->usm.tlb_invalidation_seqno_recv + 1) %
+		TLB_INVALIDATION_SEQNO_MAX;
+	XE_WARN_ON(expected_seqno != msg[0]);
+
+	gt->usm.tlb_invalidation_seqno_recv = msg[0];
+	smp_wmb();
+	wake_up_all(&guc->ct.wq);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
new file mode 100644
index 000000000000..f1c3b34b1993
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_GT_TLB_INVALIDATION_H_
+#define _XE_GT_TLB_INVALIDATION_H_
+
+#include <linux/types.h>
+
+struct xe_gt;
+struct xe_guc;
+
+int xe_gt_tlb_invalidation_init(struct xe_gt *gt);
+int xe_gt_tlb_invalidation(struct xe_gt *gt);
+int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
+int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
+
+#endif	/* _XE_GT_TLB_INVALIDATION_ */
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 6e25c1d5d43e..84d4302d4e72 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -15,6 +15,7 @@
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
 #include "xe_gt_pagefault.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_guc_submit.h"
 #include "xe_map.h"
 #include "xe_trace.h"
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index e999b3aafb09..92ecc7fc55b6 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -19,6 +19,7 @@
 #include "xe_engine.h"
 #include "xe_gt.h"
 #include "xe_gt_pagefault.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
 #include "xe_pm.h"
 #include "xe_preempt_fence.h"
-- 
2.39.1



* [Intel-xe] [PATCH 03/22] drm/xe: Move TLB invalidation variable to own sub-structure in GT
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

TLB invalidations are no longer restricted to USM, so move the
variables to their own sub-structure.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 20 +++++++++----------
 drivers/gpu/drm/xe/xe_gt_types.h            | 22 ++++++++++-----------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index fea7a557d213..a39a2fb163ae 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -16,7 +16,7 @@ guc_to_gt(struct xe_guc *guc)
 
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
-	gt->usm.tlb_invalidation_seqno = 1;
+	gt->tlb_invalidation.seqno = 1;
 
 	return 0;
 }
@@ -40,12 +40,12 @@ static int send_tlb_invalidation(struct xe_guc *guc)
 	 * need to be updated.
 	 */
 	mutex_lock(&guc->ct.lock);
-	seqno = gt->usm.tlb_invalidation_seqno;
+	seqno = gt->tlb_invalidation.seqno;
 	action[1] = seqno;
-	gt->usm.tlb_invalidation_seqno = (gt->usm.tlb_invalidation_seqno + 1) %
+	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
 		TLB_INVALIDATION_SEQNO_MAX;
-	if (!gt->usm.tlb_invalidation_seqno)
-		gt->usm.tlb_invalidation_seqno = 1;
+	if (!gt->tlb_invalidation.seqno)
+		gt->tlb_invalidation.seqno = 1;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
 	if (!ret)
@@ -62,10 +62,10 @@ int xe_gt_tlb_invalidation(struct xe_gt *gt)
 
 static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
 {
-	if (gt->usm.tlb_invalidation_seqno_recv >= seqno)
+	if (gt->tlb_invalidation.seqno_recv >= seqno)
 		return true;
 
-	if (seqno - gt->usm.tlb_invalidation_seqno_recv >
+	if (seqno - gt->tlb_invalidation.seqno_recv >
 	    (TLB_INVALIDATION_SEQNO_MAX / 2))
 		return true;
 
@@ -87,7 +87,7 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 				 HZ / 5);
 	if (!ret) {
 		drm_err(&xe->drm, "TLB invalidation time'd out, seqno=%d, recv=%d\n",
-			seqno, gt->usm.tlb_invalidation_seqno_recv);
+			seqno, gt->tlb_invalidation.seqno_recv);
 		return -ETIME;
 	}
 
@@ -103,11 +103,11 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 		return -EPROTO;
 
 	/* Sanity check on seqno */
-	expected_seqno = (gt->usm.tlb_invalidation_seqno_recv + 1) %
+	expected_seqno = (gt->tlb_invalidation.seqno_recv + 1) %
 		TLB_INVALIDATION_SEQNO_MAX;
 	XE_WARN_ON(expected_seqno != msg[0]);
 
-	gt->usm.tlb_invalidation_seqno_recv = msg[0];
+	gt->tlb_invalidation.seqno_recv = msg[0];
 	smp_wmb();
 	wake_up_all(&guc->ct.wq);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 2dbc8cedd630..3bfce7abe857 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -160,6 +160,17 @@ struct xe_gt {
 		struct work_struct worker;
 	} reset;
 
+	/** @tlb_invalidation: TLB invalidation state */
+	struct {
+		/** @seqno: TLB invalidation seqno, protected by CT lock */
+#define TLB_INVALIDATION_SEQNO_MAX	0x100000
+		int seqno;
+		/**
+		 * @seqno_recv: last received TLB invalidation seqno, protected by CT lock
+		 */
+		int seqno_recv;
+	} tlb_invalidation;
+
 	/** @usm: unified shared memory state */
 	struct {
 		/**
@@ -175,17 +186,6 @@ struct xe_gt {
 		 * operations (e.g. mmigrations, fixing page tables)
 		 */
 		u16 reserved_bcs_instance;
-		/**
-		 * @tlb_invalidation_seqno: TLB invalidation seqno, protected by
-		 * CT lock
-		 */
-#define TLB_INVALIDATION_SEQNO_MAX	0x100000
-		int tlb_invalidation_seqno;
-		/**
-		 * @tlb_invalidation_seqno_recv: last received TLB invalidation
-		 * seqno, protected by CT lock
-		 */
-		int tlb_invalidation_seqno_recv;
 		/** @pf_wq: page fault work queue, unbound, high priority */
 		struct workqueue_struct *pf_wq;
 		/** @acc_wq: access counter work queue, unbound, high priority */
-- 
2.39.1



* [Intel-xe] [PATCH 04/22] drm/xe: Add TLB invalidation fence
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

The fence will be signaled upon TLB invalidation completion.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c                    |  1 +
 drivers/gpu/drm/xe/xe_gt_debugfs.c            |  2 +-
 drivers/gpu/drm/xe/xe_gt_pagefault.c          |  2 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 43 +++++++++++++++++--
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  6 ++-
 .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h | 26 +++++++++++
 drivers/gpu/drm/xe/xe_gt_types.h              |  5 +++
 drivers/gpu/drm/xe/xe_vm.c                    |  2 +-
 8 files changed, 80 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 1eb280c4f5f4..0be75f8afe4b 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -668,6 +668,7 @@ static int gt_reset(struct xe_gt *gt)
 
 	xe_uc_stop_prepare(&gt->uc);
 	xe_gt_pagefault_reset(gt);
+	xe_gt_tlb_invalidation_reset(gt);
 
 	err = xe_uc_stop(&gt->uc);
 	if (err)
diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
index cd1888784141..30058c6100ab 100644
--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
@@ -96,7 +96,7 @@ static int invalidate_tlb(struct seq_file *m, void *data)
 	int seqno;
 	int ret = 0;
 
-	seqno = xe_gt_tlb_invalidation(gt);
+	seqno = xe_gt_tlb_invalidation(gt, NULL);
 	XE_WARN_ON(seqno < 0);
 	if (seqno > 0)
 		ret = xe_gt_tlb_invalidation_wait(gt, seqno);
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 93a8efe5d0a0..705093cb63d7 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -245,7 +245,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		 * defer TLB invalidate + fault response to a callback of fence
 		 * too
 		 */
-		ret = xe_gt_tlb_invalidation(gt);
+		ret = xe_gt_tlb_invalidation(gt, NULL);
 		if (ret >= 0)
 			ret = 0;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index a39a2fb163ae..0058a155eeb9 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -17,11 +17,27 @@ guc_to_gt(struct xe_guc *guc)
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
 	gt->tlb_invalidation.seqno = 1;
+	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
 
 	return 0;
 }
 
-static int send_tlb_invalidation(struct xe_guc *guc)
+void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
+{
+	struct xe_gt_tlb_invalidation_fence *fence, *next;
+
+	mutex_lock(&gt->uc.guc.ct.lock);
+	list_for_each_entry_safe(fence, next,
+				 &gt->tlb_invalidation.pending_fences, link) {
+		list_del(&fence->link);
+		dma_fence_signal(&fence->base);
+		dma_fence_put(&fence->base);
+	}
+	mutex_unlock(&gt->uc.guc.ct.lock);
+}
+
+static int send_tlb_invalidation(struct xe_guc *guc,
+				 struct xe_gt_tlb_invalidation_fence *fence)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
 	u32 action[] = {
@@ -41,6 +57,15 @@ static int send_tlb_invalidation(struct xe_guc *guc)
 	 */
 	mutex_lock(&guc->ct.lock);
 	seqno = gt->tlb_invalidation.seqno;
+	if (fence) {
+		/*
+		 * FIXME: How to deal TLB invalidation timeout, right now we
+		 * just have an endless fence which isn't ideal.
+		 */
+		fence->seqno = seqno;
+		list_add_tail(&fence->link,
+			      &gt->tlb_invalidation.pending_fences);
+	}
 	action[1] = seqno;
 	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
 		TLB_INVALIDATION_SEQNO_MAX;
@@ -55,9 +80,10 @@ static int send_tlb_invalidation(struct xe_guc *guc)
 	return ret;
 }
 
-int xe_gt_tlb_invalidation(struct xe_gt *gt)
+int xe_gt_tlb_invalidation(struct xe_gt *gt,
+			   struct xe_gt_tlb_invalidation_fence *fence)
 {
-	return send_tlb_invalidation(&gt->uc.guc);
+	return send_tlb_invalidation(&gt->uc.guc, fence);
 }
 
 static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
@@ -97,8 +123,11 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
+	struct xe_gt_tlb_invalidation_fence *fence;
 	int expected_seqno;
 
+	lockdep_assert_held(&guc->ct.lock);
+
 	if (unlikely(len != 1))
 		return -EPROTO;
 
@@ -111,5 +140,13 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	smp_wmb();
 	wake_up_all(&guc->ct.wq);
 
+	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
+					 typeof(*fence), link);
+	if (fence && tlb_invalidation_seqno_past(gt, fence->seqno)) {
+		list_del(&fence->link);
+		dma_fence_signal(&fence->base);
+		dma_fence_put(&fence->base);
+	}
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
index f1c3b34b1993..7e6fbf46f0e3 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -8,11 +8,15 @@
 
 #include <linux/types.h>
 
+#include "xe_gt_tlb_invalidation_types.h"
+
 struct xe_gt;
 struct xe_guc;
 
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt);
-int xe_gt_tlb_invalidation(struct xe_gt *gt);
+void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
+int xe_gt_tlb_invalidation(struct xe_gt *gt,
+			   struct xe_gt_tlb_invalidation_fence *fence);
 int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
new file mode 100644
index 000000000000..ab57c14c6d14
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_GT_TLB_INVALIDATION_TYPES_H_
+#define _XE_GT_TLB_INVALIDATION_TYPES_H_
+
+#include <linux/dma-fence.h>
+
+/**
+ * struct xe_gt_tlb_invalidation_fence - XE GT TLB invalidation fence
+ *
+ * Optionally passed to xe_gt_tlb_invalidation and will be signaled upon TLB
+ * invalidation completion.
+ */
+struct xe_gt_tlb_invalidation_fence {
+	/** @base: dma fence base */
+	struct dma_fence base;
+	/** @link: link into list of pending tlb fences */
+	struct list_head link;
+	/** @seqno: seqno of TLB invalidation to signal fence one */
+	int seqno;
+};
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 3bfce7abe857..a755e3a86552 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -169,6 +169,11 @@ struct xe_gt {
 		 * @seqno_recv: last received TLB invalidation seqno, protected by CT lock
 		 */
 		int seqno_recv;
+		/**
+		 * @pending_fences: list of pending fences waiting TLB
+		 * invaliations, protected by CT lock
+		 */
+		struct list_head pending_fences;
 	} tlb_invalidation;
 
 	/** @usm: unified shared memory state */
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 92ecc7fc55b6..4c0080d081f3 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3321,7 +3321,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 		if (xe_pt_zap_ptes(gt, vma)) {
 			gt_needs_invalidate |= BIT(id);
 			xe_device_wmb(xe);
-			seqno[id] = xe_gt_tlb_invalidation(gt);
+			seqno[id] = xe_gt_tlb_invalidation(gt, NULL);
 			if (seqno[id] < 0)
 				return seqno[id];
 		}
-- 
2.39.1



* [Intel-xe] [PATCH 05/22] drm/xe: Invalidate TLB after unbind is complete
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

This gets tricky: we can't do the TLB invalidation until the unbind
operation is done on the hardware, and we can't signal the unbind as
complete until the TLB invalidation is done. To work around this,
create an unbind fence which issues a TLB invalidation after the unbind
is done on the hardware and signals on TLB invalidation completion;
this fence is installed in the BO's dma-resv slot and in the out-syncs
for the unbind operation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |  2 +
 drivers/gpu/drm/xe/xe_gt_types.h            |  9 ++
 drivers/gpu/drm/xe/xe_pt.c                  | 96 +++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 0058a155eeb9..23094d364583 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -18,6 +18,8 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
 	gt->tlb_invalidation.seqno = 1;
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
+	spin_lock_init(&gt->tlb_invalidation.lock);
+	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index a755e3a86552..3b2d9842add7 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -174,6 +174,15 @@ struct xe_gt {
 		 * invalidations, protected by CT lock
 		 */
 		struct list_head pending_fences;
+		/** @fence_context: context for TLB invalidation fences */
+		u64 fence_context;
+		/**
+		 * @fence_seqno: seqno for TLB invalidation fences, protected by
+		 * tlb_invalidation.lock
+		 */
+		u32 fence_seqno;
+		/** @lock: protects TLB invalidation fences */
+		spinlock_t lock;
 	} tlb_invalidation;
 
 	/** @usm: unified shared memory state */
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 3c0cea02279c..3a1a7145effc 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -8,6 +8,7 @@
 #include "xe_bo.h"
 #include "xe_device.h"
 #include "xe_gt.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
 #include "xe_pt.h"
 #include "xe_pt_types.h"
@@ -1465,6 +1466,83 @@ static const struct xe_migrate_pt_update_ops userptr_unbind_ops = {
 	.pre_commit = xe_pt_userptr_pre_commit,
 };
 
+struct invalidation_fence {
+	struct xe_gt_tlb_invalidation_fence base;
+	struct xe_gt *gt;
+	struct dma_fence *fence;
+	struct dma_fence_cb cb;
+	struct work_struct work;
+};
+
+static const char *
+invalidation_fence_get_driver_name(struct dma_fence *dma_fence)
+{
+	return "xe";
+}
+
+static const char *
+invalidation_fence_get_timeline_name(struct dma_fence *dma_fence)
+{
+	return "invalidation_fence";
+}
+
+static const struct dma_fence_ops invalidation_fence_ops = {
+	.get_driver_name = invalidation_fence_get_driver_name,
+	.get_timeline_name = invalidation_fence_get_timeline_name,
+};
+
+static void invalidation_fence_cb(struct dma_fence *fence,
+				  struct dma_fence_cb *cb)
+{
+	struct invalidation_fence *ifence =
+		container_of(cb, struct invalidation_fence, cb);
+
+	queue_work(system_wq, &ifence->work);
+	dma_fence_put(ifence->fence);
+}
+
+static void invalidation_fence_work_func(struct work_struct *w)
+{
+	struct invalidation_fence *ifence =
+		container_of(w, struct invalidation_fence, work);
+
+	xe_gt_tlb_invalidation(ifence->gt, &ifence->base);
+}
+
+static int invalidation_fence_init(struct xe_gt *gt,
+				   struct invalidation_fence *ifence,
+				   struct dma_fence *fence)
+{
+	int ret;
+
+	spin_lock_irq(&gt->tlb_invalidation.lock);
+	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
+		       &gt->tlb_invalidation.lock,
+		       gt->tlb_invalidation.fence_context,
+		       ++gt->tlb_invalidation.fence_seqno);
+	spin_unlock_irq(&gt->tlb_invalidation.lock);
+
+	INIT_LIST_HEAD(&ifence->base.link);
+
+	dma_fence_get(&ifence->base.base);	/* Ref for caller */
+	ifence->fence = fence;
+	ifence->gt = gt;
+
+	INIT_WORK(&ifence->work, invalidation_fence_work_func);
+	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
+	if (ret == -ENOENT) {
+		dma_fence_put(ifence->fence);	/* Usually dropped in CB */
+		invalidation_fence_work_func(&ifence->work);
+	} else if (ret) {
+		dma_fence_put(&ifence->base.base);	/* Caller ref */
+		dma_fence_put(&ifence->base.base);	/* Creation ref */
+	}
+
+	XE_WARN_ON(ret && ret != -ENOENT);
+
+	return ret && ret != -ENOENT ? ret : 0;
+}
+
 /**
  * __xe_pt_unbind_vma() - Disconnect and free a page-table tree for the vma
  * address range.
@@ -1500,6 +1578,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	struct xe_vm *vm = vma->vm;
 	u32 num_entries;
 	struct dma_fence *fence = NULL;
+	struct invalidation_fence *ifence;
 	LLIST_HEAD(deferred);
 
 	xe_bo_assert_held(vma->bo);
@@ -1515,6 +1594,10 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 	xe_vm_dbg_print_entries(gt_to_xe(gt), entries, num_entries);
 
+	ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
+	if (!ifence)
+		return ERR_PTR(-ENOMEM);
+
 	/*
 	 * Even if we were already evicted and unbind to destroy, we need to
 	 * clear again here. The eviction may have updated pagetables at a
@@ -1527,6 +1610,17 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 					   syncs, num_syncs,
 					   &unbind_pt_update.base);
 	if (!IS_ERR(fence)) {
+		int err;
+
+		/* TLB invalidation must be done before signaling unbind */
+		err = invalidation_fence_init(gt, ifence, fence);
+		if (err) {
+			dma_fence_put(fence);
+			kfree(ifence);
+			return ERR_PTR(err);
+		}
+		fence = &ifence->base.base;
+
 		/* add shared fence now for pagetable delayed destroy */
 		dma_resv_add_fence(&vm->resv, fence,
 				   DMA_RESV_USAGE_BOOKKEEP);
@@ -1538,6 +1632,8 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 		xe_pt_commit_unbind(vma, entries, num_entries,
 				    unbind_pt_update.locked ? &deferred : NULL);
 		vma->gt_present &= ~BIT(gt->info.id);
+	} else {
+		kfree(ifence);
 	}
 
 	if (!vma->gt_present)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (4 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 05/22] drm/xe: Invalidate TLB after unbind is complete Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-13 23:21   ` Matt Roper
  2023-02-03 20:23 ` [Intel-xe] [PATCH 07/22] drm/xe: Add TLB invalidation fence ftrace Rodrigo Vivi
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Document all exported functions.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 52 ++++++++++++++++++++-
 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 23094d364583..1cb4d3a6bc57 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -14,6 +14,15 @@ guc_to_gt(struct xe_guc *guc)
 	return container_of(guc, struct xe_gt, uc.guc);
 }
 
+/**
+ * xe_gt_tlb_invalidation_init - Initialize GT TLB invalidation state
+ * @gt: graphics tile
+ *
+ * Initialize GT TLB invalidation state, purely software initialization, should
+ * be called once during driver load.
+ *
+ * Return: 0 on success, negative error code on error.
+ */
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
 	gt->tlb_invalidation.seqno = 1;
@@ -24,7 +33,13 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 	return 0;
 }
 
-void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
+/**
+ * xe_gt_tlb_invalidation_reset - TLB invalidation reset handler
+ * @gt: graphics tile
+ *
+ * Signal any pending invalidation fences, should be called during a GT reset
+ */
+void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
 {
 	struct xe_gt_tlb_invalidation_fence *fence, *next;
 
@@ -82,6 +97,19 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	return ret;
 }
 
+/**
+ * xe_gt_tlb_invalidation - Issue a TLB invalidation on this GT
+ * @gt: graphics tile
+ * @fence: invalidation fence which will be signaled on TLB invalidation
+ * completion, can be NULL
+ *
+ * Issue a full TLB invalidation on the GT. Completion of the TLB invalidation
+ * is asynchronous; the caller can either use the invalidation fence or seqno +
+ * xe_gt_tlb_invalidation_wait to wait for completion.
+ *
+ * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
+ * negative error code on error.
+ */
 int xe_gt_tlb_invalidation(struct xe_gt *gt,
 			   struct xe_gt_tlb_invalidation_fence *fence)
 {
@@ -100,6 +128,16 @@ static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
 	return false;
 }
 
+/**
+ * xe_gt_tlb_invalidation_wait - Wait for TLB invalidation to complete
+ * @gt: graphics tile
+ * @seqno: seqno to wait which was returned from xe_gt_tlb_invalidation
+ *
+ * Wait for 200ms for a TLB invalidation to complete; in practice we should
+ * always receive the TLB invalidation within 200ms.
+ *
+ * Return: 0 on success, -ETIME on TLB invalidation timeout
+ */
 int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 {
 	struct xe_device *xe = gt_to_xe(gt);
@@ -122,6 +160,18 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 	return 0;
 }
 
+/**
+ * xe_guc_tlb_invalidation_done_handler - TLB invalidation done handler
+ * @guc: guc
+ * @msg: message indicating TLB invalidation done
+ * @len: length of message
+ *
+ * Parse the seqno of the TLB invalidation, wake any waiters for that seqno,
+ * and signal any invalidation fences for it. The algorithm depends on seqnos
+ * being received in order and asserts this assumption.
+ *
+ * Return: 0 on success, -EPROTO for malformed messages.
+ */
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
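The done-handler documentation in the patch above notes that the seqno algorithm relies on invalidations being received in order. A userspace sketch of such a wrap-safe "has this seqno already passed?" check is below; the constant and the exact comparison are assumptions for illustration, not lifted from the driver (whose seqnos wrap and skip 0).

```c
#include <assert.h>
#include <stdbool.h>

#define SEQNO_MAX (1 << 20)	/* assumed seqno space; wraps, skipping 0 */

/* Return true if @seqno has already been acknowledged given the last
 * received seqno. A plain ">=" is wrong near the wrap point, so compare
 * within half the seqno space instead (a common pattern for wrapping
 * sequence numbers). */
static bool tlb_seqno_past(int seqno_recv, int seqno)
{
	if (seqno_recv == seqno)
		return true;
	if (seqno_recv > seqno)
		return seqno_recv - seqno < SEQNO_MAX / 2;
	/* seqno_recv wrapped past 0 while seqno is near the top */
	return seqno - seqno_recv > SEQNO_MAX / 2;
}
```

With in-order delivery, the head of the pending-fence list is always the oldest outstanding seqno, which is why the handler only has to inspect the first list entry.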

* [Intel-xe] [PATCH 07/22] drm/xe: Add TLB invalidation fence ftrace
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (5 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-03 20:23 ` [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG Rodrigo Vivi
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

This will help debug issues with TLB invalidation fences.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |  5 +++
 drivers/gpu/drm/xe/xe_pt.c                  |  5 +++
 drivers/gpu/drm/xe/xe_trace.h               | 50 +++++++++++++++++++++
 3 files changed, 60 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 1cb4d3a6bc57..4d179357ce65 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -7,6 +7,7 @@
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_guc.h"
 #include "xe_guc_ct.h"
+#include "xe_trace.h"
 
 static struct xe_gt *
 guc_to_gt(struct xe_guc *guc)
@@ -82,6 +83,7 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 		fence->seqno = seqno;
 		list_add_tail(&fence->link,
 			      &gt->tlb_invalidation.pending_fences);
+		trace_xe_gt_tlb_invalidation_fence_send(fence);
 	}
 	action[1] = seqno;
 	gt->tlb_invalidation.seqno = (gt->tlb_invalidation.seqno + 1) %
@@ -194,7 +196,10 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 
 	fence = list_first_entry_or_null(&gt->tlb_invalidation.pending_fences,
 					 typeof(*fence), link);
+	if (fence)
+		trace_xe_gt_tlb_invalidation_fence_recv(fence);
 	if (fence && tlb_invalidation_seqno_past(gt, fence->seqno)) {
+		trace_xe_gt_tlb_invalidation_fence_signal(fence);
 		list_del(&fence->link);
 		dma_fence_signal(&fence->base);
 		dma_fence_put(&fence->base);
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 3a1a7145effc..2e33d9eaf550 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -14,6 +14,7 @@
 #include "xe_pt_types.h"
 #include "xe_vm.h"
 #include "xe_res_cursor.h"
+#include "xe_trace.h"
 #include "xe_ttm_stolen_mgr.h"
 
 struct xe_pt_dir {
@@ -1497,6 +1498,7 @@ static void invalidation_fence_cb(struct dma_fence *fence,
 	struct invalidation_fence *ifence =
 		container_of(cb, struct invalidation_fence, cb);
 
+	trace_xe_gt_tlb_invalidation_fence_cb(&ifence->base);
 	queue_work(system_wq, &ifence->work);
 	dma_fence_put(ifence->fence);
 }
@@ -1506,6 +1508,7 @@ static void invalidation_fence_work_func(struct work_struct *w)
 	struct invalidation_fence *ifence =
 		container_of(w, struct invalidation_fence, work);
 
+	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
 	xe_gt_tlb_invalidation(ifence->gt, &ifence->base);
 }
 
@@ -1515,6 +1518,8 @@ static int invalidation_fence_init(struct xe_gt *gt,
 {
 	int ret;
 
+	trace_xe_gt_tlb_invalidation_fence_create(&ifence->base);
+
 	spin_lock_irq(&gt->tlb_invalidation.lock);
 	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
 		       &gt->tlb_invalidation.lock,
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index a00d4b210c3b..373b0825ec79 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -15,10 +15,60 @@
 #include "xe_bo_types.h"
 #include "xe_engine_types.h"
 #include "xe_gt_types.h"
+#include "xe_gt_tlb_invalidation_types.h"
 #include "xe_guc_engine_types.h"
 #include "xe_sched_job.h"
 #include "xe_vm_types.h"
 
+DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
+		    TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+		    TP_ARGS(fence),
+
+		    TP_STRUCT__entry(
+			     __field(u64, fence)
+			     __field(int, seqno)
+			     ),
+
+		    TP_fast_assign(
+			   __entry->fence = (u64)fence;
+			   __entry->seqno = fence->seqno;
+			   ),
+
+		    TP_printk("fence=0x%016llx, seqno=%d",
+			      __entry->fence, __entry->seqno)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_create,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence,
+	     xe_gt_tlb_invalidation_fence_work_func,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_cb,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_send,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_recv,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_signal,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
 DECLARE_EVENT_CLASS(xe_bo,
 		    TP_PROTO(struct xe_bo *bo),
 		    TP_ARGS(bo),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (6 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 07/22] drm/xe: Add TLB invalidation fence ftrace Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-13 23:22   ` Matt Roper
  2023-02-03 20:23 ` [Intel-xe] [PATCH 09/22] drm/xe: Add TDR for invalidation fence timeout cleanup Rodrigo Vivi
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

The GT TLB invalidation functions now live in the header
xe_gt_tlb_invalidation.h; include that file in xe_gt_debugfs.c when
CONFIG_DRM_XE_DEBUG is set.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_debugfs.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
index 30058c6100ab..946398f08bb5 100644
--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
@@ -11,12 +11,15 @@
 #include "xe_gt.h"
 #include "xe_gt_debugfs.h"
 #include "xe_gt_mcr.h"
-#include "xe_gt_pagefault.h"
 #include "xe_gt_topology.h"
 #include "xe_hw_engine.h"
 #include "xe_macros.h"
 #include "xe_uc_debugfs.h"
 
+#ifdef CONFIG_DRM_XE_DEBUG
+#include "xe_gt_tlb_invalidation.h"
+#endif
+
 static struct xe_gt *node_to_gt(struct drm_info_node *node)
 {
 	return node->info_ent->data;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 09/22] drm/xe: Add TDR for invalidation fence timeout cleanup
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (7 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-03 20:23 ` [Intel-xe] [PATCH 10/22] drm/xe: Only set VM->asid for platforms that support an ASID Rodrigo Vivi
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Endless fences are not good; add a TDR (timeout detection and recovery) to
clean up any invalidation fences which have not received an invalidation
message within a timeout period.
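The expiry scan this patch adds can be sketched as follows. This is a userspace model with illustrative names, not the driver code: because fences are appended to the pending list in submission order, the walk can stop at the first fence that is still inside the timeout window, exactly as the `list_for_each_entry_safe` loop below breaks on the first young fence.

```c
#include <assert.h>
#include <stddef.h>

#define TLB_TIMEOUT_MS 250	/* mirrors TLB_TIMEOUT = HZ / 4 */

/* Return how many of the oldest pending fences have timed out at
 * @now_ms. @submit_ms holds submission times, oldest first. */
static size_t expire_timed_out(const long long *submit_ms, size_t n,
			       long long now_ms)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (now_ms - submit_ms[i] < TLB_TIMEOUT_MS)
			break;	/* every later fence is younger: stop */
		expired++;	/* would set -ETIME and signal the fence */
	}
	return expired;
}
```

If anything survives the scan, the real code re-queues the delayed work for another TLB_TIMEOUT, so the TDR keeps running only while fences are actually pending.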

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 58 +++++++++++++++++--
 .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  2 +
 drivers/gpu/drm/xe/xe_gt_types.h              |  5 ++
 drivers/gpu/drm/xe/xe_trace.h                 |  5 ++
 4 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 4d179357ce65..9e026fd0a45d 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -9,12 +9,45 @@
 #include "xe_guc_ct.h"
 #include "xe_trace.h"
 
+#define TLB_TIMEOUT	(HZ / 4)
+
 static struct xe_gt *
 guc_to_gt(struct xe_guc *guc)
 {
 	return container_of(guc, struct xe_gt, uc.guc);
 }
 
+static void xe_gt_tlb_fence_timeout(struct work_struct *work)
+{
+	struct xe_gt *gt = container_of(work, struct xe_gt,
+					tlb_invalidation.fence_tdr.work);
+	struct xe_gt_tlb_invalidation_fence *fence, *next;
+
+	mutex_lock(&gt->uc.guc.ct.lock);
+	list_for_each_entry_safe(fence, next,
+				 &gt->tlb_invalidation.pending_fences, link) {
+		s64 since_inval_ms = ktime_ms_delta(ktime_get(),
+						    fence->invalidation_time);
+
+		if (msecs_to_jiffies(since_inval_ms) < TLB_TIMEOUT)
+			break;
+
+		trace_xe_gt_tlb_invalidation_fence_timeout(fence);
+		drm_err(&gt_to_xe(gt)->drm, "TLB invalidation fence timeout, seqno=%d",
+			fence->seqno);
+
+		list_del(&fence->link);
+		fence->base.error = -ETIME;
+		dma_fence_signal(&fence->base);
+		dma_fence_put(&fence->base);
+	}
+	if (!list_empty(&gt->tlb_invalidation.pending_fences))
+		queue_delayed_work(system_wq,
+				   &gt->tlb_invalidation.fence_tdr,
+				   TLB_TIMEOUT);
+	mutex_unlock(&gt->uc.guc.ct.lock);
+}
+
 /**
  * xe_gt_tlb_invalidation_init - Initialize GT TLB invalidation state
  * @gt: graphics tile
@@ -30,6 +63,8 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 	INIT_LIST_HEAD(&gt->tlb_invalidation.pending_fences);
 	spin_lock_init(&gt->tlb_invalidation.lock);
 	gt->tlb_invalidation.fence_context = dma_fence_context_alloc(1);
+	INIT_DELAYED_WORK(&gt->tlb_invalidation.fence_tdr,
+			  xe_gt_tlb_fence_timeout);
 
 	return 0;
 }
@@ -44,6 +79,8 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 {
 	struct xe_gt_tlb_invalidation_fence *fence, *next;
 
+	cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
+
 	mutex_lock(&gt->uc.guc.ct.lock);
 	list_for_each_entry_safe(fence, next,
 				 &gt->tlb_invalidation.pending_fences, link) {
@@ -67,6 +104,7 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	};
 	int seqno;
 	int ret;
+	bool queue_work;
 
 	/*
 	 * XXX: The seqno algorithm relies on TLB invalidation being processed
@@ -76,10 +114,7 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	mutex_lock(&guc->ct.lock);
 	seqno = gt->tlb_invalidation.seqno;
 	if (fence) {
-		/*
-		 * FIXME: How to deal TLB invalidation timeout, right now we
-		 * just have an endless fence which isn't ideal.
-		 */
+		queue_work = list_empty(&gt->tlb_invalidation.pending_fences);
 		fence->seqno = seqno;
 		list_add_tail(&fence->link,
 			      &gt->tlb_invalidation.pending_fences);
@@ -92,6 +127,13 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 		gt->tlb_invalidation.seqno = 1;
 	ret = xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
+	if (!ret && fence) {
+		fence->invalidation_time = ktime_get();
+		if (queue_work)
+			queue_delayed_work(system_wq,
+					   &gt->tlb_invalidation.fence_tdr,
+					   TLB_TIMEOUT);
+	}
 	if (!ret)
 		ret = seqno;
 	mutex_unlock(&guc->ct.lock);
@@ -152,7 +194,7 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
 	 */
 	ret = wait_event_timeout(guc->ct.wq,
 				 tlb_invalidation_seqno_past(gt, seqno),
-				 HZ / 5);
+				 TLB_TIMEOUT);
 	if (!ret) {
 		drm_err(&xe->drm, "TLB invalidation timed out, seqno=%d, recv=%d\n",
 			seqno, gt->tlb_invalidation.seqno_recv);
@@ -201,6 +243,12 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	if (fence && tlb_invalidation_seqno_past(gt, fence->seqno)) {
 		trace_xe_gt_tlb_invalidation_fence_signal(fence);
 		list_del(&fence->link);
+		if (!list_empty(&gt->tlb_invalidation.pending_fences))
+			mod_delayed_work(system_wq,
+					 &gt->tlb_invalidation.fence_tdr,
+					 TLB_TIMEOUT);
+		else
+			cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
 		dma_fence_signal(&fence->base);
 		dma_fence_put(&fence->base);
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
index ab57c14c6d14..934c828efe31 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
@@ -21,6 +21,8 @@ struct xe_gt_tlb_invalidation_fence {
 	struct list_head link;
 	/** @seqno: seqno of the TLB invalidation that signals this fence */
 	int seqno;
+	/** @invalidation_time: time of TLB invalidation */
+	ktime_t invalidation_time;
 };
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 3b2d9842add7..a40fab262ac9 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -174,6 +174,11 @@ struct xe_gt {
 		 * invalidations, protected by CT lock
 		 */
 		struct list_head pending_fences;
+		/**
+		 * @fence_tdr: schedules a delayed call to
+		 * xe_gt_tlb_fence_timeout after the timeout interval is over.
+		 */
+		struct delayed_work fence_tdr;
 		/** @fence_context: context for TLB invalidation fences */
 		u64 fence_context;
 		/**
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 373b0825ec79..1774658b18b7 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -69,6 +69,11 @@ DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_signal,
 	     TP_ARGS(fence)
 );
 
+DEFINE_EVENT(xe_gt_tlb_invalidation_fence, xe_gt_tlb_invalidation_fence_timeout,
+	     TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
+	     TP_ARGS(fence)
+);
+
 DECLARE_EVENT_CLASS(xe_bo,
 		    TP_PROTO(struct xe_bo *bo),
 		    TP_ARGS(bo),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 10/22] drm/xe: Only set VM->asid for platforms that support an ASID
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (8 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 09/22] drm/xe: Add TDR for invalidation fence timeout cleanup Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-03 20:23 ` [Intel-xe] [PATCH 11/22] drm/xe: Delete debugfs entry to issue TLB invalidation Rodrigo Vivi
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

This will help with TLB invalidation, as the ASID field in a TLB
invalidation should be zero on platforms that do not support an ASID.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 4c0080d081f3..2b1e79e65dbf 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1371,10 +1371,12 @@ static void vm_destroy_work_func(struct work_struct *w)
 		xe_device_mem_access_put(xe);
 		xe_pm_runtime_put(xe);
 
-		mutex_lock(&xe->usm.lock);
-		lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
-		XE_WARN_ON(lookup != vm);
-		mutex_unlock(&xe->usm.lock);
+		if (xe->info.supports_usm) {
+			mutex_lock(&xe->usm.lock);
+			lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
+			XE_WARN_ON(lookup != vm);
+			mutex_unlock(&xe->usm.lock);
+		}
 	}
 
 	/*
@@ -1859,16 +1861,18 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 		return err;
 	}
 
-	mutex_lock(&xe->usm.lock);
-	err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
-			      XA_LIMIT(0, XE_MAX_ASID - 1),
-			      &xe->usm.next_asid, GFP_KERNEL);
-	mutex_unlock(&xe->usm.lock);
-	if (err) {
-		xe_vm_close_and_put(vm);
-		return err;
+	if (xe->info.supports_usm) {
+		mutex_lock(&xe->usm.lock);
+		err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
+				      XA_LIMIT(0, XE_MAX_ASID - 1),
+				      &xe->usm.next_asid, GFP_KERNEL);
+		mutex_unlock(&xe->usm.lock);
+		if (err) {
+			xe_vm_close_and_put(vm);
+			return err;
+		}
+		vm->usm.asid = asid;
 	}
-	vm->usm.asid = asid;
 
 	args->vm_id = id;
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 11/22] drm/xe: Delete debugfs entry to issue TLB invalidation
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (9 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 10/22] drm/xe: Only set VM->asid for platforms that support an ASID Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-03 20:23 ` [Intel-xe] [PATCH 12/22] drm/xe: Add has_range_tlb_invalidation device attribute Rodrigo Vivi
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Not used, let's remove this.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_debugfs.c | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
index 946398f08bb5..daae42d3ab3b 100644
--- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
+++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
@@ -16,10 +16,6 @@
 #include "xe_macros.h"
 #include "xe_uc_debugfs.h"
 
-#ifdef CONFIG_DRM_XE_DEBUG
-#include "xe_gt_tlb_invalidation.h"
-#endif
-
 static struct xe_gt *node_to_gt(struct drm_info_node *node)
 {
 	return node->info_ent->data;
@@ -92,32 +88,12 @@ static int steering(struct seq_file *m, void *data)
 	return 0;
 }
 
-#ifdef CONFIG_DRM_XE_DEBUG
-static int invalidate_tlb(struct seq_file *m, void *data)
-{
-	struct xe_gt *gt = node_to_gt(m->private);
-	int seqno;
-	int ret = 0;
-
-	seqno = xe_gt_tlb_invalidation(gt, NULL);
-	XE_WARN_ON(seqno < 0);
-	if (seqno > 0)
-		ret = xe_gt_tlb_invalidation_wait(gt, seqno);
-	XE_WARN_ON(ret < 0);
-
-	return 0;
-}
-#endif
-
 static const struct drm_info_list debugfs_list[] = {
 	{"hw_engines", hw_engines, 0},
 	{"force_reset", force_reset, 0},
 	{"sa_info", sa_info, 0},
 	{"topology", topology, 0},
 	{"steering", steering, 0},
-#ifdef CONFIG_DRM_XE_DEBUG
-	{"invalidate_tlb", invalidate_tlb, 0},
-#endif
 };
 
 void xe_gt_debugfs_register(struct xe_gt *gt)
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 12/22] drm/xe: Add has_range_tlb_invalidation device attribute
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (10 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 11/22] drm/xe: Delete debugfs entry to issue TLB invalidation Rodrigo Vivi
@ 2023-02-03 20:23 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 13/22] drm/xe: Add range based TLB invalidations Rodrigo Vivi
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:23 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

This will help implement range-based TLB invalidations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 2 ++
 drivers/gpu/drm/xe/xe_pci.c          | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index eba224236c86..6d13587bfa7b 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -92,6 +92,8 @@ struct xe_device {
 		bool has_flat_ccs;
 		/** @has_4tile: Whether tile-4 tiling is supported */
 		bool has_4tile;
+		/** @has_range_tlb_invalidation: Has range-based TLB invalidations */
+		bool has_range_tlb_invalidation;
 
 		struct xe_device_display_info {
 			u8 ver;
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index a96004aa34aa..53e87b27fcde 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -77,6 +77,7 @@ struct xe_device_desc {
 	bool supports_usm;
 	bool has_flat_ccs;
 	bool has_4tile;
+	bool has_range_tlb_invalidation;
 };
 
 #define PLATFORM(x)		\
@@ -213,6 +214,7 @@ static const struct xe_device_desc dg1_desc = {
 	.require_force_probe = true, \
 	.graphics_ver = 12, \
 	.graphics_rel = 50, \
+	.has_range_tlb_invalidation = true, \
 	.has_flat_ccs = true, \
 	.dma_mask_size = 46, \
 	.max_tiles = 1, \
@@ -332,6 +334,7 @@ static const struct xe_device_desc mtl_desc = {
 	.max_tiles = 2,
 	.vm_max_level = 3,
 	.media_ver = 13,
+	.has_range_tlb_invalidation = true,
 	PLATFORM(XE_METEORLAKE),
 	.extra_gts = xelpmp_gts,
 	.platform_engine_mask = MTL_MAIN_ENGINES,
@@ -496,6 +499,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	xe->info.has_flat_ccs = desc->has_flat_ccs;
 	xe->info.has_4tile = desc->has_4tile;
 	xe->info.display = desc->display;
+	xe->info.has_range_tlb_invalidation = desc->has_range_tlb_invalidation;
 
 	spd = subplatform_get(xe, desc);
 	xe->info.subplatform = spd ? spd->subplatform : XE_SUBPLATFORM_NONE;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 13/22] drm/xe: Add range based TLB invalidations
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (11 preceding siblings ...)
  2023-02-03 20:23 ` [Intel-xe] [PATCH 12/22] drm/xe: Add has_range_tlb_invalidation device attribute Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 14/22] drm/xe: Propagate error from bind operations to async fence Rodrigo Vivi
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

If the platform supports range-based TLB invalidations, use them. Hide
these details in the xe_gt_tlb_invalidation layer.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |  7 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 87 +++++++++++++++++----
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  4 +-
 drivers/gpu/drm/xe/xe_pt.c                  |  9 ++-
 drivers/gpu/drm/xe/xe_vm.c                  |  2 +-
 5 files changed, 84 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 705093cb63d7..e1a5a3a70c92 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -240,12 +240,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		goto retry_userptr;
 
 	if (!ret) {
-		/*
-		 * FIXME: Doing a full TLB invalidation for now, likely could
-		 * defer TLB invalidate + fault response to a callback of fence
-		 * too
-		 */
-		ret = xe_gt_tlb_invalidation(gt, NULL);
+		ret = xe_gt_tlb_invalidation(gt, NULL, vma);
 		if (ret >= 0)
 			ret = 0;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 9e026fd0a45d..0b37cd09a59a 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -92,16 +92,10 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 }
 
 static int send_tlb_invalidation(struct xe_guc *guc,
-				 struct xe_gt_tlb_invalidation_fence *fence)
+				 struct xe_gt_tlb_invalidation_fence *fence,
+				 u32 *action, int len)
 {
 	struct xe_gt *gt = guc_to_gt(guc);
-	u32 action[] = {
-		XE_GUC_ACTION_TLB_INVALIDATION,
-		0,
-		XE_GUC_TLB_INVAL_FULL << XE_GUC_TLB_INVAL_TYPE_SHIFT |
-		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT |
-		XE_GUC_TLB_INVAL_FLUSH_CACHE,
-	};
 	int seqno;
 	int ret;
 	bool queue_work;
@@ -125,7 +119,7 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 		TLB_INVALIDATION_SEQNO_MAX;
 	if (!gt->tlb_invalidation.seqno)
 		gt->tlb_invalidation.seqno = 1;
-	ret = xe_guc_ct_send_locked(&guc->ct, action, ARRAY_SIZE(action),
+	ret = xe_guc_ct_send_locked(&guc->ct, action, len,
 				    G2H_LEN_DW_TLB_INVALIDATE, 1);
 	if (!ret && fence) {
 		fence->invalidation_time = ktime_get();
@@ -146,18 +140,83 @@ static int send_tlb_invalidation(struct xe_guc *guc,
  * @gt: graphics tile
  * @fence: invalidation fence which will be signal on TLB invalidation
  * completion, can be NULL
+ * @vma: VMA to invalidate
  *
- * Issue a full TLB invalidation on the GT. Completion of TLB is asynchronous
- * and caller can either use the invalidation fence or seqno +
- * xe_gt_tlb_invalidation_wait to wait for completion.
+ * Issue a range based TLB invalidation if supported, if not fallback to a full
+ * TLB invalidation. Completion of TLB is asynchronous and caller can either use
+ * the invalidation fence or seqno + xe_gt_tlb_invalidation_wait to wait for
+ * completion.
  *
  * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
  * negative error code on error.
  */
 int xe_gt_tlb_invalidation(struct xe_gt *gt,
-			   struct xe_gt_tlb_invalidation_fence *fence)
+			   struct xe_gt_tlb_invalidation_fence *fence,
+			   struct xe_vma *vma)
 {
-	return send_tlb_invalidation(&gt->uc.guc, fence);
+	struct xe_device *xe = gt_to_xe(gt);
+#define MAX_TLB_INVALIDATION_LEN	7
+	u32 action[MAX_TLB_INVALIDATION_LEN];
+	int len = 0;
+
+	XE_BUG_ON(!vma);
+
+	if (!xe->info.has_range_tlb_invalidation) {
+		action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
+		action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
+#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
+		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
+		XE_GUC_TLB_INVAL_FLUSH_CACHE)
+		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
+	} else {
+		u64 start = vma->start;
+		u64 length = vma->end - vma->start + 1;
+		u64 align, end;
+
+		if (length < SZ_4K)
+			length = SZ_4K;
+
+		/*
+		 * We need to invalidate a higher granularity if start address
+		 * is not aligned to length. When start is not aligned with
+		 * length we need to find the length large enough to create an
+		 * address mask covering the required range.
+		 */
+		align = roundup_pow_of_two(length);
+		start = ALIGN_DOWN(vma->start, align);
+		end = ALIGN(vma->start + length, align);
+		length = align;
+		while (start + length < end) {
+			length <<= 1;
+			start = ALIGN_DOWN(vma->start, length);
+		}
+
+		/*
+		 * Minimum invalidation size for a 2MB page that the hardware
+		 * expects is 16MB
+		 */
+		if (length >= SZ_2M) {
+			length = max_t(u64, SZ_16M, length);
+			start = ALIGN_DOWN(vma->start, length);
+		}
+
+		XE_BUG_ON(length < SZ_4K);
+		XE_BUG_ON(!is_power_of_2(length));
+		XE_BUG_ON(length & GENMASK(ilog2(SZ_16M) - 1, ilog2(SZ_2M) + 1));
+		XE_BUG_ON(!IS_ALIGNED(start, length));
+
+		action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
+		action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
+		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
+		action[len++] = vma->vm->usm.asid;
+		action[len++] = lower_32_bits(start);
+		action[len++] = upper_32_bits(start);
+		action[len++] = ilog2(length) - ilog2(SZ_4K);
+	}
+
+	XE_BUG_ON(len > MAX_TLB_INVALIDATION_LEN);
+
+	return send_tlb_invalidation(&gt->uc.guc, fence, action, len);
 }
 
 static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
index 7e6fbf46f0e3..b4c4f717bc8a 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -12,11 +12,13 @@
 
 struct xe_gt;
 struct xe_guc;
+struct xe_vma;
 
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt);
 void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
 int xe_gt_tlb_invalidation(struct xe_gt *gt,
-			   struct xe_gt_tlb_invalidation_fence *fence);
+			   struct xe_gt_tlb_invalidation_fence *fence,
+			   struct xe_vma *vma);
 int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 2e33d9eaf550..5a3a0ca224e9 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1470,6 +1470,7 @@ static const struct xe_migrate_pt_update_ops userptr_unbind_ops = {
 struct invalidation_fence {
 	struct xe_gt_tlb_invalidation_fence base;
 	struct xe_gt *gt;
+	struct xe_vma *vma;
 	struct dma_fence *fence;
 	struct dma_fence_cb cb;
 	struct work_struct work;
@@ -1509,12 +1510,13 @@ static void invalidation_fence_work_func(struct work_struct *w)
 		container_of(w, struct invalidation_fence, work);
 
 	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
-	xe_gt_tlb_invalidation(ifence->gt, &ifence->base);
+	xe_gt_tlb_invalidation(ifence->gt, &ifence->base, ifence->vma);
 }
 
 static int invalidation_fence_init(struct xe_gt *gt,
 				   struct invalidation_fence *ifence,
-				   struct dma_fence *fence)
+				   struct dma_fence *fence,
+				   struct xe_vma *vma)
 {
 	int ret;
 
@@ -1532,6 +1534,7 @@ static int invalidation_fence_init(struct xe_gt *gt,
 	dma_fence_get(&ifence->base.base);	/* Ref for caller */
 	ifence->fence = fence;
 	ifence->gt = gt;
+	ifence->vma = vma;
 
 	INIT_WORK(&ifence->work, invalidation_fence_work_func);
 	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
@@ -1618,7 +1621,7 @@ __xe_pt_unbind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 		int err;
 
 		/* TLB invalidation must be done before signaling unbind */
-		err = invalidation_fence_init(gt, ifence, fence);
+		err = invalidation_fence_init(gt, ifence, fence, vma);
 		if (err) {
 			dma_fence_put(fence);
 			kfree(ifence);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2b1e79e65dbf..ca2f7d084ceb 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3325,7 +3325,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 		if (xe_pt_zap_ptes(gt, vma)) {
 			gt_needs_invalidate |= BIT(id);
 			xe_device_wmb(xe);
-			seqno[id] = xe_gt_tlb_invalidation(gt, NULL);
+			seqno[id] = xe_gt_tlb_invalidation(gt, NULL, vma);
 			if (seqno[id] < 0)
 				return seqno[id];
 		}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 14/22] drm/xe: Propagate error from bind operations to async fence
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (12 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 13/22] drm/xe: Add range based TLB invalidations Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 15/22] drm/xe: Use GuC to do GGTT invalidations for the GuC firmware Rodrigo Vivi
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

If a bind operation fails, we need to report it via the async fence.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index ca2f7d084ceb..1bc680cdc249 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1583,6 +1583,7 @@ xe_vm_bind_vma(struct xe_vma *vma, struct xe_engine *e,
 
 struct async_op_fence {
 	struct dma_fence fence;
+	struct dma_fence *wait_fence;
 	struct dma_fence_cb cb;
 	struct xe_vm *vm;
 	wait_queue_head_t wq;
@@ -1610,8 +1611,10 @@ static void async_op_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 	struct async_op_fence *afence =
 		container_of(cb, struct async_op_fence, cb);
 
+	afence->fence.error = afence->wait_fence->error;
 	dma_fence_signal(&afence->fence);
 	xe_vm_put(afence->vm);
+	dma_fence_put(afence->wait_fence);
 	dma_fence_put(&afence->fence);
 }
 
@@ -1627,13 +1630,17 @@ static void add_async_op_fence_cb(struct xe_vm *vm,
 		wake_up_all(&afence->wq);
 	}
 
+	afence->wait_fence = dma_fence_get(fence);
 	afence->vm = xe_vm_get(vm);
 	dma_fence_get(&afence->fence);
 	ret = dma_fence_add_callback(fence, &afence->cb, async_op_fence_cb);
-	if (ret == -ENOENT)
+	if (ret == -ENOENT) {
+		afence->fence.error = afence->wait_fence->error;
 		dma_fence_signal(&afence->fence);
+	}
 	if (ret) {
 		xe_vm_put(vm);
+		dma_fence_put(afence->wait_fence);
 		dma_fence_put(&afence->fence);
 	}
 	XE_WARN_ON(ret && ret != -ENOENT);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 15/22] drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (13 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 14/22] drm/xe: Propagate error from bind operations to async fence Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 16/22] drm/xe: Coalesce GGTT invalidations Rodrigo Vivi
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Only the GuC should be issuing TLB invalidations if it is enabled. Part
of this patch is to sanitize the device on driver unload to ensure we do
not send GuC-based TLB invalidations during driver unload.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c              | 14 +++++++
 drivers/gpu/drm/xe/xe_ggtt.c                | 12 +++++-
 drivers/gpu/drm/xe/xe_gt.c                  | 13 +++++++
 drivers/gpu/drm/xe/xe_gt.h                  |  1 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |  2 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 43 +++++++++++++++------
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  7 ++--
 drivers/gpu/drm/xe/xe_guc.c                 |  2 +
 drivers/gpu/drm/xe/xe_guc_types.h           |  2 +
 drivers/gpu/drm/xe/xe_pt.c                  |  2 +-
 drivers/gpu/drm/xe/xe_uc.c                  |  9 ++++-
 drivers/gpu/drm/xe/xe_uc.h                  |  1 +
 drivers/gpu/drm/xe/xe_vm.c                  |  2 +-
 13 files changed, 89 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index d7b2b41bc7a8..62f54b4806dc 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -380,6 +380,16 @@ static void xe_device_unlink_display(struct xe_device *xe)
 #endif
 }
 
+static void xe_device_sanitize(struct drm_device *drm, void *arg)
+{
+	struct xe_device *xe = arg;
+	struct xe_gt *gt;
+	u8 id;
+
+	for_each_gt(gt, xe, id)
+		xe_gt_sanitize(gt);
+}
+
 int xe_device_probe(struct xe_device *xe)
 {
 	struct xe_gt *gt;
@@ -466,6 +476,10 @@ int xe_device_probe(struct xe_device *xe)
 
 	xe_debugfs_register(xe);
 
+	err = drmm_add_action_or_reset(&xe->drm, xe_device_sanitize, xe);
+	if (err)
+		return err;
+
 	return 0;
 
 err_fini_display:
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index baa080cd1133..20450ed8400b 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -13,6 +13,7 @@
 #include "xe_device.h"
 #include "xe_bo.h"
 #include "xe_gt.h"
+#include "xe_gt_tlb_invalidation.h"
 #include "xe_map.h"
 #include "xe_mmio.h"
 #include "xe_wopcm.h"
@@ -200,10 +201,17 @@ void xe_ggtt_invalidate(struct xe_gt *gt)
 	 * therefore flushing WC buffers.  Is that really true here?
 	 */
 	xe_mmio_write32(gt, GFX_FLSH_CNTL_GEN6.reg, GFX_FLSH_CNTL_EN);
-	if (xe_device_guc_submission_enabled(gt_to_xe(gt))) {
+
+	if (gt->uc.guc.submission_state.enabled) {
+		int seqno;
+
+		seqno = xe_gt_tlb_invalidation_guc(gt);
+		XE_WARN_ON(seqno <= 0);
+		if (seqno > 0)
+			xe_gt_tlb_invalidation_wait(gt, seqno);
+	} else if (xe_device_guc_submission_enabled(gt_to_xe(gt))) {
 		struct xe_device *xe = gt_to_xe(gt);
 
-		/* TODO: also use vfunc here */
 		if (xe->info.platform == XE_PVC) {
 			xe_mmio_write32(gt, PVC_GUC_TLB_INV_DESC1.reg,
 					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 0be75f8afe4b..36b5da2c5977 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -196,6 +196,15 @@ static int gt_ttm_mgr_init(struct xe_gt *gt)
 	return 0;
 }
 
+void xe_gt_sanitize(struct xe_gt *gt)
+{
+	/*
+	 * FIXME: if xe_uc_sanitize is called here, on TGL driver will not
+	 * reload
+	 */
+	gt->uc.guc.submission_state.enabled = false;
+}
+
 static void gt_fini(struct drm_device *drm, void *arg)
 {
 	struct xe_gt *gt = arg;
@@ -661,6 +670,8 @@ static int gt_reset(struct xe_gt *gt)
 
 	drm_info(&xe->drm, "GT reset started\n");
 
+	xe_gt_sanitize(gt);
+
 	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
@@ -741,6 +752,8 @@ int xe_gt_suspend(struct xe_gt *gt)
 	if (!xe_device_guc_submission_enabled(gt_to_xe(gt)))
 		return -ENODEV;
 
+	xe_gt_sanitize(gt);
+
 	xe_device_mem_access_get(gt_to_xe(gt));
 	err = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
 	if (err)
diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 5dc08a993cfe..5635f2803170 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -26,6 +26,7 @@ int xe_gt_suspend(struct xe_gt *gt);
 int xe_gt_resume(struct xe_gt *gt);
 void xe_gt_reset_async(struct xe_gt *gt);
 void xe_gt_migrate_wait(struct xe_gt *gt);
+void xe_gt_sanitize(struct xe_gt *gt);
 
 struct xe_gt *xe_find_full_gt(struct xe_gt *gt);
 
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index e1a5a3a70c92..ce79eb48feb8 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -240,7 +240,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		goto retry_userptr;
 
 	if (!ret) {
-		ret = xe_gt_tlb_invalidation(gt, NULL, vma);
+		ret = xe_gt_tlb_invalidation_vma(gt, NULL, vma);
 		if (ret >= 0)
 			ret = 0;
 	}
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 0b37cd09a59a..f6a2dd26cad4 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -135,8 +135,34 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	return ret;
 }
 
+#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
+		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
+		XE_GUC_TLB_INVAL_FLUSH_CACHE)
+
 /**
- * xe_gt_tlb_invalidation - Issue a TLB invalidation on this GT
+ * xe_gt_tlb_invalidation_guc - Issue a TLB invalidation on this GT for the GuC
+ * @gt: graphics tile
+ *
+ * Issue a TLB invalidation for the GuC. Completion of TLB is asynchronous and
+ * caller can use seqno + xe_gt_tlb_invalidation_wait to wait for completion.
+ *
+ * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
+ * negative error code on error.
+ */
+int xe_gt_tlb_invalidation_guc(struct xe_gt *gt)
+{
+	u32 action[] = {
+		XE_GUC_ACTION_TLB_INVALIDATION,
+		0,  /* seqno, replaced in send_tlb_invalidation */
+		MAKE_INVAL_OP(XE_GUC_TLB_INVAL_GUC),
+	};
+
+	return send_tlb_invalidation(&gt->uc.guc, NULL, action,
+				     ARRAY_SIZE(action));
+}
+
+/**
+ * xe_gt_tlb_invalidation_vma - Issue a TLB invalidation on this GT for a VMA
  * @gt: graphics tile
  * @fence: invalidation fence which will be signal on TLB invalidation
  * completion, can be NULL
@@ -150,9 +176,9 @@ static int send_tlb_invalidation(struct xe_guc *guc,
  * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
  * negative error code on error.
  */
-int xe_gt_tlb_invalidation(struct xe_gt *gt,
-			   struct xe_gt_tlb_invalidation_fence *fence,
-			   struct xe_vma *vma)
+int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
+			       struct xe_gt_tlb_invalidation_fence *fence,
+			       struct xe_vma *vma)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 #define MAX_TLB_INVALIDATION_LEN	7
@@ -161,12 +187,9 @@ int xe_gt_tlb_invalidation(struct xe_gt *gt,
 
 	XE_BUG_ON(!vma);
 
+	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
+	action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
 	if (!xe->info.has_range_tlb_invalidation) {
-		action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
-		action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
-#define MAKE_INVAL_OP(type)	((type << XE_GUC_TLB_INVAL_TYPE_SHIFT) | \
-		XE_GUC_TLB_INVAL_MODE_HEAVY << XE_GUC_TLB_INVAL_MODE_SHIFT | \
-		XE_GUC_TLB_INVAL_FLUSH_CACHE)
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL);
 	} else {
 		u64 start = vma->start;
@@ -205,8 +228,6 @@ int xe_gt_tlb_invalidation(struct xe_gt *gt,
 		XE_BUG_ON(length & GENMASK(ilog2(SZ_16M) - 1, ilog2(SZ_2M) + 1));
 		XE_BUG_ON(!IS_ALIGNED(start, length));
 
-		action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
-		action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
 		action[len++] = MAKE_INVAL_OP(XE_GUC_TLB_INVAL_PAGE_SELECTIVE);
 		action[len++] = vma->vm->usm.asid;
 		action[len++] = lower_32_bits(start);
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
index b4c4f717bc8a..b333c1709397 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
@@ -16,9 +16,10 @@ struct xe_vma;
 
 int xe_gt_tlb_invalidation_init(struct xe_gt *gt);
 void xe_gt_tlb_invalidation_reset(struct xe_gt *gt);
-int xe_gt_tlb_invalidation(struct xe_gt *gt,
-			   struct xe_gt_tlb_invalidation_fence *fence,
-			   struct xe_vma *vma);
+int xe_gt_tlb_invalidation_guc(struct xe_gt *gt);
+int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
+			       struct xe_gt_tlb_invalidation_fence *fence,
+			       struct xe_vma *vma);
 int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno);
 int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
 
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index 88a3a96da084..5cdfdfd0de40 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -309,6 +309,7 @@ int xe_guc_init_post_hwconfig(struct xe_guc *guc)
 int xe_guc_post_load_init(struct xe_guc *guc)
 {
 	xe_guc_ads_populate_post_load(&guc->ads);
+	guc->submission_state.enabled = true;
 
 	return 0;
 }
@@ -795,6 +796,7 @@ void xe_guc_sanitize(struct xe_guc *guc)
 {
 	xe_uc_fw_change_status(&guc->fw, XE_UC_FIRMWARE_LOADABLE);
 	xe_guc_ct_disable(&guc->ct);
+	guc->submission_state.enabled = false;
 }
 
 int xe_guc_reset_prepare(struct xe_guc *guc)
diff --git a/drivers/gpu/drm/xe/xe_guc_types.h b/drivers/gpu/drm/xe/xe_guc_types.h
index c2a484282ef2..ac7eec28934d 100644
--- a/drivers/gpu/drm/xe/xe_guc_types.h
+++ b/drivers/gpu/drm/xe/xe_guc_types.h
@@ -60,6 +60,8 @@ struct xe_guc {
 			/** @patch: patch version of GuC submission */
 			u32 patch;
 		} version;
+		/** @enabled: submission is enabled */
+		bool enabled;
 	} submission_state;
 	/** @hwconfig: Hardware config state */
 	struct {
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 5a3a0ca224e9..9da5ee4b31f8 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1510,7 +1510,7 @@ static void invalidation_fence_work_func(struct work_struct *w)
 		container_of(w, struct invalidation_fence, work);
 
 	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
-	xe_gt_tlb_invalidation(ifence->gt, &ifence->base, ifence->vma);
+	xe_gt_tlb_invalidation_vma(ifence->gt, &ifence->base, ifence->vma);
 }
 
 static int invalidation_fence_init(struct xe_gt *gt,
diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
index 938d14698003..7886c8b85397 100644
--- a/drivers/gpu/drm/xe/xe_uc.c
+++ b/drivers/gpu/drm/xe/xe_uc.c
@@ -88,10 +88,15 @@ static int uc_reset(struct xe_uc *uc)
 	return 0;
 }
 
-static int uc_sanitize(struct xe_uc *uc)
+void xe_uc_sanitize(struct xe_uc *uc)
 {
 	xe_huc_sanitize(&uc->huc);
 	xe_guc_sanitize(&uc->guc);
+}
+
+static int xe_uc_sanitize_reset(struct xe_uc *uc)
+{
+	xe_uc_sanitize(uc);
 
 	return uc_reset(uc);
 }
@@ -129,7 +134,7 @@ int xe_uc_init_hw(struct xe_uc *uc)
 	if (!xe_device_guc_submission_enabled(uc_to_xe(uc)))
 		return 0;
 
-	ret = uc_sanitize(uc);
+	ret = xe_uc_sanitize_reset(uc);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
index 380e722f95fc..d6efc9ef00d3 100644
--- a/drivers/gpu/drm/xe/xe_uc.h
+++ b/drivers/gpu/drm/xe/xe_uc.h
@@ -17,5 +17,6 @@ void xe_uc_stop_prepare(struct xe_uc *uc);
 int xe_uc_stop(struct xe_uc *uc);
 int xe_uc_start(struct xe_uc *uc);
 int xe_uc_suspend(struct xe_uc *uc);
+void xe_uc_sanitize(struct xe_uc *uc);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 1bc680cdc249..541629293683 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3332,7 +3332,7 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
 		if (xe_pt_zap_ptes(gt, vma)) {
 			gt_needs_invalidate |= BIT(id);
 			xe_device_wmb(xe);
-			seqno[id] = xe_gt_tlb_invalidation(gt, NULL, vma);
+			seqno[id] = xe_gt_tlb_invalidation_vma(gt, NULL, vma);
 			if (seqno[id] < 0)
 				return seqno[id];
 		}
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 16/22] drm/xe: Coalesce GGTT invalidations
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (14 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 15/22] drm/xe: Use GuC to do GGTT invalidations for the GuC firmware Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 17/22] drm/xe: Lock GGTT on when restoring kernel BOs Rodrigo Vivi
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

There is no need to invalidate the GGTT on every allocation /
deallocation; instead, only invalidate the GGTT on an allocation that
follows a deallocation.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c       | 11 +++++++++--
 drivers/gpu/drm/xe/xe_ggtt_types.h |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 20450ed8400b..81c7eb68bc46 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -264,15 +264,22 @@ int xe_ggtt_insert_special_node(struct xe_ggtt *ggtt, struct drm_mm_node *node,
 
 void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
 {
+	struct xe_device *xe = gt_to_xe(ggtt->gt);
 	u64 start = bo->ggtt_node.start;
 	u64 offset, pte;
 
+	lockdep_assert_held(&ggtt->lock);
+
 	for (offset = 0; offset < bo->size; offset += GEN8_PAGE_SIZE) {
 		pte = xe_ggtt_pte_encode(bo, offset);
 		xe_ggtt_set_pte(ggtt, start + offset, pte);
 	}
 
-	xe_ggtt_invalidate(ggtt->gt);
+	/* XXX: Without doing this everytime on integrated driver load fails */
+	if (ggtt->invalidate || !IS_DGFX(xe)) {
+		xe_ggtt_invalidate(ggtt->gt);
+		ggtt->invalidate = false;
+	}
 }
 
 static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
@@ -330,7 +337,7 @@ void xe_ggtt_remove_node(struct xe_ggtt *ggtt, struct drm_mm_node *node)
 	drm_mm_remove_node(node);
 	node->size = 0;
 
-	xe_ggtt_invalidate(ggtt->gt);
+	ggtt->invalidate = true;
 
 	mutex_unlock(&ggtt->lock);
 }
diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
index ea70aaef4b31..8198aa784654 100644
--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
@@ -26,6 +26,8 @@ struct xe_ggtt {
 	u64 __iomem *gsm;
 
 	struct drm_mm mm;
+
+	bool invalidate;
 };
 
 #endif
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 17/22] drm/xe: Lock GGTT on when restoring kernel BOs
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (15 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 16/22] drm/xe: Coalesce GGTT invalidations Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 18/22] drm/xe: Propagate VM unbind error to invalidation fence Rodrigo Vivi
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Make lockdep happy, as we are required to hold the GGTT lock when
calling xe_ggtt_map_bo.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_bo_evict.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c
index 7046dc203138..3fb3c8c77efa 100644
--- a/drivers/gpu/drm/xe/xe_bo_evict.c
+++ b/drivers/gpu/drm/xe/xe_bo_evict.c
@@ -147,8 +147,11 @@ int xe_bo_restore_kernel(struct xe_device *xe)
 			return ret;
 		}
 
-		if (bo->flags & XE_BO_CREATE_GGTT_BIT)
+		if (bo->flags & XE_BO_CREATE_GGTT_BIT) {
+			mutex_lock(&bo->gt->mem.ggtt->lock);
 			xe_ggtt_map_bo(bo->gt->mem.ggtt, bo);
+			mutex_unlock(&bo->gt->mem.ggtt->lock);
+		}
 
 		/*
 		 * We expect validate to trigger a move VRAM and our move code
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 18/22] drm/xe: Propagate VM unbind error to invalidation fence
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (16 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 17/22] drm/xe: Lock GGTT on when restoring kernel BOs Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 19/22] drm/xe: Signal invalidation fence immediately if CT send fails Rodrigo Vivi
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

If a VM unbind hits an error, do not issue a TLB invalidation; instead,
propagate the error to the invalidation fence.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 9da5ee4b31f8..0c2cc9d772fa 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1500,7 +1500,13 @@ static void invalidation_fence_cb(struct dma_fence *fence,
 		container_of(cb, struct invalidation_fence, cb);
 
 	trace_xe_gt_tlb_invalidation_fence_cb(&ifence->base);
-	queue_work(system_wq, &ifence->work);
+	if (!ifence->fence->error) {
+		queue_work(system_wq, &ifence->work);
+	} else {
+		ifence->base.base.error = ifence->fence->error;
+		dma_fence_signal(&ifence->base.base);
+		dma_fence_put(&ifence->base.base);
+	}
 	dma_fence_put(ifence->fence);
 }
 
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 19/22] drm/xe: Signal invalidation fence immediately if CT send fails
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (17 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 18/22] drm/xe: Propagate VM unbind error to invalidation fence Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 20/22] drm/xe: Add has_asid to device info Rodrigo Vivi
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

If the CT send fails, we are in the middle of a GT reset and there is
no need to do a TLB invalidation, so just signal the invalidation fence
immediately.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 23 +++++++++++++--------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index f6a2dd26cad4..2521c8a65690 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -69,6 +69,15 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 	return 0;
 }
 
+static void
+invalidation_fence_signal(struct xe_gt_tlb_invalidation_fence *fence)
+{
+	trace_xe_gt_tlb_invalidation_fence_signal(fence);
+	list_del(&fence->link);
+	dma_fence_signal(&fence->base);
+	dma_fence_put(&fence->base);
+}
+
 /**
  * xe_gt_tlb_invalidation_reset - Initialize GT TLB invalidation reset
  * @gt: graphics tile
@@ -83,11 +92,8 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
 
 	mutex_lock(&gt->uc.guc.ct.lock);
 	list_for_each_entry_safe(fence, next,
-				 &gt->tlb_invalidation.pending_fences, link) {
-		list_del(&fence->link);
-		dma_fence_signal(&fence->base);
-		dma_fence_put(&fence->base);
-	}
+				 &gt->tlb_invalidation.pending_fences, link)
+		invalidation_fence_signal(fence);
 	mutex_unlock(&gt->uc.guc.ct.lock);
 }
 
@@ -130,6 +136,8 @@ static int send_tlb_invalidation(struct xe_guc *guc,
 	}
 	if (!ret)
 		ret = seqno;
+	if (ret < 0 && fence)
+		invalidation_fence_signal(fence);
 	mutex_unlock(&guc->ct.lock);
 
 	return ret;
@@ -321,16 +329,13 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
 	if (fence)
 		trace_xe_gt_tlb_invalidation_fence_recv(fence);
 	if (fence && tlb_invalidation_seqno_past(gt, fence->seqno)) {
-		trace_xe_gt_tlb_invalidation_fence_signal(fence);
-		list_del(&fence->link);
+		invalidation_fence_signal(fence);
 		if (!list_empty(&gt->tlb_invalidation.pending_fences))
 			mod_delayed_work(system_wq,
 					 &gt->tlb_invalidation.fence_tdr,
 					 TLB_TIMEOUT);
 		else
 			cancel_delayed_work(&gt->tlb_invalidation.fence_tdr);
-		dma_fence_signal(&fence->base);
-		dma_fence_put(&fence->base);
 	}
 
 	return 0;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
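
[Editor's note: a minimal sketch of the flow this patch factors out —
invalidation_fence_signal() as the single signal-and-unlink helper, reused
on the CT send failure path. The types, the helper below, and the specific
errno are hypothetical simplifications for illustration, not the real GuC
CT or dma_fence interfaces.]

```c
#include <assert.h>
#include <errno.h>

struct inv_fence {
	int signaled;
	int on_list;
};

/* Stands in for: list_del() + dma_fence_signal() + dma_fence_put(). */
static void invalidation_fence_signal_sketch(struct inv_fence *f)
{
	f->on_list = 0;
	f->signaled = 1;
}

/* Returns the seqno (> 0) on success, a negative errno on CT send
 * failure; on failure the fence is signaled immediately so waiters
 * do not hang across a GT reset. */
static int send_tlb_invalidation_sketch(int ct_ok, int seqno,
					struct inv_fence *fence)
{
	int ret = ct_ok ? seqno : -ECANCELED;  /* errno chosen arbitrarily */

	if (ret < 0 && fence)
		invalidation_fence_signal_sketch(fence);
	return ret;
}
```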

* [Intel-xe] [PATCH 20/22] drm/xe: Add has_asid to device info
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (18 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 19/22] drm/xe: Signal invalidation fence immediately if CT send fails Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 21/22] drm/xe: Add TLB invalidation fence after rebinds issued from execs Rodrigo Vivi
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Rather than aliasing supports_usm to ASID support, add an explicit
variable to indicate ASID support.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 2 ++
 drivers/gpu/drm/xe/xe_lrc.c          | 4 ++--
 drivers/gpu/drm/xe/xe_pci.c          | 3 +++
 drivers/gpu/drm/xe/xe_vm.c           | 4 ++--
 4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 6d13587bfa7b..a8d48987b2d8 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -86,6 +86,8 @@ struct xe_device {
 		u8 media_ver;
 		/** @supports_usm: Supports unified shared memory */
 		bool supports_usm;
+		/** @has_asid: Has address space ID */
+		bool has_asid;
 		/** @enable_guc: GuC submission enabled */
 		bool enable_guc;
 		/** @has_flat_ccs: Whether flat CCS metadata is used */
diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c
index 056c2c5a0b81..347ff9b34494 100644
--- a/drivers/gpu/drm/xe/xe_lrc.c
+++ b/drivers/gpu/drm/xe/xe_lrc.c
@@ -682,14 +682,14 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
 	xe_lrc_write_ctx_reg(lrc, CTX_RING_TAIL, lrc->ring.tail);
 	xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL,
 			     RING_CTL_SIZE(lrc->ring.size) | RING_VALID);
-	if (xe->info.supports_usm && vm) {
+	if (xe->info.has_asid && vm)
 		xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID,
 				     (e->usm.acc_granularity <<
 				      ACC_GRANULARITY_S) | vm->usm.asid);
+	if (xe->info.supports_usm && vm)
 		xe_lrc_write_ctx_reg(lrc, PVC_CTX_ACC_CTR_THOLD,
 				     (e->usm.acc_notify << ACC_NOTIFY_S) |
 				     e->usm.acc_trigger);
-	}
 
 	lrc->desc = GEN8_CTX_VALID;
 	lrc->desc |= INTEL_LEGACY_64B_CONTEXT << GEN8_CTX_ADDRESSING_MODE_SHIFT;
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 53e87b27fcde..927a050f3b8f 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -78,6 +78,7 @@ struct xe_device_desc {
 	bool has_flat_ccs;
 	bool has_4tile;
 	bool has_range_tlb_invalidation;
+	bool has_asid;
 };
 
 #define PLATFORM(x)		\
@@ -302,6 +303,7 @@ static const struct xe_device_desc pvc_desc = {
 	.max_tiles = 2,
 	.vm_max_level = 4,
 	.supports_usm = true,
+	.has_asid = true,
 };
 
 #define MTL_MEDIA_ENGINES \
@@ -496,6 +498,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	xe->info.vm_max_level = desc->vm_max_level;
 	xe->info.media_ver = desc->media_ver;
 	xe->info.supports_usm = desc->supports_usm;
+	xe->info.has_asid = desc->has_asid;
 	xe->info.has_flat_ccs = desc->has_flat_ccs;
 	xe->info.has_4tile = desc->has_4tile;
 	xe->info.display = desc->display;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 541629293683..7276a375e2e0 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1371,7 +1371,7 @@ static void vm_destroy_work_func(struct work_struct *w)
 		xe_device_mem_access_put(xe);
 		xe_pm_runtime_put(xe);
 
-		if (xe->info.supports_usm) {
+		if (xe->info.has_asid) {
 			mutex_lock(&xe->usm.lock);
 			lookup = xa_erase(&xe->usm.asid_to_vm, vm->usm.asid);
 			XE_WARN_ON(lookup != vm);
@@ -1868,7 +1868,7 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 		return err;
 	}
 
-	if (xe->info.supports_usm) {
+	if (xe->info.has_asid) {
 		mutex_lock(&xe->usm.lock);
 		err = xa_alloc_cyclic(&xe->usm.asid_to_vm, &asid, vm,
 				      XA_LIMIT(0, XE_MAX_ASID - 1),
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Intel-xe] [PATCH 21/22] drm/xe: Add TLB invalidation fence after rebinds issued from execs
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (19 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 20/22] drm/xe: Add has_asid to device info Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-03 20:24 ` [Intel-xe] [PATCH 22/22] drm/xe: Drop TLB invalidation from ring operations Rodrigo Vivi
  2023-02-06 22:39 ` [Intel-xe] [PATCH 00/22] TLB Invalidation Niranjana Vishwanathapura
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

If we add a TLB invalidation fence for rebinds issued from execs, we
should be able to drop the TLB invalidation from the ring operations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 200 ++++++++++++++++++++-----------------
 1 file changed, 110 insertions(+), 90 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0c2cc9d772fa..435cc30d88c9 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1160,6 +1160,96 @@ static const struct xe_migrate_pt_update_ops userptr_bind_ops = {
 	.pre_commit = xe_pt_userptr_pre_commit,
 };
 
+struct invalidation_fence {
+	struct xe_gt_tlb_invalidation_fence base;
+	struct xe_gt *gt;
+	struct xe_vma *vma;
+	struct dma_fence *fence;
+	struct dma_fence_cb cb;
+	struct work_struct work;
+};
+
+static const char *
+invalidation_fence_get_driver_name(struct dma_fence *dma_fence)
+{
+	return "xe";
+}
+
+static const char *
+invalidation_fence_get_timeline_name(struct dma_fence *dma_fence)
+{
+	return "invalidation_fence";
+}
+
+static const struct dma_fence_ops invalidation_fence_ops = {
+	.get_driver_name = invalidation_fence_get_driver_name,
+	.get_timeline_name = invalidation_fence_get_timeline_name,
+};
+
+static void invalidation_fence_cb(struct dma_fence *fence,
+				  struct dma_fence_cb *cb)
+{
+	struct invalidation_fence *ifence =
+		container_of(cb, struct invalidation_fence, cb);
+
+	trace_xe_gt_tlb_invalidation_fence_cb(&ifence->base);
+	if (!ifence->fence->error) {
+		queue_work(system_wq, &ifence->work);
+	} else {
+		ifence->base.base.error = ifence->fence->error;
+		dma_fence_signal(&ifence->base.base);
+		dma_fence_put(&ifence->base.base);
+	}
+	dma_fence_put(ifence->fence);
+}
+
+static void invalidation_fence_work_func(struct work_struct *w)
+{
+	struct invalidation_fence *ifence =
+		container_of(w, struct invalidation_fence, work);
+
+	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
+	xe_gt_tlb_invalidation_vma(ifence->gt, &ifence->base, ifence->vma);
+}
+
+static int invalidation_fence_init(struct xe_gt *gt,
+				   struct invalidation_fence *ifence,
+				   struct dma_fence *fence,
+				   struct xe_vma *vma)
+{
+	int ret;
+
+	trace_xe_gt_tlb_invalidation_fence_create(&ifence->base);
+
+	spin_lock_irq(&gt->tlb_invalidation.lock);
+	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
+		       &gt->tlb_invalidation.lock,
+		       gt->tlb_invalidation.fence_context,
+		       ++gt->tlb_invalidation.fence_seqno);
+	spin_unlock_irq(&gt->tlb_invalidation.lock);
+
+	INIT_LIST_HEAD(&ifence->base.link);
+
+	dma_fence_get(&ifence->base.base);	/* Ref for caller */
+	ifence->fence = fence;
+	ifence->gt = gt;
+	ifence->vma = vma;
+
+	INIT_WORK(&ifence->work, invalidation_fence_work_func);
+	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
+	if (ret == -ENOENT) {
+		dma_fence_put(ifence->fence);	/* Usually dropped in CB */
+		invalidation_fence_work_func(&ifence->work);
+	} else if (ret) {
+		dma_fence_put(&ifence->base.base);	/* Caller ref */
+		dma_fence_put(&ifence->base.base);	/* Creation ref */
+	}
+
+	XE_WARN_ON(ret && ret != -ENOENT);
+
+	return ret && ret != -ENOENT ? ret : 0;
+}
+
 /**
  * __xe_pt_bind_vma() - Build and connect a page-table tree for the vma
  * address range.
@@ -1198,6 +1288,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	struct xe_vm *vm = vma->vm;
 	u32 num_entries;
 	struct dma_fence *fence;
+	struct invalidation_fence *ifence = NULL;
 	int err;
 
 	bind_pt_update.locked = false;
@@ -1216,6 +1307,12 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 
 	xe_vm_dbg_print_entries(gt_to_xe(gt), entries, num_entries);
 
+	if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+		ifence = kzalloc(sizeof(*ifence), GFP_KERNEL);
+		if (!ifence)
+			return ERR_PTR(-ENOMEM);
+	}
+
 	fence = xe_migrate_update_pgtables(gt->migrate,
 					   vm, vma->bo,
 					   e ? e : vm->eng[gt->info.id],
@@ -1225,6 +1322,18 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 	if (!IS_ERR(fence)) {
 		LLIST_HEAD(deferred);
 
+		/* TLB invalidation must be done before signaling rebind */
+		if (rebind && !xe_vm_no_dma_fences(vma->vm)) {
+			int err = invalidation_fence_init(gt, ifence, fence,
+							  vma);
+			if (err) {
+				dma_fence_put(fence);
+				kfree(ifence);
+				return ERR_PTR(err);
+			}
+			fence = &ifence->base.base;
+		}
+
 		/* add shared fence now for pagetable delayed destroy */
 		dma_resv_add_fence(&vm->resv, fence, !rebind &&
 				   vma->last_munmap_rebind ?
@@ -1250,6 +1359,7 @@ __xe_pt_bind_vma(struct xe_gt *gt, struct xe_vma *vma, struct xe_engine *e,
 			queue_work(vm->xe->ordered_wq,
 				   &vm->preempt.rebind_work);
 	} else {
+		kfree(ifence);
 		if (bind_pt_update.locked)
 			up_read(&vm->userptr.notifier_lock);
 		xe_pt_abort_bind(vma, entries, num_entries);
@@ -1467,96 +1577,6 @@ static const struct xe_migrate_pt_update_ops userptr_unbind_ops = {
 	.pre_commit = xe_pt_userptr_pre_commit,
 };
 
-struct invalidation_fence {
-	struct xe_gt_tlb_invalidation_fence base;
-	struct xe_gt *gt;
-	struct xe_vma *vma;
-	struct dma_fence *fence;
-	struct dma_fence_cb cb;
-	struct work_struct work;
-};
-
-static const char *
-invalidation_fence_get_driver_name(struct dma_fence *dma_fence)
-{
-	return "xe";
-}
-
-static const char *
-invalidation_fence_get_timeline_name(struct dma_fence *dma_fence)
-{
-	return "invalidation_fence";
-}
-
-static const struct dma_fence_ops invalidation_fence_ops = {
-	.get_driver_name = invalidation_fence_get_driver_name,
-	.get_timeline_name = invalidation_fence_get_timeline_name,
-};
-
-static void invalidation_fence_cb(struct dma_fence *fence,
-				  struct dma_fence_cb *cb)
-{
-	struct invalidation_fence *ifence =
-		container_of(cb, struct invalidation_fence, cb);
-
-	trace_xe_gt_tlb_invalidation_fence_cb(&ifence->base);
-	if (!ifence->fence->error) {
-		queue_work(system_wq, &ifence->work);
-	} else {
-		ifence->base.base.error = ifence->fence->error;
-		dma_fence_signal(&ifence->base.base);
-		dma_fence_put(&ifence->base.base);
-	}
-	dma_fence_put(ifence->fence);
-}
-
-static void invalidation_fence_work_func(struct work_struct *w)
-{
-	struct invalidation_fence *ifence =
-		container_of(w, struct invalidation_fence, work);
-
-	trace_xe_gt_tlb_invalidation_fence_work_func(&ifence->base);
-	xe_gt_tlb_invalidation_vma(ifence->gt, &ifence->base, ifence->vma);
-}
-
-static int invalidation_fence_init(struct xe_gt *gt,
-				   struct invalidation_fence *ifence,
-				   struct dma_fence *fence,
-				   struct xe_vma *vma)
-{
-	int ret;
-
-	trace_xe_gt_tlb_invalidation_fence_create(&ifence->base);
-
-	spin_lock_irq(&gt->tlb_invalidation.lock);
-	dma_fence_init(&ifence->base.base, &invalidation_fence_ops,
-		       &gt->tlb_invalidation.lock,
-		       gt->tlb_invalidation.fence_context,
-		       ++gt->tlb_invalidation.fence_seqno);
-	spin_unlock_irq(&gt->tlb_invalidation.lock);
-
-	INIT_LIST_HEAD(&ifence->base.link);
-
-	dma_fence_get(&ifence->base.base);	/* Ref for caller */
-	ifence->fence = fence;
-	ifence->gt = gt;
-	ifence->vma = vma;
-
-	INIT_WORK(&ifence->work, invalidation_fence_work_func);
-	ret = dma_fence_add_callback(fence, &ifence->cb, invalidation_fence_cb);
-	if (ret == -ENOENT) {
-		dma_fence_put(ifence->fence);	/* Usually dropped in CB */
-		invalidation_fence_work_func(&ifence->work);
-	} else if (ret) {
-		dma_fence_put(&ifence->base.base);	/* Caller ref */
-		dma_fence_put(&ifence->base.base);	/* Creation ref */
-	}
-
-	XE_WARN_ON(ret && ret != -ENOENT);
-
-	return ret && ret != -ENOENT ? ret : 0;
-}
-
 /**
  * __xe_pt_unbind_vma() - Disconnect and free a page-table tree for the vma
  * address range.
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
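
[Editor's note: a minimal sketch of the return-code handling in the moved
invalidation_fence_init(), where -ENOENT from dma_fence_add_callback() means
the dependency already signaled, so the invalidation runs inline and -ENOENT
is not reported to the caller as an error. Everything below is a simplified,
hypothetical stand-in for illustration, not the real dma_fence API.]

```c
#include <assert.h>
#include <errno.h>

static int ran_inline;

/* Stands in for invalidation_fence_work_func(). */
static void work_func_sketch(void)
{
	ran_inline = 1;
}

/* dep_signaled mimics dma_fence_add_callback(): 0 = callback armed,
 * 1 = fence already signaled, which the real API reports as -ENOENT. */
static int fence_init_sketch(int dep_signaled)
{
	int ret = dep_signaled ? -ENOENT : 0;

	if (ret == -ENOENT) {
		/* Dependency already done: invalidate now, on this thread. */
		work_func_sketch();
	}
	/* -ENOENT is a success case for the caller; only real errors
	 * (any other nonzero ret) propagate. */
	return ret && ret != -ENOENT ? ret : 0;
}
```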

* [Intel-xe] [PATCH 22/22] drm/xe: Drop TLB invalidation from ring operations
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (20 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 21/22] drm/xe: Add TLB invalidation fence after rebinds issued from execs Rodrigo Vivi
@ 2023-02-03 20:24 ` Rodrigo Vivi
  2023-02-06 22:39 ` [Intel-xe] [PATCH 00/22] TLB Invalidation Niranjana Vishwanathapura
  22 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-03 20:24 UTC (permalink / raw)
  To: intel-xe; +Cc: niranjana.vishwanathapura, Rodrigo Vivi

From: Matthew Brost <matthew.brost@intel.com>

Now that we issue TLB invalidations on unbinds and rebinds from execs,
we no longer need to issue TLB invalidations from the ring operations.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
---
 drivers/gpu/drm/xe/xe_ring_ops.c | 40 +-------------------------------
 1 file changed, 1 insertion(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index d08f50e53649..370ab1e729fa 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -83,31 +83,6 @@ static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
 	return i;
 }
 
-static int emit_pipe_invalidate(u32 mask_flags, u32 *dw, int i)
-{
-	u32 flags = PIPE_CONTROL_CS_STALL |
-		PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
-		PIPE_CONTROL_TLB_INVALIDATE |
-		PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
-		PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
-		PIPE_CONTROL_VF_CACHE_INVALIDATE |
-		PIPE_CONTROL_CONST_CACHE_INVALIDATE |
-		PIPE_CONTROL_STATE_CACHE_INVALIDATE |
-		PIPE_CONTROL_QW_WRITE |
-		PIPE_CONTROL_STORE_DATA_INDEX;
-
-	flags &= ~mask_flags;
-
-	dw[i++] = GFX_OP_PIPE_CONTROL(6);
-	dw[i++] = flags;
-	dw[i++] = LRC_PPHWSP_SCRATCH_ADDR;
-	dw[i++] = 0;
-	dw[i++] = 0;
-	dw[i++] = 0;
-
-	return i;
-}
-
 #define MI_STORE_QWORD_IMM_GEN8_POSTED (MI_INSTR(0x20, 3) | (1 << 21))
 
 static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
@@ -148,11 +123,6 @@ static void __emit_job_gen12_copy(struct xe_sched_job *job, struct xe_lrc *lrc,
 	u32 dw[MAX_JOB_SIZE_DW], i = 0;
 	u32 ppgtt_flag = get_ppgtt_flag(job);
 
-	/* XXX: Conditional flushing possible */
-	dw[i++] = preparser_disable(true);
-	i = emit_flush_invalidate(0, dw, i);
-	dw[i++] = preparser_disable(false);
-
 	i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
 				seqno, dw, i);
 
@@ -181,9 +151,7 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
 	struct xe_device *xe = gt_to_xe(gt);
 	bool decode = job->engine->class == XE_ENGINE_CLASS_VIDEO_DECODE;
 
-	/* XXX: Conditional flushing possible */
 	dw[i++] = preparser_disable(true);
-	i = emit_flush_invalidate(decode ? MI_INVALIDATE_BSD : 0, dw, i);
 	/* Wa_1809175790 */
 	if (!xe->info.has_flat_ccs) {
 		if (decode)
@@ -244,15 +212,8 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
 	struct xe_gt *gt = job->engine->gt;
 	struct xe_device *xe = gt_to_xe(gt);
 	bool pvc = xe->info.platform == XE_PVC;
-	u32 mask_flags = 0;
 
-	/* XXX: Conditional flushing possible */
 	dw[i++] = preparser_disable(true);
-	if (pvc)
-		mask_flags = PIPE_CONTROL_3D_ARCH_FLAGS;
-	else if (job->engine->class == XE_ENGINE_CLASS_COMPUTE)
-		mask_flags = PIPE_CONTROL_3D_ENGINE_FLAGS;
-	i = emit_pipe_invalidate(mask_flags, dw, i);
 	/* Wa_1809175790 */
 	if (!xe->info.has_flat_ccs)
 		i = emit_aux_table_inv(gt, GEN12_GFX_CCS_AUX_NV.reg, dw, i);
@@ -287,6 +248,7 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
 
 	i = emit_bb_start(job->batch_addr[0], BIT(8), dw, i);
 
+	/* XXX: Do we need this? Leaving for now. */
 	dw[i++] = preparser_disable(true);
 	i = emit_flush_invalidate(0, dw, i);
 	dw[i++] = preparser_disable(false);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [Intel-xe] [PATCH 00/22] TLB Invalidation
  2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
                   ` (21 preceding siblings ...)
  2023-02-03 20:24 ` [Intel-xe] [PATCH 22/22] drm/xe: Drop TLB invalidation from ring operations Rodrigo Vivi
@ 2023-02-06 22:39 ` Niranjana Vishwanathapura
  2023-02-08 17:27   ` Rodrigo Vivi
  22 siblings, 1 reply; 31+ messages in thread
From: Niranjana Vishwanathapura @ 2023-02-06 22:39 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Fri, Feb 03, 2023 at 03:23:47PM -0500, Rodrigo Vivi wrote:
>Let's just confirm the reviews on this patch and get them
>merged to drm-xe-next.
>
>Matthew Brost (22):
>  drm/xe: Don't process TLB invalidation done in CT fast-path
>  drm/xe: Break of TLB invalidation into its own file
>  drm/xe: Move TLB invalidation variable to own sub-structure in GT
>  drm/xe: Add TLB invalidation fence
>  drm/xe: Invalidate TLB after unbind is complete
>  drm/xe: Kernel doc GT TLB invalidations
>  drm/xe: Add TLB invalidation fence ftrace
>  drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
>  drm/xe: Add TDR for invalidation fence timeout cleanup
>  drm/xe: Only set VM->asid for platforms that support a ASID
>  drm/xe: Delete debugfs entry to issue TLB invalidation
>  drm/xe: Add has_range_tlb_invalidation device attribute
>  drm/xe: Add range based TLB invalidations
>  drm/xe: Propagate error from bind operations to async fence
>  drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
>  drm/xe: Coalesce GGTT invalidations
>  drm/xe: Lock GGTT on when restoring kernel BOs
>  drm/xe: Propagate VM unbind error to invalidation fence
>  drm/xe: Signal invalidation fence immediately if CT send fails
>  drm/xe: Add has_asid to device info
>  drm/xe: Add TLB invalidation fence after rebinds issued from execs
>  drm/xe: Drop TLB invalidation from ring operations
>

Looks good to me.
Some minor comments on patch ordering.
Patch #8 can be merged with #10
Patch #10 can be simplified if we move #20 before patch #10
Patch #17 should probably be put ahead of #16

In any case,
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>


> drivers/gpu/drm/xe/Makefile                   |   1 +
> drivers/gpu/drm/xe/xe_bo_evict.c              |   5 +-
> drivers/gpu/drm/xe/xe_device.c                |  14 +
> drivers/gpu/drm/xe/xe_device_types.h          |   4 +
> drivers/gpu/drm/xe/xe_ggtt.c                  |  23 +-
> drivers/gpu/drm/xe/xe_ggtt_types.h            |   2 +
> drivers/gpu/drm/xe/xe_gt.c                    |  19 +
> drivers/gpu/drm/xe/xe_gt.h                    |   1 +
> drivers/gpu/drm/xe/xe_gt_debugfs.c            |  21 --
> drivers/gpu/drm/xe/xe_gt_pagefault.c          | 104 +-----
> drivers/gpu/drm/xe/xe_gt_pagefault.h          |   3 -
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 342 ++++++++++++++++++
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 ++
> .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  28 ++
> drivers/gpu/drm/xe/xe_gt_types.h              |  41 ++-
> drivers/gpu/drm/xe/xe_guc.c                   |   2 +
> drivers/gpu/drm/xe/xe_guc_ct.c                |  10 +-
> drivers/gpu/drm/xe/xe_guc_types.h             |   2 +
> drivers/gpu/drm/xe/xe_lrc.c                   |   4 +-
> drivers/gpu/drm/xe/xe_pci.c                   |   7 +
> drivers/gpu/drm/xe/xe_pt.c                    | 130 +++++++
> drivers/gpu/drm/xe/xe_ring_ops.c              |  40 +-
> drivers/gpu/drm/xe/xe_trace.h                 |  55 +++
> drivers/gpu/drm/xe/xe_uc.c                    |   9 +-
> drivers/gpu/drm/xe/xe_uc.h                    |   1 +
> drivers/gpu/drm/xe/xe_vm.c                    |  42 ++-
> 26 files changed, 736 insertions(+), 200 deletions(-)
> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
>
>-- 
>2.39.1
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-xe] [PATCH 00/22] TLB Invalidation
  2023-02-06 22:39 ` [Intel-xe] [PATCH 00/22] TLB Invalidation Niranjana Vishwanathapura
@ 2023-02-08 17:27   ` Rodrigo Vivi
  2023-02-09  4:54     ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-08 17:27 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: intel-xe

On Mon, Feb 06, 2023 at 02:39:33PM -0800, Niranjana Vishwanathapura wrote:
> On Fri, Feb 03, 2023 at 03:23:47PM -0500, Rodrigo Vivi wrote:
> > Let's just confirm the reviews on this patch and get them
> > merged to drm-xe-next.
> > 
> > Matthew Brost (22):
> >  drm/xe: Don't process TLB invalidation done in CT fast-path
> >  drm/xe: Break of TLB invalidation into its own file
> >  drm/xe: Move TLB invalidation variable to own sub-structure in GT
> >  drm/xe: Add TLB invalidation fence
> >  drm/xe: Invalidate TLB after unbind is complete
> >  drm/xe: Kernel doc GT TLB invalidations
> >  drm/xe: Add TLB invalidation fence ftrace
> >  drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
> >  drm/xe: Add TDR for invalidation fence timeout cleanup
> >  drm/xe: Only set VM->asid for platforms that support a ASID
> >  drm/xe: Delete debugfs entry to issue TLB invalidation
> >  drm/xe: Add has_range_tlb_invalidation device attribute
> >  drm/xe: Add range based TLB invalidations
> >  drm/xe: Propagate error from bind operations to async fence
> >  drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
> >  drm/xe: Coalesce GGTT invalidations
> >  drm/xe: Lock GGTT on when restoring kernel BOs
> >  drm/xe: Propagate VM unbind error to invalidation fence
> >  drm/xe: Signal invalidation fence immediately if CT send fails
> >  drm/xe: Add has_asid to device info
> >  drm/xe: Add TLB invalidation fence after rebinds issued from execs
> >  drm/xe: Drop TLB invalidation from ring operations
> > 
> 
> Looks good to me.
> Some minor comments on patch ordering.
> Patch #8 can be merged with #10

did you mean squashed together?
but why 8 and 10? 8 is a build fix, so I'd assume the issue
happened in a previous patch, not in patch 10.

08 - drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
10 - drm/xe: Only set VM->asid for platforms that support a ASID

> Patch #10 can be simplified if we move #20 before patch #10

indeed, but at this point I prefer to not touch them...

> Patch #17 should probably be put ahead of #16

maybe a squash? but I will probably just leave it as is...

> 
> In any case,
> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>

Thank you so much

> 
> 
> > drivers/gpu/drm/xe/Makefile                   |   1 +
> > drivers/gpu/drm/xe/xe_bo_evict.c              |   5 +-
> > drivers/gpu/drm/xe/xe_device.c                |  14 +
> > drivers/gpu/drm/xe/xe_device_types.h          |   4 +
> > drivers/gpu/drm/xe/xe_ggtt.c                  |  23 +-
> > drivers/gpu/drm/xe/xe_ggtt_types.h            |   2 +
> > drivers/gpu/drm/xe/xe_gt.c                    |  19 +
> > drivers/gpu/drm/xe/xe_gt.h                    |   1 +
> > drivers/gpu/drm/xe/xe_gt_debugfs.c            |  21 --
> > drivers/gpu/drm/xe/xe_gt_pagefault.c          | 104 +-----
> > drivers/gpu/drm/xe/xe_gt_pagefault.h          |   3 -
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 342 ++++++++++++++++++
> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 ++
> > .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  28 ++
> > drivers/gpu/drm/xe/xe_gt_types.h              |  41 ++-
> > drivers/gpu/drm/xe/xe_guc.c                   |   2 +
> > drivers/gpu/drm/xe/xe_guc_ct.c                |  10 +-
> > drivers/gpu/drm/xe/xe_guc_types.h             |   2 +
> > drivers/gpu/drm/xe/xe_lrc.c                   |   4 +-
> > drivers/gpu/drm/xe/xe_pci.c                   |   7 +
> > drivers/gpu/drm/xe/xe_pt.c                    | 130 +++++++
> > drivers/gpu/drm/xe/xe_ring_ops.c              |  40 +-
> > drivers/gpu/drm/xe/xe_trace.h                 |  55 +++
> > drivers/gpu/drm/xe/xe_uc.c                    |   9 +-
> > drivers/gpu/drm/xe/xe_uc.h                    |   1 +
> > drivers/gpu/drm/xe/xe_vm.c                    |  42 ++-
> > 26 files changed, 736 insertions(+), 200 deletions(-)
> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
> > 
> > -- 
> > 2.39.1
> > 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-xe] [PATCH 00/22] TLB Invalidation
  2023-02-08 17:27   ` Rodrigo Vivi
@ 2023-02-09  4:54     ` Niranjana Vishwanathapura
  2023-02-09  4:59       ` Niranjana Vishwanathapura
  0 siblings, 1 reply; 31+ messages in thread
From: Niranjana Vishwanathapura @ 2023-02-09  4:54 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Wed, Feb 08, 2023 at 12:27:01PM -0500, Rodrigo Vivi wrote:
>On Mon, Feb 06, 2023 at 02:39:33PM -0800, Niranjana Vishwanathapura wrote:
>> On Fri, Feb 03, 2023 at 03:23:47PM -0500, Rodrigo Vivi wrote:
>> > Let's just confirm the reviews on this patch and get them
>> > merged to drm-xe-next.
>> >
>> > Matthew Brost (22):
>> >  drm/xe: Don't process TLB invalidation done in CT fast-path
>> >  drm/xe: Break of TLB invalidation into its own file
>> >  drm/xe: Move TLB invalidation variable to own sub-structure in GT
>> >  drm/xe: Add TLB invalidation fence
>> >  drm/xe: Invalidate TLB after unbind is complete
>> >  drm/xe: Kernel doc GT TLB invalidations
>> >  drm/xe: Add TLB invalidation fence ftrace
>> >  drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
>> >  drm/xe: Add TDR for invalidation fence timeout cleanup
>> >  drm/xe: Only set VM->asid for platforms that support a ASID
>> >  drm/xe: Delete debugfs entry to issue TLB invalidation
>> >  drm/xe: Add has_range_tlb_invalidation device attribute
>> >  drm/xe: Add range based TLB invalidations
>> >  drm/xe: Propagate error from bind operations to async fence
>> >  drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
>> >  drm/xe: Coalesce GGTT invalidations
>> >  drm/xe: Lock GGTT on when restoring kernel BOs
>> >  drm/xe: Propagate VM unbind error to invalidation fence
>> >  drm/xe: Signal invalidation fence immediately if CT send fails
>> >  drm/xe: Add has_asid to device info
>> >  drm/xe: Add TLB invalidation fence after rebinds issued from execs
>> >  drm/xe: Drop TLB invalidation from ring operations
>> >
>>
>> Looks good to me.
>> Some minor comments on patch ordering.
>> Patch #8 can be merged with #10
>
>did you mean squashed together?
>but why 8 and 10? 8 is a build fix, so I'd assume the issue
>happened in a previous patch, not in patch 10.
>
>08 - drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
>10 - drm/xe: Only set VM->asid for platforms that support a ASID
>

Sorry, I meant #8 and #11. Most of the change in #8 is removed by
patch #11. So, #8 can be squashed with #11.

>> Patch #10 can be simplified if we move #20 before patch #10
>
>indeed, but at this point I prefer to not touch them...
>
>> Patch #17 should probably be put ahead of #16
>
>maybe a squash? but I will probably just leave it as is...
>

Ok,
Niranjana

>>
>> In any case,
>> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>
>Thank you so much
>
>>
>>
>> > drivers/gpu/drm/xe/Makefile                   |   1 +
>> > drivers/gpu/drm/xe/xe_bo_evict.c              |   5 +-
>> > drivers/gpu/drm/xe/xe_device.c                |  14 +
>> > drivers/gpu/drm/xe/xe_device_types.h          |   4 +
>> > drivers/gpu/drm/xe/xe_ggtt.c                  |  23 +-
>> > drivers/gpu/drm/xe/xe_ggtt_types.h            |   2 +
>> > drivers/gpu/drm/xe/xe_gt.c                    |  19 +
>> > drivers/gpu/drm/xe/xe_gt.h                    |   1 +
>> > drivers/gpu/drm/xe/xe_gt_debugfs.c            |  21 --
>> > drivers/gpu/drm/xe/xe_gt_pagefault.c          | 104 +-----
>> > drivers/gpu/drm/xe/xe_gt_pagefault.h          |   3 -
>> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 342 ++++++++++++++++++
>> > drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 ++
>> > .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  28 ++
>> > drivers/gpu/drm/xe/xe_gt_types.h              |  41 ++-
>> > drivers/gpu/drm/xe/xe_guc.c                   |   2 +
>> > drivers/gpu/drm/xe/xe_guc_ct.c                |  10 +-
>> > drivers/gpu/drm/xe/xe_guc_types.h             |   2 +
>> > drivers/gpu/drm/xe/xe_lrc.c                   |   4 +-
>> > drivers/gpu/drm/xe/xe_pci.c                   |   7 +
>> > drivers/gpu/drm/xe/xe_pt.c                    | 130 +++++++
>> > drivers/gpu/drm/xe/xe_ring_ops.c              |  40 +-
>> > drivers/gpu/drm/xe/xe_trace.h                 |  55 +++
>> > drivers/gpu/drm/xe/xe_uc.c                    |   9 +-
>> > drivers/gpu/drm/xe/xe_uc.h                    |   1 +
>> > drivers/gpu/drm/xe/xe_vm.c                    |  42 ++-
>> > 26 files changed, 736 insertions(+), 200 deletions(-)
>> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
>> > create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
>> >
>> > --
>> > 2.39.1
>> >

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Intel-xe] [PATCH 00/22] TLB Invalidation
  2023-02-09  4:54     ` Niranjana Vishwanathapura
@ 2023-02-09  4:59       ` Niranjana Vishwanathapura
  0 siblings, 0 replies; 31+ messages in thread
From: Niranjana Vishwanathapura @ 2023-02-09  4:59 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-xe

On Wed, Feb 08, 2023 at 08:54:04PM -0800, Niranjana Vishwanathapura wrote:
>On Wed, Feb 08, 2023 at 12:27:01PM -0500, Rodrigo Vivi wrote:
>>On Mon, Feb 06, 2023 at 02:39:33PM -0800, Niranjana Vishwanathapura wrote:
>>>On Fri, Feb 03, 2023 at 03:23:47PM -0500, Rodrigo Vivi wrote:
>>>> Let's just confirm the reviews on this patch and get them
>>>> merged to drm-xe-next.
>>>>
>>>> Matthew Brost (22):
>>>>  drm/xe: Don't process TLB invalidation done in CT fast-path
>>>>  drm/xe: Break of TLB invalidation into its own file
>>>>  drm/xe: Move TLB invalidation variable to own sub-structure in GT
>>>>  drm/xe: Add TLB invalidation fence
>>>>  drm/xe: Invalidate TLB after unbind is complete
>>>>  drm/xe: Kernel doc GT TLB invalidations
>>>>  drm/xe: Add TLB invalidation fence ftrace
>>>>  drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
>>>>  drm/xe: Add TDR for invalidation fence timeout cleanup
>>>>  drm/xe: Only set VM->asid for platforms that support a ASID
>>>>  drm/xe: Delete debugfs entry to issue TLB invalidation
>>>>  drm/xe: Add has_range_tlb_invalidation device attribute
>>>>  drm/xe: Add range based TLB invalidations
>>>>  drm/xe: Propagate error from bind operations to async fence
>>>>  drm/xe: Use GuC to do GGTT invalidations for the GuC firmware
>>>>  drm/xe: Coalesce GGTT invalidations
>>>>  drm/xe: Lock GGTT on when restoring kernel BOs
>>>>  drm/xe: Propagate VM unbind error to invalidation fence
>>>>  drm/xe: Signal invalidation fence immediately if CT send fails
>>>>  drm/xe: Add has_asid to device info
>>>>  drm/xe: Add TLB invalidation fence after rebinds issued from execs
>>>>  drm/xe: Drop TLB invalidation from ring operations
>>>>
>>>
>>>Looks good to me.
>>>Some minor comments on patch ordering.
>>>Patch #8 can be merged with #10
>>
>>did you mean squashed together?
>>but why 8 and 10? 8 is a build fix, so I'd assume the issue
>>happened in a previous patch, not in patch 10.
>>
>>08 - drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
>>10 - drm/xe: Only set VM->asid for platforms that support a ASID
>>
>
>Sorry, I meant #8 and #11. Most of the change in #8 is removed by
>patch #11. So, #8 can be squashed with #11.
>
>>>Patch #10 can be simplified if we move #20 before patch #10
>>
>>indeed, but at this point I prefer to not touch them...
>>
>>>Patch #17 should probably be put ahead of #16
>>
>>maybe a squash? but I will probably just leave it as is...
>>

Actually, in patch #16 we are adding a lockdep assert in xe_ggtt_map_bo(),
but the lock is only taken in patch #17. Hence the comment. But it should be
fine, as it is a functional dependency and not a compile-time dependency.

>
>Ok,
>Niranjana
>
>>>
>>>In any case,
>>>Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
>>
>>Thank you so much
>>
>>>
>>>
>>>> drivers/gpu/drm/xe/Makefile                   |   1 +
>>>> drivers/gpu/drm/xe/xe_bo_evict.c              |   5 +-
>>>> drivers/gpu/drm/xe/xe_device.c                |  14 +
>>>> drivers/gpu/drm/xe/xe_device_types.h          |   4 +
>>>> drivers/gpu/drm/xe/xe_ggtt.c                  |  23 +-
>>>> drivers/gpu/drm/xe/xe_ggtt_types.h            |   2 +
>>>> drivers/gpu/drm/xe/xe_gt.c                    |  19 +
>>>> drivers/gpu/drm/xe/xe_gt.h                    |   1 +
>>>> drivers/gpu/drm/xe/xe_gt_debugfs.c            |  21 --
>>>> drivers/gpu/drm/xe/xe_gt_pagefault.c          | 104 +-----
>>>> drivers/gpu/drm/xe/xe_gt_pagefault.h          |   3 -
>>>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c   | 342 ++++++++++++++++++
>>>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h   |  26 ++
>>>> .../gpu/drm/xe/xe_gt_tlb_invalidation_types.h |  28 ++
>>>> drivers/gpu/drm/xe/xe_gt_types.h              |  41 ++-
>>>> drivers/gpu/drm/xe/xe_guc.c                   |   2 +
>>>> drivers/gpu/drm/xe/xe_guc_ct.c                |  10 +-
>>>> drivers/gpu/drm/xe/xe_guc_types.h             |   2 +
>>>> drivers/gpu/drm/xe/xe_lrc.c                   |   4 +-
>>>> drivers/gpu/drm/xe/xe_pci.c                   |   7 +
>>>> drivers/gpu/drm/xe/xe_pt.c                    | 130 +++++++
>>>> drivers/gpu/drm/xe/xe_ring_ops.c              |  40 +-
>>>> drivers/gpu/drm/xe/xe_trace.h                 |  55 +++
>>>> drivers/gpu/drm/xe/xe_uc.c                    |   9 +-
>>>> drivers/gpu/drm/xe/xe_uc.h                    |   1 +
>>>> drivers/gpu/drm/xe/xe_vm.c                    |  42 ++-
>>>> 26 files changed, 736 insertions(+), 200 deletions(-)
>>>> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
>>>> create mode 100644 drivers/gpu/drm/xe/xe_gt_tlb_invalidation_types.h
>>>>
>>>> --
>>>> 2.39.1
>>>>


* Re: [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations
  2023-02-03 20:23 ` [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations Rodrigo Vivi
@ 2023-02-13 23:21   ` Matt Roper
  2023-02-17 16:22     ` Rodrigo Vivi
  0 siblings, 1 reply; 31+ messages in thread
From: Matt Roper @ 2023-02-13 23:21 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: niranjana.vishwanathapura, intel-xe

On Fri, Feb 03, 2023 at 03:23:53PM -0500, Rodrigo Vivi wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> Document all exported functions.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 52 ++++++++++++++++++++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 23094d364583..1cb4d3a6bc57 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -14,6 +14,15 @@ guc_to_gt(struct xe_guc *guc)
>  	return container_of(guc, struct xe_gt, uc.guc);
>  }
>  
> +/**
> + * xe_gt_tlb_invalidation_init - Initialize GT TLB invalidation state
> + * @gt: graphics tile
> + *
> + * Initialize GT TLB invalidation state, purely software initialization, should
> + * be called once during driver load.
> + *
> + * Return: 0 on success, negative error code on error.
> + */
>  int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
>  {
>  	gt->tlb_invalidation.seqno = 1;
> @@ -24,7 +33,13 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
>  	return 0;
>  }
>  
> -void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
> +/**
> + * xe_gt_tlb_invalidation_reset - Initialize GT TLB invalidation reset

The description here is confusing.  We're not really "initializing"
anything here.

> + * @gt: graphics tile
> + *
> + * Signal any pending invalidation fences, should be called during a GT reset

Is it confirmed that a GDRST-initiated reset implicitly invalidates all
the engine TLBs (and thus just signalling all the fences is sufficient)?
Or does the GuC take care of this itself while it is being
(re)initialized?  I know there are a lot of parts of the GT that don't
actually get reset when requesting GRDOM_FULL, but it's never been very
well documented exactly what those are.


Matt

> + */
> +void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
>  {
>  	struct xe_gt_tlb_invalidation_fence *fence, *next;
>  
> @@ -82,6 +97,19 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>  	return ret;
>  }
>  
> +/**
> + * xe_gt_tlb_invalidation - Issue a TLB invalidation on this GT
> + * @gt: graphics tile
> + * @fence: invalidation fence which will be signal on TLB invalidation
> + * completion, can be NULL
> + *
> + * Issue a full TLB invalidation on the GT. Completion of TLB is asynchronous
> + * and caller can either use the invalidation fence or seqno +
> + * xe_gt_tlb_invalidation_wait to wait for completion.
> + *
> + * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
> + * negative error code on error.
> + */
>  int xe_gt_tlb_invalidation(struct xe_gt *gt,
>  			   struct xe_gt_tlb_invalidation_fence *fence)
>  {
> @@ -100,6 +128,16 @@ static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
>  	return false;
>  }
>  
> +/**
> + * xe_gt_tlb_invalidation_wait - Wait for TLB to complete
> + * @gt: graphics tile
> + * @seqno: seqno to wait which was returned from xe_gt_tlb_invalidation
> + *
> + * Wait for 200ms for a TLB invalidation to complete, in practice we always
> + * should receive the TLB invalidation within 200ms.
> + *
> + * Return: 0 on success, -ETIME on TLB invalidation timeout
> + */
>  int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
>  {
>  	struct xe_device *xe = gt_to_xe(gt);
> @@ -122,6 +160,18 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
>  	return 0;
>  }
>  
> +/**
> + * xe_guc_tlb_invalidation_done_handler - TLB invalidation done handler
> + * @guc: guc
> + * @msg: message indicating TLB invalidation done
> + * @len: length of message
> + *
> + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> + * invalidation fences for seqno. Algorithm for this depends on seqno being
> + * received in-order and asserts this assumption.
> + *
> + * Return: 0 on success, -EPROTO for malformed messages.
> + */
>  int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
>  {
>  	struct xe_gt *gt = guc_to_gt(guc);
> -- 
> 2.39.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
  2023-02-03 20:23 ` [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG Rodrigo Vivi
@ 2023-02-13 23:22   ` Matt Roper
  2023-02-17 16:24     ` Rodrigo Vivi
  0 siblings, 1 reply; 31+ messages in thread
From: Matt Roper @ 2023-02-13 23:22 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: niranjana.vishwanathapura, intel-xe

On Fri, Feb 03, 2023 at 03:23:55PM -0500, Rodrigo Vivi wrote:
> From: Matthew Brost <matthew.brost@intel.com>
> 
> GT TLB invalidation functions are now in header
> xe_gt_tlb_invalidation.h, include that file in xe_gt_debugfs.c if
> CONFIG_DRM_XE_DEBUG is set.
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Should this be squashed into patch #2?


Matt

> ---
>  drivers/gpu/drm/xe/xe_gt_debugfs.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
> index 30058c6100ab..946398f08bb5 100644
> --- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
> +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
> @@ -11,12 +11,15 @@
>  #include "xe_gt.h"
>  #include "xe_gt_debugfs.h"
>  #include "xe_gt_mcr.h"
> -#include "xe_gt_pagefault.h"
>  #include "xe_gt_topology.h"
>  #include "xe_hw_engine.h"
>  #include "xe_macros.h"
>  #include "xe_uc_debugfs.h"
>  
> +#ifdef CONFIG_DRM_XE_DEBUG
> +#include "xe_gt_tlb_invalidation.h"
> +#endif
> +
>  static struct xe_gt *node_to_gt(struct drm_info_node *node)
>  {
>  	return node->info_ent->data;
> -- 
> 2.39.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


* Re: [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations
  2023-02-13 23:21   ` Matt Roper
@ 2023-02-17 16:22     ` Rodrigo Vivi
  0 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-17 16:22 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe

On Mon, Feb 13, 2023 at 03:21:36PM -0800, Matt Roper wrote:
> On Fri, Feb 03, 2023 at 03:23:53PM -0500, Rodrigo Vivi wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> > 
> > Document all exported functions.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 52 ++++++++++++++++++++-
> >  1 file changed, 51 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > index 23094d364583..1cb4d3a6bc57 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> > @@ -14,6 +14,15 @@ guc_to_gt(struct xe_guc *guc)
> >  	return container_of(guc, struct xe_gt, uc.guc);
> >  }
> >  
> > +/**
> > + * xe_gt_tlb_invalidation_init - Initialize GT TLB invalidation state
> > + * @gt: graphics tile
> > + *
> > + * Initialize GT TLB invalidation state, purely software initialization, should
> > + * be called once during driver load.
> > + *
> > + * Return: 0 on success, negative error code on error.
> > + */
> >  int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
> >  {
> >  	gt->tlb_invalidation.seqno = 1;
> > @@ -24,7 +33,13 @@ int xe_gt_tlb_invalidation_init(struct xe_gt *gt)
> >  	return 0;
> >  }
> >  
> > -void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
> > +/**
> > + * xe_gt_tlb_invalidation_reset - Initialize GT TLB invalidation reset
> 
> The description here is confusing.  We're not really "initializing"
> anything here.
> 
> > + * @gt: graphics tile
> > + *
> > + * Signal any pending invalidation fences, should be called during a GT reset
> 
> Is it confirmed that a GDRST-initiated reset implicitly invalidates all
> the engine TLBs (and thus just signalling all the fences is sufficient)?
> Or does the GuC take care of this itself while it is being
> (re)initialized?  I know there are a lot of parts of the GT that don't
> actually get reset when requesting GRDOM_FULL, but it's never been very
> well documented exactly what those are.

Since I'm trying to avoid more rebases with these already reviewed and
merged patches, I created an issue to track the changes needed and to
ensure that you get the answers here:

https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/212

> 
> 
> Matt
> 
> > + */
> > +void xe_gt_tlb_invalidation_reset(struct xe_gt *gt)
> >  {
> >  	struct xe_gt_tlb_invalidation_fence *fence, *next;
> >  
> > @@ -82,6 +97,19 @@ static int send_tlb_invalidation(struct xe_guc *guc,
> >  	return ret;
> >  }
> >  
> > +/**
> > + * xe_gt_tlb_invalidation - Issue a TLB invalidation on this GT
> > + * @gt: graphics tile
> > + * @fence: invalidation fence which will be signal on TLB invalidation
> > + * completion, can be NULL
> > + *
> > + * Issue a full TLB invalidation on the GT. Completion of TLB is asynchronous
> > + * and caller can either use the invalidation fence or seqno +
> > + * xe_gt_tlb_invalidation_wait to wait for completion.
> > + *
> > + * Return: Seqno which can be passed to xe_gt_tlb_invalidation_wait on success,
> > + * negative error code on error.
> > + */
> >  int xe_gt_tlb_invalidation(struct xe_gt *gt,
> >  			   struct xe_gt_tlb_invalidation_fence *fence)
> >  {
> > @@ -100,6 +128,16 @@ static bool tlb_invalidation_seqno_past(struct xe_gt *gt, int seqno)
> >  	return false;
> >  }
> >  
> > +/**
> > + * xe_gt_tlb_invalidation_wait - Wait for TLB to complete
> > + * @gt: graphics tile
> > + * @seqno: seqno to wait which was returned from xe_gt_tlb_invalidation
> > + *
> > + * Wait for 200ms for a TLB invalidation to complete, in practice we always
> > + * should receive the TLB invalidation within 200ms.
> > + *
> > + * Return: 0 on success, -ETIME on TLB invalidation timeout
> > + */
> >  int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
> >  {
> >  	struct xe_device *xe = gt_to_xe(gt);
> > @@ -122,6 +160,18 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
> >  	return 0;
> >  }
> >  
> > +/**
> > + * xe_guc_tlb_invalidation_done_handler - TLB invalidation done handler
> > + * @guc: guc
> > + * @msg: message indicating TLB invalidation done
> > + * @len: length of message
> > + *
> > + * Parse seqno of TLB invalidation, wake any waiters for seqno, and signal any
> > + * invalidation fences for seqno. Algorithm for this depends on seqno being
> > + * received in-order and asserts this assumption.
> > + *
> > + * Return: 0 on success, -EPROTO for malformed messages.
> > + */
> >  int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len)
> >  {
> >  	struct xe_gt *gt = guc_to_gt(guc);
> > -- 
> > 2.39.1
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation


* Re: [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG
  2023-02-13 23:22   ` Matt Roper
@ 2023-02-17 16:24     ` Rodrigo Vivi
  0 siblings, 0 replies; 31+ messages in thread
From: Rodrigo Vivi @ 2023-02-17 16:24 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe

On Mon, Feb 13, 2023 at 03:22:47PM -0800, Matt Roper wrote:
> On Fri, Feb 03, 2023 at 03:23:55PM -0500, Rodrigo Vivi wrote:
> > From: Matthew Brost <matthew.brost@intel.com>
> > 
> > GT TLB invalidation functions are now in header
> > xe_gt_tlb_invalidation.h, include that file in xe_gt_debugfs.c if
> > CONFIG_DRM_XE_DEBUG is set.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> 
> Should this be squashed into patch #2?

It would be better indeed... I'm trying to avoid changes to these
merged patches right now, but I will keep it in mind for the next
rebase on upstream.

Thanks,
Rodrigo.

> 
> 
> Matt
> 
> > ---
> >  drivers/gpu/drm/xe/xe_gt_debugfs.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt_debugfs.c b/drivers/gpu/drm/xe/xe_gt_debugfs.c
> > index 30058c6100ab..946398f08bb5 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_debugfs.c
> > +++ b/drivers/gpu/drm/xe/xe_gt_debugfs.c
> > @@ -11,12 +11,15 @@
> >  #include "xe_gt.h"
> >  #include "xe_gt_debugfs.h"
> >  #include "xe_gt_mcr.h"
> > -#include "xe_gt_pagefault.h"
> >  #include "xe_gt_topology.h"
> >  #include "xe_hw_engine.h"
> >  #include "xe_macros.h"
> >  #include "xe_uc_debugfs.h"
> >  
> > +#ifdef CONFIG_DRM_XE_DEBUG
> > +#include "xe_gt_tlb_invalidation.h"
> > +#endif
> > +
> >  static struct xe_gt *node_to_gt(struct drm_info_node *node)
> >  {
> >  	return node->info_ent->data;
> > -- 
> > 2.39.1
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation


end of thread, other threads:[~2023-02-17 16:24 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-03 20:23 [Intel-xe] [PATCH 00/22] TLB Invalidation Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 01/22] drm/xe: Don't process TLB invalidation done in CT fast-path Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 02/22] drm/xe: Break of TLB invalidation into its own file Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 03/22] drm/xe: Move TLB invalidation variable to own sub-structure in GT Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 04/22] drm/xe: Add TLB invalidation fence Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 05/22] drm/xe: Invalidate TLB after unbind is complete Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 06/22] drm/xe: Kernel doc GT TLB invalidations Rodrigo Vivi
2023-02-13 23:21   ` Matt Roper
2023-02-17 16:22     ` Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 07/22] drm/xe: Add TLB invalidation fence ftrace Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 08/22] drm/xe: Fix build for CONFIG_DRM_XE_DEBUG Rodrigo Vivi
2023-02-13 23:22   ` Matt Roper
2023-02-17 16:24     ` Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 09/22] drm/xe: Add TDR for invalidation fence timeout cleanup Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 10/22] drm/xe: Only set VM->asid for platforms that support a ASID Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 11/22] drm/xe: Delete debugfs entry to issue TLB invalidation Rodrigo Vivi
2023-02-03 20:23 ` [Intel-xe] [PATCH 12/22] drm/xe: Add has_range_tlb_invalidation device attribute Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 13/22] drm/xe: Add range based TLB invalidations Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 14/22] drm/xe: Propagate error from bind operations to async fence Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 15/22] drm/xe: Use GuC to do GGTT invalidations for the GuC firmware Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 16/22] drm/xe: Coalesce GGTT invalidations Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 17/22] drm/xe: Lock GGTT on when restoring kernel BOs Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 18/22] drm/xe: Propagate VM unbind error to invalidation fence Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 19/22] drm/xe: Signal invalidation fence immediately if CT send fails Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 20/22] drm/xe: Add has_asid to device info Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 21/22] drm/xe: Add TLB invalidation fence after rebinds issued from execs Rodrigo Vivi
2023-02-03 20:24 ` [Intel-xe] [PATCH 22/22] drm/xe: Drop TLB invalidation from ring operations Rodrigo Vivi
2023-02-06 22:39 ` [Intel-xe] [PATCH 00/22] TLB Invalidation Niranjana Vishwanathapura
2023-02-08 17:27   ` Rodrigo Vivi
2023-02-09  4:54     ` Niranjana Vishwanathapura
2023-02-09  4:59       ` Niranjana Vishwanathapura
