* [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support
@ 2023-09-25 13:21 Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe

Branch available here (lightly tested):
https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads

Series still needs some more testing. Also note that the series directly depends
on the WIP patch here: https://patchwork.freedesktop.org/series/122708/

The goal here is to allow userspace to directly control the pat_index when mapping
memory via the ppGTT, in addition to the CPU caching mode for system memory. This
is very much needed on newer igpu platforms which allow incoherent GT access,
where the choice over the cache level and expected coherency is best left to
userspace depending on their use case. In the future there may also be other
attributes encoded in the pat_index, so giving userspace direct control will also
be needed there.

To support this we add a new gem_create uAPI for selecting the CPU cache mode
to use for system memory, including the expected GPU coherency mode. There are
various restrictions on which CPU cache modes are compatible with the selected
coherency mode. With that in place the actual pat_index can now be provided as
part of vm_bind. The only restriction is that the coherency mode of the
pat_index must be at least as coherent as the gem_create coherency mode. There
are also some special cases like userptr and dma-buf.
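
For illustration, a rough sketch of how userspace might allocate a cacheable
system memory object with the new fields. This is a hedged sketch only: the
DRM_IOCTL_XE_GEM_CREATE wrapper name and the placement flag handling are
assumptions here; only the coh_mode/smem_cpu_caching fields come from this
series.

  #include <errno.h>
  #include <xf86drm.h>
  #include <drm/xe_drm.h>

  static int create_wb_bo(int fd, __u32 vm_id, __u64 size, __u32 *handle)
  {
          struct drm_xe_gem_create create = {
                  .size = size,
                  .vm_id = vm_id,
                  /* flags selecting system memory placement elided */
                  .smem_cpu_caching = XE_GEM_CPU_CACHING_WB,
                  .coh_mode = XE_GEM_COH_AT_LEAST_1WAY,
          };

          if (drmIoctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create))
                  return -errno;

          /*
           * The handle can then be vm_bound with any pat_index whose
           * coherency mode is >= COH_AT_LEAST_1WAY (see the last patch).
           */
          *handle = create.handle;
          return 0;
  }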

v2:
  - Loads of improvements/tweaks. Main changes are to now allow
    gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
    exactly. This simplifies the dma-buf policy from userspace pov. Also we now
    only consider COH_NONE and COH_AT_LEAST_1WAY.
v3:
  - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
    fixes.

-- 
2.41.0



* [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 18:56   ` Souza, Jose
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 2/7] drm/xe: move pat_table into device info Matthew Auld
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper

From: Pallavi Mishra <pallavi.mishra@intel.com>

Allow userspace to specify the CPU caching mode to use for system memory
in addition to coherency modes during object creation. Modify the gem
create handler and introduce xe_bo_create_user to replace xe_bo_create.
In a later patch we will support setting the pat_index as part of
vm_bind, where the expectation is that the coherency mode extracted from
the pat_index must be at least as coherent as the one set at object
creation.
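
The restrictions the updated ioctl enforces roughly boil down to the
following (a condensed sketch of the policy; the helper name is made up
and the iGPU scanout + WB restriction is omitted for brevity):

  static bool gem_create_caching_valid(u16 coh_mode, u16 smem_cpu_caching,
                                       bool has_system_placement)
  {
          if (!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)
                  return false;

          /* smem_cpu_caching is MBZ for VRAM-only objects */
          if (!has_system_placement)
                  return smem_cpu_caching == 0;

          if (!smem_cpu_caching || smem_cpu_caching > XE_GEM_CPU_CACHING_UC)
                  return false;

          /*
           * Non-snooped GPU access together with cacheable CPU pages could
           * bypass zeroing/swap-in, so WB requires at least 1-way coherency.
           */
          if (coh_mode == XE_GEM_COH_NONE &&
              smem_cpu_caching == XE_GEM_CPU_CACHING_WB)
                  return false;

          return true;
  }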

v2:
  - s/smem_caching/smem_cpu_caching/ and
    s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
  - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
    just cares that zeroing/swap-in can't be bypassed with the given
    smem_caching mode. (Matt Roper)
  - Fix broken range check for coh_mode and smem_cpu_caching and also
    don't use constant value, but the already defined macros. (José)
  - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
  - Add note in kernel-doc for dgpu and coherency modes for system
    memory. (José)
v3 (José):
  - Make sure to reject coh_mode == 0 for VRAM-only.
  - Also make sure to actually pass along the (start, end) for
    __xe_bo_create_locked.

Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Co-authored-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c       | 105 ++++++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_bo.h       |   3 +-
 drivers/gpu/drm/xe/xe_bo_types.h |  10 +++
 drivers/gpu/drm/xe/xe_dma_buf.c  |   5 +-
 include/uapi/drm/xe_drm.h        |  57 ++++++++++++++++-
 5 files changed, 158 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 047eae071d03..f1ba0374901f 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -326,7 +326,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 	struct xe_device *xe = xe_bo_device(bo);
 	struct xe_ttm_tt *tt;
 	unsigned long extra_pages;
-	enum ttm_caching caching = ttm_cached;
+	enum ttm_caching caching;
 	int err;
 
 	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
@@ -340,13 +340,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 		extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
 					   PAGE_SIZE);
 
+	switch (bo->smem_cpu_caching) {
+	case XE_GEM_CPU_CACHING_WC:
+		caching = ttm_write_combined;
+		break;
+	case XE_GEM_CPU_CACHING_UC:
+		caching = ttm_uncached;
+		break;
+	default:
+		caching = ttm_cached;
+		break;
+	}
+
 	/*
 	 * Display scanout is always non-coherent with the CPU cache.
 	 *
 	 * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
 	 * require a CPU:WC mapping.
 	 */
-	if (bo->flags & XE_BO_SCANOUT_BIT ||
+	if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
 	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
 		caching = ttm_write_combined;
 
@@ -1190,9 +1202,10 @@ void xe_bo_free(struct xe_bo *bo)
 	kfree(bo);
 }
 
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_tile *tile, struct dma_resv *resv,
 				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    u16 smem_cpu_caching, u16 coh_mode,
 				    enum ttm_bo_type type, u32 flags)
 {
 	struct ttm_operation_ctx ctx = {
@@ -1230,6 +1243,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 	bo->tile = tile;
 	bo->size = size;
 	bo->flags = flags;
+	bo->smem_cpu_caching = smem_cpu_caching;
+	bo->coh_mode = coh_mode;
 	bo->ttm.base.funcs = &xe_gem_object_funcs;
 	bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
 	bo->props.preferred_gt = XE_BO_PROPS_INVALID;
@@ -1315,10 +1330,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
 }
 
 struct xe_bo *
-xe_bo_create_locked_range(struct xe_device *xe,
-			  struct xe_tile *tile, struct xe_vm *vm,
-			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags)
+__xe_bo_create_locked(struct xe_device *xe,
+		      struct xe_tile *tile, struct xe_vm *vm,
+		      size_t size, u64 start, u64 end,
+		      u16 smem_cpu_caching, u16 coh_mode,
+		      enum ttm_bo_type type, u32 flags)
 {
 	struct xe_bo *bo = NULL;
 	int err;
@@ -1339,10 +1355,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
 		}
 	}
 
-	bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
+	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
 				   vm && !xe_vm_in_fault_mode(vm) &&
 				   flags & XE_BO_CREATE_USER_BIT ?
 				   &vm->lru_bulk_move : NULL, size,
+				   smem_cpu_caching, coh_mode,
 				   type, flags);
 	if (IS_ERR(bo))
 		return bo;
@@ -1376,11 +1393,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+struct xe_bo *
+xe_bo_create_locked_range(struct xe_device *xe,
+			  struct xe_tile *tile, struct xe_vm *vm,
+			  size_t size, u64 start, u64 end,
+			  enum ttm_bo_type type, u32 flags)
+{
+	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, 0, type, flags);
+}
+
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
 				  enum ttm_bo_type type, u32 flags)
 {
-	return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
+	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
+}
+
+static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
+				       struct xe_vm *vm, size_t size,
+				       u16 smem_cpu_caching, u16 coh_mode,
+				       enum ttm_bo_type type,
+				       u32 flags)
+{
+	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
+						 smem_cpu_caching, coh_mode, type,
+						 flags | XE_BO_CREATE_USER_BIT);
+	if (!IS_ERR(bo))
+		xe_bo_unlock_vm_held(bo);
+
+	return bo;
 }
 
 struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
@@ -1763,11 +1804,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_xe_gem_create *args = data;
 	struct xe_vm *vm = NULL;
 	struct xe_bo *bo;
-	unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
+	unsigned int bo_flags;
 	u32 handle;
 	int err;
 
-	if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
+	if (XE_IOCTL_DBG(xe, args->extensions) ||
 	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
 		return -EINVAL;
 
@@ -1809,6 +1850,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 		bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
 	}
 
+	if (XE_IOCTL_DBG(xe, !args->coh_mode))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
+		return -EINVAL;
+
+	if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
+		if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
+			return -EINVAL;
+
+		if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
+				 bo_flags & XE_BO_SCANOUT_BIT &&
+				 args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
+			return -EINVAL;
+
+		if (args->coh_mode == XE_GEM_COH_NONE) {
+			if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
+				return -EINVAL;
+		}
+	} else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
+		return -EINVAL;
+	}
+
 	if (args->vm_id) {
 		vm = xe_vm_lookup(xef, args->vm_id);
 		if (XE_IOCTL_DBG(xe, !vm))
@@ -1820,8 +1887,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
-			  bo_flags);
+	bo = xe_bo_create_user(xe, NULL, vm, args->size,
+			       args->smem_cpu_caching, args->coh_mode,
+			       ttm_bo_type_device,
+			       bo_flags);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_vm;
@@ -2113,10 +2182,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
 	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
 			   page_size);
 
-	bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
-			  XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-			  XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
-			  XE_BO_NEEDS_CPU_ACCESS);
+	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
+			       XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
+			       ttm_bo_type_device,
+			       XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+			       XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
+			       XE_BO_NEEDS_CPU_ACCESS);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index e3c90d45e723..b3c7e33ed39a 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -83,9 +83,10 @@ struct sg_table;
 struct xe_bo *xe_bo_alloc(void);
 void xe_bo_free(struct xe_bo *bo);
 
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_tile *tile, struct dma_resv *resv,
 				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    u16 smem_cpu_caching, u16 coh_mode,
 				    enum ttm_bo_type type, u32 flags);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 051fe990c133..d44ccacc683f 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -76,6 +76,16 @@ struct xe_bo {
 	struct llist_node freed;
 	/** @created: Whether the bo has passed initial creation */
 	bool created;
+	/**
+	 * @coh_mode: Coherency setting. Currently only used for userspace
+	 * objects.
+	 */
+	u16 coh_mode;
+	/**
+	 * @smem_cpu_caching: Caching mode for smem. Currently only used for
+	 * userspace objects.
+	 */
+	u16 smem_cpu_caching;
 };
 
 #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index cfde3be3b0dc..9da5cffeef13 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -214,8 +214,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 	int ret;
 
 	dma_resv_lock(resv, NULL);
-	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
+	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+				    0, 0, /* Will require 1way or 2way for vm_bind */
+				    ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
 	if (IS_ERR(bo)) {
 		ret = PTR_ERR(bo);
 		goto error;
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index d48d8e3c898c..fc2016ebe102 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -456,8 +456,61 @@ struct drm_xe_gem_create {
 	 */
 	__u32 handle;
 
-	/** @pad: MBZ */
-	__u32 pad;
+	/**
+	 * @coh_mode: The coherency mode for this object. This will limit the
+	 * possible @smem_cpu_caching values.
+	 *
+	 * Supported values:
+	 *
+	 * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
+	 * CPU. CPU caches are not snooped.
+	 *
+	 * XE_GEM_COH_AT_LEAST_1WAY:
+	 *
+	 * CPU-GPU coherency must be at least 1WAY.
+	 *
+	 * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
+	 * until GPU acquires. The acquire by the GPU is not tracked by CPU
+	 * caches.
+	 *
+	 * If 2WAY then should be fully coherent between GPU and CPU.  Fully
+	 * tracked by CPU caches. Both CPU and GPU caches are snooped.
+	 *
+	 * Note: On dgpu the GPU device never caches system memory (outside of
+	 * the special system-memory-read-only cache, which is anyway flushed by
+	 * KMD when nuking TLBs for a given object so should be no concern to
+	 * userspace). The device should be thought of as always 1WAY coherent,
+	 * with the addition that the GPU never caches system memory. At least
+	 * on current dgpu HW there is no way to turn off snooping so likely the
+	 * different coherency modes of the pat_index make no difference for
+	 * system memory.
+	 */
+#define XE_GEM_COH_NONE			1
+#define XE_GEM_COH_AT_LEAST_1WAY	2
+	__u16 coh_mode;
+
+	/**
+	 * @smem_cpu_caching: The CPU caching mode to select for system memory.
+	 *
+	 * Supported values:
+	 *
+	 * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
+	 * On iGPU this can't be used for scanout surfaces. The @coh_mode must
+	 * be XE_GEM_COH_AT_LEAST_1WAY.
+	 *
+	 * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
+	 * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
+	 * use this.
+	 *
+	 * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
+	 * is permitted. Scanout surfaces are permitted to use this.
+	 *
+	 * MUST be left as zero for VRAM-only objects.
+	 */
+#define XE_GEM_CPU_CACHING_WB                      1
+#define XE_GEM_CPU_CACHING_WC                      2
+#define XE_GEM_CPU_CACHING_UC                      3
+	__u16 smem_cpu_caching;
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
-- 
2.41.0



* [Intel-xe] [PATCH v3 2/7] drm/xe: move pat_table into device info
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 3/7] drm/xe/pat: trim the tgl PAT table Matthew Auld
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe; +Cc: Matt Roper, Lucas De Marchi

We need to be able to know the max pat_index range for a given platform,
as well as being able to look up the pat_index for a given platform in
the upcoming vm_bind uapi, where userspace can directly provide the
pat_index. Move the platform definition of the pat_table into the device
info with the idea of encoding more information about each pat_index in
a future patch.
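
With the table in the device info, later patches can then do a simple
bounds-checked lookup when given a pat_index from userspace, something
along these lines (sketch only, the helper name is made up):

  static const u32 *xe_pat_entry_lookup(struct xe_device *xe, u16 pat_index)
  {
          /* reject out-of-range indexes coming from the uapi */
          if (pat_index >= xe->info.pat.n_entries)
                  return NULL;

          return &xe->info.pat.table[pat_index];
  }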

v2 (Lucas):
  - s/ENODEV/ENOTSUPP/
  - s/xe_pat_fill_info/xe_pat_init_early/
  - Prefer new pat substruct in xe_info.
v3 (Matt Roper):
  - Some small tweaks

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 15 +++++++++++
 drivers/gpu/drm/xe/xe_pat.c          | 39 ++++++++++++++++++----------
 drivers/gpu/drm/xe/xe_pat.h          |  1 +
 drivers/gpu/drm/xe/xe_pci.c          |  7 +++++
 4 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 27edc4cf0f68..cf941d56a6c9 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -239,6 +239,21 @@ struct xe_device {
 		/** @enable_display: display enabled */
 		u8 enable_display:1;
 
+		/**
+		 * @pat: Platform information related to PAT (Page Attribute
+		 * Table) settings.
+		 */
+		struct {
+			/**
+			 * @table: The PAT table encoding for every pat_index
+			 * supported by the platform.
+			 */
+			const u32 *table;
+
+			/** @n_entries: The number of entries in the @table */
+			int n_entries;
+		} pat;
+
 #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
 		const struct intel_display_device_info *display;
 		struct intel_display_runtime_info display_runtime;
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 32b0c922e7fa..aa2c5eb88266 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -106,24 +106,17 @@ static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
 	}
 }
 
-void xe_pat_init(struct xe_gt *gt)
+int xe_pat_init_early(struct xe_device *xe)
 {
-	struct xe_device *xe = gt_to_xe(gt);
-
 	if (xe->info.platform == XE_METEORLAKE) {
-		/*
-		 * SAMedia register offsets are adjusted by the write methods
-		 * and they target registers that are not MCR, while for normal
-		 * GT they are MCR
-		 */
-		if (xe_gt_is_media_type(gt))
-			program_pat(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
-		else
-			program_pat_mcr(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
+		xe->info.pat.table = mtl_pat_table;
+		xe->info.pat.n_entries = ARRAY_SIZE(mtl_pat_table);
 	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
-		program_pat_mcr(gt, pvc_pat_table, ARRAY_SIZE(pvc_pat_table));
+		xe->info.pat.table = pvc_pat_table;
+		xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
 	} else if (GRAPHICS_VERx100(xe) <= 1210) {
-		program_pat(gt, tgl_pat_table, ARRAY_SIZE(tgl_pat_table));
+		xe->info.pat.table = tgl_pat_table;
+		xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
 	} else {
 		/*
 		 * Going forward we expect to need new PAT settings for most
@@ -135,7 +128,25 @@ void xe_pat_init(struct xe_gt *gt)
 		 */
 		drm_err(&xe->drm, "Missing PAT table for platform with graphics version %d.%02d!\n",
 			GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
+		return -ENOTSUPP;
 	}
+
+	return 0;
+}
+
+void xe_pat_init(struct xe_gt *gt)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+
+	/*
+	 * SAMedia register offsets are adjusted by the write methods
+	 * and they target registers that are not MCR, while for normal
+	 * GT they are MCR.
+	 */
+	if (xe_gt_is_media_type(gt) || GRAPHICS_VERx100(xe) < 1255)
+		program_pat(gt, xe->info.pat.table, xe->info.pat.n_entries);
+	else
+		program_pat_mcr(gt, xe->info.pat.table, xe->info.pat.n_entries);
 }
 
 void xe_pte_pat_init(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 5e71bd98d787..2f89503233b9 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -28,6 +28,7 @@
 struct xe_gt;
 struct xe_device;
 
+int xe_pat_init_early(struct xe_device *xe);
 void xe_pat_init(struct xe_gt *gt);
 void xe_pte_pat_init(struct xe_device *xe);
 unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index a11163b89a3f..d170461bc981 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -22,6 +22,7 @@
 #include "xe_gt.h"
 #include "xe_macros.h"
 #include "xe_module.h"
+#include "xe_pat.h"
 #include "xe_pci_types.h"
 #include "xe_pm.h"
 #include "xe_step.h"
@@ -531,6 +532,7 @@ static int xe_info_init(struct xe_device *xe,
 	struct xe_tile *tile;
 	struct xe_gt *gt;
 	u8 id;
+	int err;
 
 	xe->info.platform = desc->platform;
 	xe->info.subplatform = subplatform_desc ?
@@ -579,6 +581,11 @@ static int xe_info_init(struct xe_device *xe,
 	xe->info.enable_display = IS_ENABLED(CONFIG_DRM_XE_DISPLAY) &&
 				  enable_display &&
 				  desc->has_display;
+
+	err = xe_pat_init_early(xe);
+	if (err)
+		return err;
+
 	/*
 	 * All platforms have at least one primary GT.  Any platform with media
 	 * version 13 or higher has an additional dedicated media GT.  And
-- 
2.41.0



* [Intel-xe] [PATCH v3 3/7] drm/xe/pat: trim the tgl PAT table
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 2/7] drm/xe: move pat_table into device info Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 4/7] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matt Roper

We don't seem to use the 4-7 pat indexes, even though they are defined
by the HW. In the next patch userspace will be able to directly set the
pat_index as part of vm_bind and we don't want to allow setting 4-7.
Simplest is to just ignore them here.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_pat.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index aa2c5eb88266..fb490982fd99 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -38,10 +38,6 @@ static const u32 tgl_pat_table[] = {
 	[1] = TGL_PAT_WC,
 	[2] = TGL_PAT_WT,
 	[3] = TGL_PAT_UC,
-	[4] = TGL_PAT_WB,
-	[5] = TGL_PAT_WB,
-	[6] = TGL_PAT_WB,
-	[7] = TGL_PAT_WB,
 };
 
 static const u32 pvc_pat_table[] = {
-- 
2.41.0



* [Intel-xe] [PATCH v3 4/7] drm/xe/pat: annotate pat_index with coherency mode
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (2 preceding siblings ...)
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 3/7] drm/xe/pat: trim the tgl PAT table Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 5/7] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper

Future uapi needs to give userspace the ability to select the pat_index
for a given vm_bind. However we need to be able to extract the coherency
mode from the provided pat_index to ensure it is at least as coherent as
the coherency mode set at object creation. There are various security
reasons why this matters. Since the pat_index itself is very platform
specific, it seems reasonable to annotate each platform definition of
the pat table. On some older platforms there is no explicit coherency
mode, so we just pick whatever makes sense.
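
With the coh_mode annotation in place, the vm_bind side (added later in
the series) can then enforce the coherency rule roughly as follows
(sketch only, the helper name is made up):

  static int check_vm_bind_coherency(struct xe_device *xe, struct xe_bo *bo,
                                     u16 pat_index)
  {
          u16 coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);

          /* e.g. a COH_NONE pat_index on an AT_LEAST_1WAY object is rejected */
          if (bo->coh_mode && coh_mode < bo->coh_mode)
                  return -EINVAL;

          return 0;
  }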

v2:
  - Simplify with COH_AT_LEAST_1_WAY
  - Add some kernel-doc
v3 (Matt Roper):
  - Some small tweaks

Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  2 +-
 drivers/gpu/drm/xe/xe_pat.c          | 59 +++++++++++++++++-----------
 drivers/gpu/drm/xe/xe_pat.h          | 19 +++++++++
 3 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index cf941d56a6c9..0118b20416fa 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -248,7 +248,7 @@ struct xe_device {
 			 * @table: The PAT table encoding for every pat_index
 			 * supported by the platform.
 			 */
-			const u32 *table;
+			const struct xe_pat_table_entry *table;
 
 			/** @n_entries: The number of entries in the @table */
 			int n_entries;
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index fb490982fd99..f4fceb3fa086 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -4,6 +4,8 @@
  */
 
 
+#include <drm/xe_drm.h>
+
 #include "regs/xe_reg_defs.h"
 #include "xe_gt.h"
 #include "xe_gt_mcr.h"
@@ -33,30 +35,30 @@
 #define TGL_PAT_WC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 1)
 #define TGL_PAT_UC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 0)
 
-static const u32 tgl_pat_table[] = {
-	[0] = TGL_PAT_WB,
-	[1] = TGL_PAT_WC,
-	[2] = TGL_PAT_WT,
-	[3] = TGL_PAT_UC,
+static const struct xe_pat_table_entry tgl_pat_table[] = {
+	[0] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
+	[2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
+	[3] = { TGL_PAT_UC, XE_GEM_COH_NONE },
 };
 
-static const u32 pvc_pat_table[] = {
-	[0] = TGL_PAT_UC,
-	[1] = TGL_PAT_WC,
-	[2] = TGL_PAT_WT,
-	[3] = TGL_PAT_WB,
-	[4] = PVC_PAT_CLOS(1) | TGL_PAT_WT,
-	[5] = PVC_PAT_CLOS(1) | TGL_PAT_WB,
-	[6] = PVC_PAT_CLOS(2) | TGL_PAT_WT,
-	[7] = PVC_PAT_CLOS(2) | TGL_PAT_WB,
+static const struct xe_pat_table_entry pvc_pat_table[] = {
+	[0] = { TGL_PAT_UC, XE_GEM_COH_NONE },
+	[1] = { TGL_PAT_WC, XE_GEM_COH_NONE },
+	[2] = { TGL_PAT_WT, XE_GEM_COH_NONE },
+	[3] = { TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[4] = { PVC_PAT_CLOS(1) | TGL_PAT_WT, XE_GEM_COH_NONE },
+	[5] = { PVC_PAT_CLOS(1) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[6] = { PVC_PAT_CLOS(2) | TGL_PAT_WT, XE_GEM_COH_NONE },
+	[7] = { PVC_PAT_CLOS(2) | TGL_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
 };
 
-static const u32 mtl_pat_table[] = {
-	[0] = MTL_PAT_0_WB,
-	[1] = MTL_PAT_1_WT,
-	[2] = MTL_PAT_3_UC,
-	[3] = MTL_PAT_0_WB | MTL_2_COH_1W,
-	[4] = MTL_PAT_0_WB | MTL_3_COH_2W,
+static const struct xe_pat_table_entry mtl_pat_table[] = {
+	[0] = { MTL_PAT_0_WB, XE_GEM_COH_NONE },
+	[1] = { MTL_PAT_1_WT, XE_GEM_COH_NONE },
+	[2] = { MTL_PAT_3_UC, XE_GEM_COH_NONE },
+	[3] = { MTL_PAT_0_WB | MTL_2_COH_1W, XE_GEM_COH_AT_LEAST_1WAY },
+	[4] = { MTL_PAT_0_WB | MTL_3_COH_2W, XE_GEM_COH_AT_LEAST_1WAY },
 };
 
 static const u32 xelp_pte_pat_table[XE_CACHE_LAST] = {
@@ -78,27 +80,35 @@ static const u32 xelpg_pte_pat_table[XE_CACHE_LAST] = {
 	[XE_CACHE_WB_1_WAY] = XELPG_PAT_WB_CACHE_1_WAY,
 };
 
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index)
+{
+	WARN_ON(pat_index >= xe->info.pat.n_entries);
+	return xe->info.pat.table[pat_index].coh_mode;
+}
+
 unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache)
 {
 	WARN_ON(cache >= XE_CACHE_LAST);
 	return (xe->pat_table).pte_pat_table[cache];
 }
 
-static void program_pat(struct xe_gt *gt, const u32 table[], int n_entries)
+static void program_pat(struct xe_gt *gt, const struct xe_pat_table_entry table[],
+			int n_entries)
 {
 	for (int i = 0; i < n_entries; i++) {
 		struct xe_reg reg = XE_REG(_PAT_INDEX(i));
 
-		xe_mmio_write32(gt, reg, table[i]);
+		xe_mmio_write32(gt, reg, table[i].value);
 	}
 }
 
-static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
+static void program_pat_mcr(struct xe_gt *gt, const struct xe_pat_table_entry table[],
+			    int n_entries)
 {
 	for (int i = 0; i < n_entries; i++) {
 		struct xe_reg_mcr reg_mcr = XE_REG_MCR(_PAT_INDEX(i));
 
-		xe_gt_mcr_multicast_write(gt, reg_mcr, table[i]);
+		xe_gt_mcr_multicast_write(gt, reg_mcr, table[i].value);
 	}
 }
 
@@ -111,6 +121,7 @@ int xe_pat_init_early(struct xe_device *xe)
 		xe->info.pat.table = pvc_pat_table;
 		xe->info.pat.n_entries = ARRAY_SIZE(pvc_pat_table);
 	} else if (GRAPHICS_VERx100(xe) <= 1210) {
+		WARN_ON_ONCE(!IS_DGFX(xe) && !xe->info.has_llc);
 		xe->info.pat.table = tgl_pat_table;
 		xe->info.pat.n_entries = ARRAY_SIZE(tgl_pat_table);
 	} else {
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 2f89503233b9..e90653088709 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -28,9 +28,28 @@
 struct xe_gt;
 struct xe_device;
 
+/**
+ * struct xe_pat_table_entry - The pat_index encoding and other meta information.
+ */
+struct xe_pat_table_entry {
+	/**
+	 * @value: The platform specific value encoding the various memory
+	 * attributes (this maps to some fixed pat_index). So things like
+	 * caching, coherency, compression etc can be encoded here.
+	 */
+	u32 value;
+
+	/**
+	 * @coh_mode: The GPU coherency mode that @value maps to. Either
+	 * XE_GEM_COH_NONE or XE_GEM_COH_AT_LEAST_1WAY.
+	 */
+	u16 coh_mode;
+};
+
 int xe_pat_init_early(struct xe_device *xe);
 void xe_pat_init(struct xe_gt *gt);
 void xe_pte_pat_init(struct xe_device *xe);
 unsigned int xe_pat_get_index(struct xe_device *xe, enum xe_cache_level cache);
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index);
 
 #endif
-- 
2.41.0



* [Intel-xe] [PATCH v3 5/7] drm/xe/migrate: rather use pte_encode helpers
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (3 preceding siblings ...)
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 4/7] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode Matthew Auld
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe; +Cc: Matt Roper, Lucas De Marchi

We need to avoid using stuff like PPAT_CACHED directly, which is no
longer going to work on newer platforms. At some point we can just
directly use the pat_index, but for now just use XE_CACHE_WB.

v2 (Matt Roper):
  - Rather use the 'level' local variable.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_migrate.c |  7 ++++---
 drivers/gpu/drm/xe/xe_pt.c      | 12 ++++++------
 drivers/gpu/drm/xe/xe_pt.h      |  2 ++
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 46f88f3a8c58..ebae0117f577 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -257,8 +257,9 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 
 		level = 2;
 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
-		flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
-			XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
+
+		flags = XE_PPGTT_PTE_DM;
+		flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, level);
 
 		/*
 		 * Use 1GB pages, it shouldn't matter the physical amount of
@@ -493,7 +494,7 @@ static void emit_pte(struct xe_migrate *m,
 				addr += vram_region_gpu_offset(bo->ttm.resource);
 				addr |= XE_PPGTT_PTE_DM;
 			}
-			addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
+			addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 397ef2573c05..ddb4d9c33181 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -68,8 +68,8 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
 	return pde;
 }
 
-static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
-			struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
+u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
 {
 	struct xe_device *xe = vm->xe;
 
@@ -113,7 +113,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
 		pte |= XE_PPGTT_PTE_DM;
 
-	return __pte_encode(pte, cache, vm, NULL, pt_level);
+	return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
 }
 
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -595,9 +595,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
-		pte = __pte_encode(is_null ? 0 :
-				   xe_res_dma(curs) + xe_walk->dma_offset,
-				   xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
+		pte = __xe_pte_encode(is_null ? 0 :
+				      xe_res_dma(curs) + xe_walk->dma_offset,
+				      xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
 		pte |= xe_walk->default_pte;
 
 		/*
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 4a9143bc6628..0e66436d707d 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -49,5 +49,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
 
 u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
 		  u32 pt_level);
+u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
 
 #endif
-- 
2.41.0



* [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (4 preceding siblings ...)
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 5/7] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 22:08   ` Matt Roper
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 7/7] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe; +Cc: Matt Roper, Lucas De Marchi

In the next patch userspace will be able to directly set the pat_index
as part of vm_bind. To support this we need to get away from using
xe_cache_level in the low level routines and rather just use the
pat_index directly.
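
The conversion at the call sites is mostly mechanical: resolve the
kernel-internal cache level to a pat_index once and pass that down, e.g.
(sketch of the pattern used below):

  u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);

  entry = xe_pte_encode(vm, bo, offset, pat_index, pt_level);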

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_migrate.c |  2 +-
 drivers/gpu/drm/xe/xe_ggtt.c          |  7 +++----
 drivers/gpu/drm/xe/xe_ggtt_types.h    |  2 +-
 drivers/gpu/drm/xe/xe_migrate.c       | 13 ++++++++-----
 drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++++------
 drivers/gpu/drm/xe/xe_pt.h            |  4 ++--
 drivers/gpu/drm/xe/xe_vm.c            |  8 ++------
 drivers/gpu/drm/xe/xe_vm_types.h      |  3 +--
 8 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index 6b4388bfbb31..d3bf4751a2d7 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	/* First part of the test, are we updating our pagetable bo with a new entry? */
 	xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
 		  0xdeaddeadbeefbeef);
-	expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
+	expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
 	if (m->q->vm->flags & XE_VM_FLAG_64K)
 		expected |= XE_PTE_PS64;
 	if (xe_bo_is_vram(pt))
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 2002b6597dbf..1c447f1a4fbb 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
 		pte |= XE_GGTT_PTE_DM;
 
 	if ((ggtt->pat_encode).pte_encode)
-		pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
+		pte = (ggtt->pat_encode).pte_encode(xe, pte,
+						    xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
 
 	return pte;
 }
@@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
 }
 
 static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
-						enum xe_cache_level cache)
+				     u16 pat_index)
 {
-	u32 pat_index = xe_pat_get_index(xe, cache);
-
 	pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
 
 	if (pat_index & BIT(0))
diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
index 7e55fac1a8a9..7981075bb228 100644
--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
@@ -31,7 +31,7 @@ struct xe_ggtt {
 
 	struct {
 		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
-						enum xe_cache_level cache);
+				  u16 pat_index);
 	} pat_encode;
 };
 
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index ebae0117f577..9d201bdb9f25 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -25,6 +25,7 @@
 #include "xe_lrc.h"
 #include "xe_map.h"
 #include "xe_mocs.h"
+#include "xe_pat.h"
 #include "xe_pt.h"
 #include "xe_res_cursor.h"
 #include "xe_sched_job.h"
@@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 	u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
 	u32 map_ofs, level, i;
 	struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
+	u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
 	u64 entry;
 	int ret;
 
@@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 
 	/* Map the entire BO in our level 0 pt */
 	for (i = 0, level = 0; i < num_entries; level++) {
-		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
+		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
 
 		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
 
@@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		for (i = 0; i < batch->size;
 		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
 		     XE_PAGE_SIZE) {
-			entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
+			entry = xe_pte_encode(vm, batch, i, pat_index, 0);
 
 			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
 				  entry);
@@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
 
 		flags = XE_PPGTT_PTE_DM;
-		flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, level);
+		flags = __xe_pte_encode(flags, pat_index, vm, NULL, level);
 
 		/*
 		 * Use 1GB pages, it shouldn't matter the physical amount of
@@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
 		     struct xe_res_cursor *cur,
 		     u32 size, struct xe_bo *bo)
 {
+	u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
 	u32 ptes;
 	u64 ofs = at_pt * XE_PAGE_SIZE;
 	u64 cur_ofs;
@@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
 				addr += vram_region_gpu_offset(bo->ttm.resource);
 				addr |= XE_PPGTT_PTE_DM;
 			}
-			addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
+			addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 
@@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 
 			xe_tile_assert(tile, pt_bo->size == SZ_4K);
 
-			addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
+			addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 		}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index ddb4d9c33181..5686ed9be175 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -11,6 +11,7 @@
 #include "xe_gt.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
+#include "xe_pat.h"
 #include "xe_pt_types.h"
 #include "xe_pt_walk.h"
 #include "xe_res_cursor.h"
@@ -68,7 +69,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
 	return pde;
 }
 
-u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+u64 __xe_pte_encode(u64 pte, u16 pat_index,
 		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
 {
 	struct xe_device *xe = vm->xe;
@@ -86,7 +87,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
 	else if (pt_level == 2)
 		pte |= XE_PDPE_PS_1G;
 
-	pte = vm->pat_encode.pte_encode(xe, pte, cache);
+	pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
 
 	/* XXX: Does hw support 1 GiB pages? */
 	XE_WARN_ON(pt_level > 2);
@@ -104,7 +105,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
  *
  * Return: An encoded page-table entry. No errors.
  */
-u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
+u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
 		  u32 pt_level)
 {
 	u64 pte;
@@ -113,7 +114,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
 		pte |= XE_PPGTT_PTE_DM;
 
-	return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
+	return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
 }
 
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -126,7 +127,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
 
 	if (level == 0) {
 		u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
-					  XE_CACHE_WB, 0);
+					  xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
 
 		return empty;
 	} else {
@@ -597,7 +598,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		pte = __xe_pte_encode(is_null ? 0 :
 				      xe_res_dma(curs) + xe_walk->dma_offset,
-				      xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
+				      xe_pat_get_index(tile_to_xe(xe_walk->tile),
+						       xe_walk->cache),
+				      xe_walk->vm, xe_walk->vma, level);
 		pte |= xe_walk->default_pte;
 
 		/*
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 0e66436d707d..6d10823fca9b 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
 u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
 
-u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
+u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
 		  u32 pt_level);
-u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
+u64 __xe_pte_encode(u64 pte, u16 pat_index,
 		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
 
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index c53d5c1580df..28e6429488ee 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1196,10 +1196,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
 static void vm_destroy_work_func(struct work_struct *w);
 
 static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
-						enum xe_cache_level cache)
+				    u16 pat_index)
 {
-	u32 pat_index = xe_pat_get_index(xe, cache);
-
 	if (pat_index & BIT(0))
 		pte_pat |= BIT(3);
 
@@ -1217,10 +1215,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
 }
 
 static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
-						enum xe_cache_level cache)
+				     u16 pat_index)
 {
-	u32 pat_index = xe_pat_get_index(xe, cache);
-
 	if (pat_index & BIT(0))
 		pte_pat |= BIT(3);
 
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index c4d866840d6a..685c2179e533 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -340,8 +340,7 @@ struct xe_vm {
 	struct xe_file *xef;
 
 	struct {
-		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
-				  enum xe_cache_level cache);
+		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
 	} pat_encode;
 };
 
-- 
2.41.0



* [Intel-xe] [PATCH v3 7/7] drm/xe/uapi: support pat_index selection with vm_bind
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (5 preceding siblings ...)
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode Matthew Auld
@ 2023-09-25 13:21 ` Matthew Auld
  2023-09-25 13:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev4) Patchwork
  2023-09-25 19:47 ` [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Souza, Jose
  8 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-25 13:21 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu,
	Matthew Auld, Matt Roper

Allow userspace to directly control the pat_index for a given vm
binding. This should allow directly controlling the coherency, caching
and potentially other stuff in the future for the ppGTT binding.

The exact meaning behind the pat_index is very platform specific (see
BSpec or PRMs) but effectively maps to some predefined memory
attributes. From the KMD pov we only care about the coherency that is
provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
The vm_bind coherency mode for the given pat_index needs to be at least
as coherent as the coh_mode that was set at object creation. For
platforms that lack the explicit coherency mode, we treat UC/WT/WC as
NONE and WB as AT_LEAST_1WAY.

For userptr mappings we lack a corresponding gem object, so the expected
coherency mode is instead implicit and must fall into either 1WAY or
2WAY. Trying to use NONE will be rejected by the kernel. For imported
dma-buf (from a different device) the coherency mode is also implicit
and must also be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.

As part of adding pat_index support with vm_bind we also need to stop
using xe_cache_level and instead use the pat_index directly in various
places. We still make use of xe_cache_level, but only as a convenience
for kernel internal objects (internally it maps to some reasonable
pat_index). For now this is just a 1:1 conversion of the existing code,
however for platforms like MTL+ we might need to give more control
through bo_create or stop using WB on the CPU side if we need CPU
access.
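
From the userspace side a bind with an explicit pat_index then looks
roughly like this (field names as added in this patch; the surrounding
drm_xe_vm_bind container and ioctl call are elided, and the chosen
pat_index value is just a platform-specific example):

  struct drm_xe_vm_bind_op bind_op = {
          .obj        = bo_handle,
          .obj_offset = 0,
          .addr       = gpu_addr,
          .range      = bo_size,
          .op         = XE_VM_BIND_OP_MAP,
          /* must be at least as coherent as the object's coh_mode */
          .pat_index  = wb_1way_pat_index,
  };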

v2:
  - Undefined coh_mode(pat_index) can now be treated as programmer
    error. (Matt Roper)
  - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
    having to match exactly. This ensures imported dma-buf can always
    just use 1way (or even 2way), now that we also bundle 1way/2way into
    at_least_1way. We still require 1way/2way for external dma-buf, but
    the policy can now be the same for self-import, if desired.
  - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
  - Move as much of the pat_index validation as we can into
    vm_bind_ioctl_check_args. (José)
v3 (Matt Roper):
  - Split the pte_encode() refactoring into separate patch.

Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       | 14 ++------
 drivers/gpu/drm/xe/xe_vm.c       | 61 +++++++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_vm_types.h |  7 ++++
 include/uapi/drm/xe_drm.h        | 43 +++++++++++++++++++++-
 4 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 5686ed9be175..d5ed67be1a30 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -362,8 +362,6 @@ struct xe_pt_stage_bind_walk {
 	struct xe_vm *vm;
 	/** @tile: The tile we're building for. */
 	struct xe_tile *tile;
-	/** @cache: Desired cache level for the ptes */
-	enum xe_cache_level cache;
 	/** @default_pte: PTE flag only template. No address is associated */
 	u64 default_pte;
 	/** @dma_offset: DMA offset to add to the PTE. */
@@ -598,9 +596,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		pte = __xe_pte_encode(is_null ? 0 :
 				      xe_res_dma(curs) + xe_walk->dma_offset,
-				      xe_pat_get_index(tile_to_xe(xe_walk->tile),
-						       xe_walk->cache),
-				      xe_walk->vm, xe_walk->vma, level);
+				      xe_walk->vma->pat_index, xe_walk->vm,
+				      xe_walk->vma, level);
 		pte |= xe_walk->default_pte;
 
 		/*
@@ -726,13 +723,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
 			xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
 		xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
-		xe_walk.cache = XE_CACHE_WB;
-	} else {
-		if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
-			xe_walk.cache = XE_CACHE_WT;
-		else
-			xe_walk.cache = XE_CACHE_WB;
 	}
+
 	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
 		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 28e6429488ee..47ab659d4076 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -6,6 +6,7 @@
 #include "xe_vm.h"
 
 #include <linux/dma-fence-array.h>
+#include <linux/nospec.h>
 
 #include <drm/drm_exec.h>
 #include <drm/drm_print.h>
@@ -859,7 +860,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    u64 start, u64 end,
 				    bool read_only,
 				    bool is_null,
-				    u8 tile_mask)
+				    u8 tile_mask,
+				    u16 pat_index)
 {
 	struct xe_vma *vma;
 	struct xe_tile *tile;
@@ -898,6 +900,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			vma->tile_mask |= 0x1 << id;
 	}
 
+	vma->pat_index = pat_index;
+
 	if (vm->xe->info.platform == XE_PVC)
 		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
@@ -2304,7 +2308,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
 static struct drm_gpuva_ops *
 vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			 u64 bo_offset_or_userptr, u64 addr, u64 range,
-			 u32 operation, u8 tile_mask, u32 region)
+			 u32 operation, u8 tile_mask, u32 region, u16 pat_index)
 {
 	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
 	struct drm_gpuva_ops *ops;
@@ -2331,6 +2335,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
 			op->tile_mask = tile_mask;
+			op->pat_index = pat_index;
 			op->map.immediate =
 				operation & XE_VM_BIND_FLAG_IMMEDIATE;
 			op->map.read_only =
@@ -2358,6 +2363,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
 			op->tile_mask = tile_mask;
+			op->pat_index = pat_index;
 			op->prefetch.region = region;
 		}
 		break;
@@ -2400,7 +2406,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u8 tile_mask, bool read_only, bool is_null)
+			      u8 tile_mask, bool read_only, bool is_null,
+			      u16 pat_index)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
@@ -2416,7 +2423,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	vma = xe_vma_create(vm, bo, op->gem.offset,
 			    op->va.addr, op->va.addr +
 			    op->va.range - 1, read_only, is_null,
-			    tile_mask);
+			    tile_mask, pat_index);
 	if (bo)
 		xe_bo_unlock(bo);
 
@@ -2573,7 +2580,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 			vma = new_vma(vm, &op->base.map,
 				      op->tile_mask, op->map.read_only,
-				      op->map.is_null);
+				      op->map.is_null, op->pat_index);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto free_fence;
@@ -2601,7 +2608,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 				vma = new_vma(vm, op->base.remap.prev,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, op->pat_index);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -2637,7 +2644,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 				vma = new_vma(vm, op->base.remap.next,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, op->pat_index);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -3150,7 +3157,23 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 		u32 obj = (*bind_ops)[i].obj;
 		u64 obj_offset = (*bind_ops)[i].obj_offset;
 		u32 region = (*bind_ops)[i].region;
+		u16 pat_index = (*bind_ops)[i].pat_index;
 		bool is_null = op & XE_VM_BIND_FLAG_NULL;
+		u16 coh_mode;
+
+		if (XE_IOCTL_DBG(xe, pat_index >= xe->info.pat.n_entries)) {
+			err = -EINVAL;
+			goto free_bind_ops;
+		}
+
+		pat_index = array_index_nospec(pat_index,
+					       xe->info.pat.n_entries);
+		(*bind_ops)[i].pat_index = pat_index;
+		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+		if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
+			err = -EINVAL;
+			goto free_bind_ops;
+		}
 
 		if (i == 0) {
 			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
@@ -3192,6 +3215,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
 		    XE_IOCTL_DBG(xe, obj &&
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
+		    XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
+				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
 		    XE_IOCTL_DBG(xe, obj &&
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_DBG(xe, region &&
@@ -3340,6 +3365,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 addr = bind_ops[i].addr;
 		u32 obj = bind_ops[i].obj;
 		u64 obj_offset = bind_ops[i].obj_offset;
+		u16 pat_index = bind_ops[i].pat_index;
+		u16 coh_mode;
 
 		if (!obj)
 			continue;
@@ -3367,6 +3394,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 				goto put_obj;
 			}
 		}
+
+		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+		if (bos[i]->coh_mode) {
+			if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
+				err = -EINVAL;
+				goto put_obj;
+			}
+		} else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
+			/*
+			 * Imported dma-buf from a different device should
+			 * require 1way or 2way coherency since we don't know
+			 * how it was mapped on the CPU. Just assume it is
+			 * potentially cached on the CPU side.
+			 */
+			err = -EINVAL;
+			goto put_obj;
+		}
 	}
 
 	if (args->num_syncs) {
@@ -3404,10 +3448,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 obj_offset = bind_ops[i].obj_offset;
 		u8 tile_mask = bind_ops[i].tile_mask;
 		u32 region = bind_ops[i].region;
+		u16 pat_index = bind_ops[i].pat_index;
 
 		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
 						  addr, range, op, tile_mask,
-						  region);
+						  region, pat_index);
 		if (IS_ERR(ops[i])) {
 			err = PTR_ERR(ops[i]);
 			ops[i] = NULL;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 685c2179e533..02ac0b62fd8c 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -111,6 +111,11 @@ struct xe_vma {
 	 */
 	u8 tile_present;
 
+	/**
+	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
+	 */
+	u16 pat_index;
+
 	struct {
 		struct list_head rebind_link;
 	} notifier;
@@ -420,6 +425,8 @@ struct xe_vma_op {
 	struct async_op_fence *fence;
 	/** @tile_mask: gt mask for this operation */
 	u8 tile_mask;
+	/** @pat_index: The pat index to use for this operation. */
+	u16 pat_index;
 	/** @flags: operation flags */
 	enum xe_vma_op_flags flags;
 
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index fc2016ebe102..d38aec8d237c 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -605,8 +605,49 @@ struct drm_xe_vm_bind_op {
 	 */
 	__u32 obj;
 
+	/**
+	 * @pat_index: The platform defined @pat_index to use for this mapping.
+	 * The index basically maps to some predefined memory attributes,
+	 * including things like caching, coherency, compression etc.  The exact
+	 * meaning of the pat_index is platform specific and defined in the
+	 * Bspec and PRMs.  When the KMD sets up the binding the index here is
+	 * encoded into the ppGTT PTE.
+	 *
+	 * For coherency the @pat_index needs to be at least as coherent as
+	 * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
+	 * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
+	 * from the @pat_index and reject if there is a mismatch (see note below
+	 * for pre-MTL platforms).
+	 *
+	 * Note: On pre-MTL platforms there is only a caching mode and no
+	 * explicit coherency mode, but on such hardware there is always a
+	 * shared-LLC (or it is a dgpu) so all GT memory accesses are coherent with
+	 * CPU caches even with the caching mode set as uncached.  It's only the
+	 * display engine that is incoherent (on dgpu it must be in VRAM which
+	 * is always mapped as WC on the CPU). However to keep the uapi somewhat
+	 * consistent with newer platforms the KMD groups the different cache
+	 * levels into the following coherency buckets on all pre-MTL platforms:
+	 *
+	 *	ppGTT UC -> XE_GEM_COH_NONE
+	 *	ppGTT WC -> XE_GEM_COH_NONE
+	 *	ppGTT WT -> XE_GEM_COH_NONE
+	 *	ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
+	 *
+	 * In practice UC/WC/WT should only ever be used for scanout surfaces on
+	 * such platforms (or perhaps in general for dma-buf if shared with
+	 * another device) since it is only the display engine that is actually
+	 * incoherent.  Everything else should typically use WB given that we
+	 * have a shared-LLC.  On MTL+ this completely changes and the HW
+	 * defines the coherency mode as part of the @pat_index, where
+	 * incoherent GT access is possible.
+	 *
+	 * Note: For userptr and externally imported dma-buf the kernel expects
+	 * either 1WAY or 2WAY for the @pat_index.
+	 */
+	__u16 pat_index;
+
 	/** @pad: MBZ */
-	__u32 pad;
+	__u16 pad;
 
 	union {
 		/**
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread
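
For reference, a rough userspace sketch of how the uapi pieces added in this
series are meant to fit together (untested; the ioctl and field names are
assumed from the uapi header changes above, bo_size/vm_id/gpu_va/pat_index are
placeholders, and placement flags plus error handling are omitted):

/* Create a WB-cached, at-least-1way-coherent system memory object. */
struct drm_xe_gem_create create = {
	.size = bo_size,
	.vm_id = vm_id,
	.coh_mode = XE_GEM_COH_AT_LEAST_1WAY,
	.smem_cpu_caching = XE_GEM_CPU_CACHING_WB,
};
ioctl(fd, DRM_IOCTL_XE_GEM_CREATE, &create);

/*
 * Bind it with a pat_index whose coherency mode is at least as coherent as
 * the gem_create coh_mode above. The actual index value is platform specific
 * (Bspec/PRM PAT table). The bind_op is then submitted via the usual
 * DRM_IOCTL_XE_VM_BIND call.
 */
struct drm_xe_vm_bind_op bind_op = {
	.obj = create.handle,
	.obj_offset = 0,
	.addr = gpu_va,
	.range = bo_size,
	.op = XE_VM_BIND_OP_MAP,
	.pat_index = pat_index,
};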

* [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev4)
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (6 preceding siblings ...)
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 7/7] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
@ 2023-09-25 13:24 ` Patchwork
  2023-09-25 19:47 ` [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Souza, Jose
  8 siblings, 0 replies; 16+ messages in thread
From: Patchwork @ 2023-09-25 13:24 UTC (permalink / raw)
  To: Souza, Jose; +Cc: intel-xe

== Series Details ==

Series: PAT and cache coherency support (rev4)
URL   : https://patchwork.freedesktop.org/series/123027/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: 637824154 fixup! drm/xe/display: Implement display support
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_pat.c:135
error: drivers/gpu/drm/xe/xe_pat.c: patch does not apply
error: patch failed: drivers/gpu/drm/xe/xe_pat.h:28
error: drivers/gpu/drm/xe/xe_pat.h: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe/uapi: Add support for cache and coherency mode
Applying: drm/xe: move pat_table into device info
Patch failed at 0002 drm/xe: move pat_table into device info
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
@ 2023-09-25 18:56   ` Souza, Jose
  2023-09-26  8:21     ` Matthew Auld
  0 siblings, 1 reply; 16+ messages in thread
From: Souza, Jose @ 2023-09-25 18:56 UTC (permalink / raw)
  To: intel-xe, Auld,  Matthew

On Mon, 2023-09-25 at 14:21 +0100, Matthew Auld wrote:
> From: Pallavi Mishra <pallavi.mishra@intel.com>
> 
> Allow userspace to specify the CPU caching mode to use for system memory
> in addition to coherency modes during object creation. Modify gem create
> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
> later patch we will support setting the pat_index as part of vm_bind,
> where expectation is that the coherency mode extracted from the
> pat_index must match the one set at object creation.
> 
> v2
>   - s/smem_caching/smem_cpu_caching/ and
>     s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
>   - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
>     just cares that zeroing/swap-in can't be bypassed with the given
>     smem_caching mode. (Matt Roper)
>   - Fix broken range check for coh_mode and smem_cpu_caching and also
>     don't use constant value, but the already defined macros. (José)
>   - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
>   - Add note in kernel-doc for dgpu and coherency modes for system
>     memory. (José)
> v3 (José):
>   - Make sure to reject coh_mode == 0 for VRAM-only.
>   - Also make sure to actually pass along the (start, end) for
>     __xe_bo_create_locked.
> 
> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Cc: José Roberto de Souza <jose.souza@intel.com>
> Cc: Filip Hazubski <filip.hazubski@intel.com>
> Cc: Carl Zhang <carl.zhang@intel.com>
> Cc: Effie Yu <effie.yu@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.c       | 105 ++++++++++++++++++++++++++-----
>  drivers/gpu/drm/xe/xe_bo.h       |   3 +-
>  drivers/gpu/drm/xe/xe_bo_types.h |  10 +++
>  drivers/gpu/drm/xe/xe_dma_buf.c  |   5 +-
>  include/uapi/drm/xe_drm.h        |  57 ++++++++++++++++-
>  5 files changed, 158 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> index 047eae071d03..f1ba0374901f 100644
> --- a/drivers/gpu/drm/xe/xe_bo.c
> +++ b/drivers/gpu/drm/xe/xe_bo.c
> @@ -326,7 +326,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>  	struct xe_device *xe = xe_bo_device(bo);
>  	struct xe_ttm_tt *tt;
>  	unsigned long extra_pages;
> -	enum ttm_caching caching = ttm_cached;
> +	enum ttm_caching caching;
>  	int err;
>  
>  	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
> @@ -340,13 +340,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>  		extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
>  					   PAGE_SIZE);
>  
> +	switch (bo->smem_cpu_caching) {
> +	case XE_GEM_CPU_CACHING_WC:
> +		caching = ttm_write_combined;
> +		break;
> +	case XE_GEM_CPU_CACHING_UC:
> +		caching = ttm_uncached;
> +		break;
> +	default:
> +		caching = ttm_cached;
> +		break;
> +	}
> +
>  	/*
>  	 * Display scanout is always non-coherent with the CPU cache.
>  	 *
>  	 * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
>  	 * require a CPU:WC mapping.
>  	 */
> -	if (bo->flags & XE_BO_SCANOUT_BIT ||
> +	if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
>  	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
>  		caching = ttm_write_combined;
>  
> @@ -1190,9 +1202,10 @@ void xe_bo_free(struct xe_bo *bo)
>  	kfree(bo);
>  }
>  
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				    struct xe_tile *tile, struct dma_resv *resv,
>  				    struct ttm_lru_bulk_move *bulk, size_t size,
> +				    u16 smem_cpu_caching, u16 coh_mode,
>  				    enum ttm_bo_type type, u32 flags)
>  {
>  	struct ttm_operation_ctx ctx = {
> @@ -1230,6 +1243,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  	bo->tile = tile;
>  	bo->size = size;
>  	bo->flags = flags;
> +	bo->smem_cpu_caching = smem_cpu_caching;
> +	bo->coh_mode = coh_mode;
>  	bo->ttm.base.funcs = &xe_gem_object_funcs;
>  	bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
>  	bo->props.preferred_gt = XE_BO_PROPS_INVALID;
> @@ -1315,10 +1330,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
>  }
>  
>  struct xe_bo *
> -xe_bo_create_locked_range(struct xe_device *xe,
> -			  struct xe_tile *tile, struct xe_vm *vm,
> -			  size_t size, u64 start, u64 end,
> -			  enum ttm_bo_type type, u32 flags)
> +__xe_bo_create_locked(struct xe_device *xe,
> +		      struct xe_tile *tile, struct xe_vm *vm,
> +		      size_t size, u64 start, u64 end,
> +		      u16 smem_cpu_caching, u16 coh_mode,
> +		      enum ttm_bo_type type, u32 flags)
>  {
>  	struct xe_bo *bo = NULL;
>  	int err;
> @@ -1339,10 +1355,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
>  		}
>  	}
>  
> -	bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
> +	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>  				   vm && !xe_vm_in_fault_mode(vm) &&
>  				   flags & XE_BO_CREATE_USER_BIT ?
>  				   &vm->lru_bulk_move : NULL, size,
> +				   smem_cpu_caching, coh_mode,
>  				   type, flags);
>  	if (IS_ERR(bo))
>  		return bo;
> @@ -1376,11 +1393,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
>  	return ERR_PTR(err);
>  }
>  
> +struct xe_bo *
> +xe_bo_create_locked_range(struct xe_device *xe,
> +			  struct xe_tile *tile, struct xe_vm *vm,
> +			  size_t size, u64 start, u64 end,
> +			  enum ttm_bo_type type, u32 flags)
> +{
> +	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, 0, type, flags);
> +}
> +
>  struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>  				  struct xe_vm *vm, size_t size,
>  				  enum ttm_bo_type type, u32 flags)
>  {
> -	return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
> +	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
> +}
> +
> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
> +				       struct xe_vm *vm, size_t size,
> +				       u16 smem_cpu_caching, u16 coh_mode,
> +				       enum ttm_bo_type type,
> +				       u32 flags)
> +{
> +	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
> +						 smem_cpu_caching, coh_mode, type,
> +						 flags | XE_BO_CREATE_USER_BIT);
> +	if (!IS_ERR(bo))
> +		xe_bo_unlock_vm_held(bo);
> +
> +	return bo;
>  }
>  
>  struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
> @@ -1763,11 +1804,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  	struct drm_xe_gem_create *args = data;
>  	struct xe_vm *vm = NULL;
>  	struct xe_bo *bo;
> -	unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
> +	unsigned int bo_flags;

bo_flags is not initialized here, but bits are then OR'ed into it below.

>  	u32 handle;
>  	int err;
>  
> -	if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
> +	if (XE_IOCTL_DBG(xe, args->extensions) ||
>  	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>  		return -EINVAL;
>  
> @@ -1809,6 +1850,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  		bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>  	}
>  
> +	if (XE_IOCTL_DBG(xe, !args->coh_mode))
> +		return -EINVAL;
> +
> +	if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
> +		return -EINVAL;
> +
> +	if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
> +		return -EINVAL;
> +
> +	if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
> +		if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
> +			return -EINVAL;
> +
> +		if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
> +				 bo_flags & XE_BO_SCANOUT_BIT &&
> +				 args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> +			return -EINVAL;
> +
> +		if (args->coh_mode == XE_GEM_COH_NONE) {
> +			if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
> +				return -EINVAL;
> +		}
> +	} else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
> +		return -EINVAL;
> +	}

I still believe that smem_cpu_caching should be != 0 for lmem; please take a look at my reply in the previous version.

> +
>  	if (args->vm_id) {
>  		vm = xe_vm_lookup(xef, args->vm_id);
>  		if (XE_IOCTL_DBG(xe, !vm))
> @@ -1820,8 +1887,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>  		}
>  	}
>  
> -	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
> -			  bo_flags);
> +	bo = xe_bo_create_user(xe, NULL, vm, args->size,
> +			       args->smem_cpu_caching, args->coh_mode,
> +			       ttm_bo_type_device,
> +			       bo_flags);
>  	if (IS_ERR(bo)) {
>  		err = PTR_ERR(bo);
>  		goto out_vm;
> @@ -2113,10 +2182,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>  	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>  			   page_size);
>  
> -	bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
> -			  XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> -			  XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> -			  XE_BO_NEEDS_CPU_ACCESS);
> +	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
> +			       XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
> +			       ttm_bo_type_device,
> +			       XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
> +			       XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
> +			       XE_BO_NEEDS_CPU_ACCESS);
>  	if (IS_ERR(bo))
>  		return PTR_ERR(bo);
>  
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index e3c90d45e723..b3c7e33ed39a 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -83,9 +83,10 @@ struct sg_table;
>  struct xe_bo *xe_bo_alloc(void);
>  void xe_bo_free(struct xe_bo *bo);
>  
> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>  				    struct xe_tile *tile, struct dma_resv *resv,
>  				    struct ttm_lru_bulk_move *bulk, size_t size,
> +				    u16 smem_cpu_caching, u16 coh_mode,
>  				    enum ttm_bo_type type, u32 flags);
>  struct xe_bo *
>  xe_bo_create_locked_range(struct xe_device *xe,
> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
> index 051fe990c133..d44ccacc683f 100644
> --- a/drivers/gpu/drm/xe/xe_bo_types.h
> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
> @@ -76,6 +76,16 @@ struct xe_bo {
>  	struct llist_node freed;
>  	/** @created: Whether the bo has passed initial creation */
>  	bool created;
> +	/**
> +	 * @coh_mode: Coherency setting. Currently only used for userspace
> +	 * objects.
> +	 */
> +	u16 coh_mode;
> +	/**
> +	 * @smem_cpu_caching: Caching mode for smem. Currently only used for
> +	 * userspace objects.
> +	 */
> +	u16 smem_cpu_caching;
>  };
>  
>  #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
> index cfde3be3b0dc..9da5cffeef13 100644
> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
> @@ -214,8 +214,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>  	int ret;
>  
>  	dma_resv_lock(resv, NULL);
> -	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> -				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
> +	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
> +				    0, 0, /* Will require 1way or 2way for vm_bind */
> +				    ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>  	if (IS_ERR(bo)) {
>  		ret = PTR_ERR(bo);
>  		goto error;
> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
> index d48d8e3c898c..fc2016ebe102 100644
> --- a/include/uapi/drm/xe_drm.h
> +++ b/include/uapi/drm/xe_drm.h
> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
>  	 */
>  	__u32 handle;
>  
> -	/** @pad: MBZ */
> -	__u32 pad;
> +	/**
> +	 * @coh_mode: The coherency mode for this object. This will limit the
> +	 * possible @smem_caching values.
> +	 *
> +	 * Supported values:
> +	 *
> +	 * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
> +	 * CPU. CPU caches are not snooped.
> +	 *
> +	 * XE_GEM_COH_AT_LEAST_1WAY:
> +	 *
> +	 * CPU-GPU coherency must be at least 1WAY.
> +	 *
> +	 * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
> +	 * until GPU acquires. The acquire by the GPU is not tracked by CPU
> +	 * caches.
> +	 *
> +	 * If 2WAY then should be fully coherent between GPU and CPU.  Fully
> +	 * tracked by CPU caches. Both CPU and GPU caches are snooped.
> +	 *
> +	 * Note: On dgpu the GPU device never caches system memory (outside of
> +	 * the special system-memory-read-only cache, which is anyway flushed by
> +	 * KMD when nuking TLBs for a given object so should be no concern to
> +	 * userspace). The device should be thought of as always 1WAY coherent,
> +	 * with the addition that the GPU never caches system memory. At least
> +	 * on current dgpu HW there is no way to turn off snooping so likely the
> +	 * different coherency modes of the pat_index make no difference for
> +	 * system memory.
> +	 */
> +#define XE_GEM_COH_NONE			1
> +#define XE_GEM_COH_AT_LEAST_1WAY	2
> +	__u16 coh_mode;
> +
> +	/**
> +	 * @smem_cpu_caching: The CPU caching mode to select for system memory.
> +	 *
> +	 * Supported values:
> +	 *
> +	 * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
> +	 * On iGPU this can't be used for scanout surfaces. The @coh_mode must
> +	 * be XE_GEM_COH_AT_LEAST_1WAY.
> +	 *
> +	 * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
> +	 * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
> +	 * use this.
> +	 *
> +	 * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
> +	 * is permitted. Scanout surfaces are permitted to use this.
> +	 *
> +	 * MUST be left as zero for VRAM-only objects.
> +	 */
> +#define XE_GEM_CPU_CACHING_WB                      1
> +#define XE_GEM_CPU_CACHING_WC                      2
> +#define XE_GEM_CPU_CACHING_UC                      3
> +	__u16 smem_cpu_caching;
>  
>  	/** @reserved: Reserved */
>  	__u64 reserved[2];


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support
  2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
                   ` (7 preceding siblings ...)
  2023-09-25 13:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev4) Patchwork
@ 2023-09-25 19:47 ` Souza, Jose
  2023-09-26  8:23   ` Matthew Auld
  8 siblings, 1 reply; 16+ messages in thread
From: Souza, Jose @ 2023-09-25 19:47 UTC (permalink / raw)
  To: intel-xe, Auld,  Matthew

On Mon, 2023-09-25 at 14:21 +0100, Matthew Auld wrote:
> Branch available here (lightly tested):
> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
> 
> Series still needs some more testing. Also note that the series directly depends
> on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
> 
> Goal here is to allow userspace to directly control the pat_index when mapping
> memory via the ppGTT, in addtion to the CPU caching mode for system memory. This
> is very much needed on newer igpu platforms which allow incoherent GT access,
> where the choice over the cache level and expected coherency is best left to
> userspace depending on their usecase.  In the future there may also be other
> stuff encoded in the pat_index, so giving userspace direct control will also be
> needed there.
> 
> To support this we added new gem_create uAPI for selecting the CPU cache
> mode to use for system memory, including the expected GPU coherency mode. There
> are various restrictions here for the selected coherency mode and compatible CPU
> cache modes.  With that in place the actual pat_index can now be provided as
> part of vm_bind. The only restriction is that the coherency mode of the
> pat_index must be at least as coherent as the gem_create coherency mode. There
> are also some special cases like with userptr and dma-buf.
> 
> v2:
>   - Loads of improvements/tweaks. Main changes are to now allow
>     gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>     exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>     only consider COH_NONE and COH_AT_LEAST_1WAY.
> v3:
>   - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
>     fixes.
> 

Thanks for the fixes, display is now working on TGL and DG2, but I am getting a new crash on MTL:


[  259.478814] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A]  blocks   16,  97,  97, 129, 129, 161,   0,   0,  30,  33,   47 ->   62,
93,  93, 123, 123, 154,   0,   0, 137,  62,  137
[  259.478936] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb   19, 108, 108, 143, 143, 179,   0,   0,  31,  38,   48 ->  123,
184, 184, 184, 184, 245,   0,   0, 138, 123,  138
[  259.479089] ------------[ cut here ]------------
[  259.479093] WARNING: CPU: 2 PID: 2057 at drivers/gpu/drm/xe/display/xe_fb_pin.c:199 __xe_pin_fb_vma+0x3dc/0x840 [xe]
[  259.479239] Modules linked in: xe drm_ttm_helper drm_exec gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper
x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul wmi_bmof pmt_telemetry pmt_class ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg
snd_hda_codec kvm_intel snd_hwdep snd_hda_core e1000e mei_me ptp snd_pcm i2c_i801 mei i2c_smbus pps_core intel_vsec video wmi pinctrl_meteorlake fuse
[  259.479327] CPU: 2 PID: 2057 Comm: gnome-shell Tainted: G        W          6.5.0-rc7+zeh-xe+ #1109
[  259.479333] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-M LP5x CONF1 RVP, BIOS MTLMFWI1.R00.3323.D84.2308220916 08/22/2023
[  259.479337] RIP: 0010:__xe_pin_fb_vma+0x3dc/0x840 [xe]
[  259.479498] Code: 4d 89 f4 48 8b 44 24 08 49 8d b4 24 28 03 00 00 b9 16 00 00 00 4c 89 60 08 48 8d 78 10 f3 48 a5 4c 8b 6c 24 08 e9 2c fd ff ff
<0f> 0b 49 c7 c5 ed ff ff ff e9 14 fd ff ff 48 8b 7c 24 28 89 14 24
[  259.479503] RSP: 0018:ffffc9000604bb88 EFLAGS: 00010246
[  259.479509] RAX: ffff888196c9f190 RBX: ffff8881a222dc00 RCX: 0000000000000001
[  259.479513] RDX: 0000000000000000 RSI: ffffffff826a896e RDI: ffffffff826ac710
[  259.479517] RBP: ffff888183823800 R08: 0000000000000128 R09: ffff8881b2eff4d8
[  259.479521] R10: ffffc9000604bac8 R11: 0000000000000002 R12: ffff8881a222dc00
[  259.479526] R13: ffff888102ab0000 R14: 0000000000000000 R15: 0000563e9575fa00
[  259.479530] FS:  00007f4dbdf5f5c0(0000) GS:ffff88846e100000(0000) knlGS:0000000000000000
[  259.479535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  259.479540] CR2: 00000e17254bc000 CR3: 0000000117bb8005 CR4: 0000000000770ee0
[  259.479545] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  259.479549] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
[  259.479552] PKRU: 55555554
[  259.479556] Call Trace:
[  259.479560]  <TASK>
[  259.479564]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
[  259.479708]  ? __warn+0x7c/0x170
[  259.479716]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
[  259.479855]  ? report_bug+0x18d/0x1c0
[  259.479865]  ? handle_bug+0x3a/0x70
[  259.479873]  ? exc_invalid_op+0x13/0x60
[  259.479880]  ? asm_exc_invalid_op+0x16/0x20
[  259.479894]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
[  259.480030]  ? __xe_pin_fb_vma+0x34/0x840 [xe]
[  259.480160]  ? lock_acquire+0xd3/0x2d0
[  259.480170]  ? find_held_lock+0x2b/0x80
[  259.480179]  intel_plane_pin_fb+0x34/0x90 [xe]
[  259.480314]  intel_prepare_plane_fb+0x2c/0x70 [xe]
[  259.480469]  drm_atomic_helper_prepare_planes+0x6b/0x210
[  259.480481]  intel_atomic_commit+0x4d/0x360 [xe]
[  259.480666]  drm_mode_atomic_ioctl+0x7c7/0xbd0
[  259.480688]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[  259.480696]  drm_ioctl_kernel+0xc0/0x170
[  259.480705]  drm_ioctl+0x212/0x470
[  259.480711]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[  259.480729]  __x64_sys_ioctl+0x8d/0xb0
[  259.480739]  do_syscall_64+0x38/0x90
[  259.480746]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  259.480752] RIP: 0033:0x7f4dc211aaff
[  259.480756] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05
<41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  259.480761] RSP: 002b:00007ffd6d2b7940 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  259.480767] RAX: ffffffffffffffda RBX: 00007ffd6d2b79e0 RCX: 00007f4dc211aaff
[  259.480771] RDX: 00007ffd6d2b79e0 RSI: 00000000c03864bc RDI: 0000000000000009
[  259.480774] RBP: 00000000c03864bc R08: 0000000000000000 R09: 0000000000000000
[  259.480777] R10: 00007f4dc221a2f0 R11: 0000000000000246 R12: 0000563e988b4590
[  259.480781] R13: 0000000000000009 R14: 0000563e988b4650 R15: 0000563e9814f060
[  259.480794]  </TASK>
[  259.480797] irq event stamp: 2049057
[  259.480800] hardirqs last  enabled at (2049063): [<ffffffff811e2369>] __up_console_sem+0x59/0x80
[  259.480808] hardirqs last disabled at (2049068): [<ffffffff811e234e>] __up_console_sem+0x3e/0x80
[  259.480815] softirqs last  enabled at (2048430): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
[  259.480821] softirqs last disabled at (2048423): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
[  259.480826] ---[ end trace 0000000000000000 ]---
[  259.494838] xe 0000:00:02.0: [drm:drm_mode_addfb2] [FB:219]
[  259.494943] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLAN

That is: __xe_pin_fb_vma()
if (XE_WARN_ON(view->type == I915_GTT_VIEW_REMAPPED)) {




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode
  2023-09-25 13:21 ` [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode Matthew Auld
@ 2023-09-25 22:08   ` Matt Roper
  2023-09-26 19:29     ` Lucas De Marchi
  0 siblings, 1 reply; 16+ messages in thread
From: Matt Roper @ 2023-09-25 22:08 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Lucas De Marchi, intel-xe

On Mon, Sep 25, 2023 at 02:21:19PM +0100, Matthew Auld wrote:
> In the next patch userspace will be able to directly set the pat_index
> as part of vm_bind. To support this we need to get away from using
> xe_cache_level in the low level routines and rather just use the
> pat_index directly.

It might be worth adding a kunit test (in a separate patch) that ensures
every platform's PAT table can provide a valid index for XE_CACHE_WB and 
XE_CACHE_UC, which I assume are the two the kernel might want to perform
lookups on in general code.

At some point we may want to replace xe_pat_get_index() completely and
just cache the indices for the "important" PAT settings at init that the
kernel will want to use elsewhere in various places (e.g., something
like gt->pat.cached and gt->pat.uncached).
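
Very rough sketch of what I have in mind (completely untested, and assuming
the helpers used in this series, i.e. xe_pat_get_index() and
xe_pat_index_get_coh_mode(), plus a fixture that hands the test the xe
device):

static void xe_pat_index_sanity_test(struct kunit *test)
{
	struct xe_device *xe = test->priv;	/* set up by the suite's init */
	u16 wb = xe_pat_get_index(xe, XE_CACHE_WB);
	u16 uc = xe_pat_get_index(xe, XE_CACHE_UC);

	/* Both lookups must land inside the platform's PAT table. */
	KUNIT_EXPECT_LT(test, wb, xe->info.pat.n_entries);
	KUNIT_EXPECT_LT(test, uc, xe->info.pat.n_entries);

	/* And each index should resolve to a valid coherency mode. */
	KUNIT_EXPECT_NE(test, xe_pat_index_get_coh_mode(xe, wb), 0);
	KUNIT_EXPECT_NE(test, xe_pat_index_get_coh_mode(xe, uc), 0);
}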

> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/xe/tests/xe_migrate.c |  2 +-
>  drivers/gpu/drm/xe/xe_ggtt.c          |  7 +++----
>  drivers/gpu/drm/xe/xe_ggtt_types.h    |  2 +-
>  drivers/gpu/drm/xe/xe_migrate.c       | 13 ++++++++-----
>  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++++------
>  drivers/gpu/drm/xe/xe_pt.h            |  4 ++--
>  drivers/gpu/drm/xe/xe_vm.c            |  8 ++------
>  drivers/gpu/drm/xe/xe_vm_types.h      |  3 +--
>  8 files changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index 6b4388bfbb31..d3bf4751a2d7 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	/* First part of the test, are we updating our pagetable bo with a new entry? */
>  	xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
>  		  0xdeaddeadbeefbeef);
> -	expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
> +	expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>  	if (m->q->vm->flags & XE_VM_FLAG_64K)
>  		expected |= XE_PTE_PS64;
>  	if (xe_bo_is_vram(pt))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 2002b6597dbf..1c447f1a4fbb 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
>  		pte |= XE_GGTT_PTE_DM;
>  
>  	if ((ggtt->pat_encode).pte_encode)
> -		pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
> +		pte = (ggtt->pat_encode).pte_encode(xe, pte,
> +						    xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));

Hmm, looks like this series is still based on top of Ravi's old series?
I believe his series was going to get reworked to remove stuff like
XE_CACHE_WB_1_WAY, so some of the underlying code here is likely to
change.  I think Lucas was going to help with reworking that, so he
might be able to help reconcile the updates to Ravi's series with the
changes on top that your series is making.


Matt

>  
>  	return pte;
>  }
> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
>  }
>  
>  static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> -						enum xe_cache_level cache)
> +				     u16 pat_index)
>  {
> -	u32 pat_index = xe_pat_get_index(xe, cache);
> -
>  	pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>  
>  	if (pat_index & BIT(0))
> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> index 7e55fac1a8a9..7981075bb228 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> @@ -31,7 +31,7 @@ struct xe_ggtt {
>  
>  	struct {
>  		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> -						enum xe_cache_level cache);
> +				  u16 pat_index);
>  	} pat_encode;
>  };
>  
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index ebae0117f577..9d201bdb9f25 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -25,6 +25,7 @@
>  #include "xe_lrc.h"
>  #include "xe_map.h"
>  #include "xe_mocs.h"
> +#include "xe_pat.h"
>  #include "xe_pt.h"
>  #include "xe_res_cursor.h"
>  #include "xe_sched_job.h"
> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  	u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
>  	u32 map_ofs, level, i;
>  	struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
> +	u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
>  	u64 entry;
>  	int ret;
>  
> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  
>  	/* Map the entire BO in our level 0 pt */
>  	for (i = 0, level = 0; i < num_entries; level++) {
> -		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
> +		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>  
>  		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>  
> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  		for (i = 0; i < batch->size;
>  		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>  		     XE_PAGE_SIZE) {
> -			entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
> +			entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>  
>  			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>  				  entry);
> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>  
>  		flags = XE_PPGTT_PTE_DM;
> -		flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, level);
> +		flags = __xe_pte_encode(flags, pat_index, vm, NULL, level);
>  
>  		/*
>  		 * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
>  		     struct xe_res_cursor *cur,
>  		     u32 size, struct xe_bo *bo)
>  {
> +	u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
>  	u32 ptes;
>  	u64 ofs = at_pt * XE_PAGE_SIZE;
>  	u64 cur_ofs;
> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
>  				addr += vram_region_gpu_offset(bo->ttm.resource);
>  				addr |= XE_PPGTT_PTE_DM;
>  			}
> -			addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
> +			addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
>  
> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>  
>  			xe_tile_assert(tile, pt_bo->size == SZ_4K);
>  
> -			addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
> +			addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
>  		}
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index ddb4d9c33181..5686ed9be175 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -11,6 +11,7 @@
>  #include "xe_gt.h"
>  #include "xe_gt_tlb_invalidation.h"
>  #include "xe_migrate.h"
> +#include "xe_pat.h"
>  #include "xe_pt_types.h"
>  #include "xe_pt_walk.h"
>  #include "xe_res_cursor.h"
> @@ -68,7 +69,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
>  	return pde;
>  }
>  
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>  		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
>  {
>  	struct xe_device *xe = vm->xe;
> @@ -86,7 +87,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>  	else if (pt_level == 2)
>  		pte |= XE_PDPE_PS_1G;
>  
> -	pte = vm->pat_encode.pte_encode(xe, pte, cache);
> +	pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>  
>  	/* XXX: Does hw support 1 GiB pages? */
>  	XE_WARN_ON(pt_level > 2);
> @@ -104,7 +105,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>   *
>   * Return: An encoded page-table entry. No errors.
>   */
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>  		  u32 pt_level)
>  {
>  	u64 pte;
> @@ -113,7 +114,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>  		pte |= XE_PPGTT_PTE_DM;
>  
> -	return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
> +	return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
>  }
>  
>  static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> @@ -126,7 +127,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>  
>  	if (level == 0) {
>  		u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
> -					  XE_CACHE_WB, 0);
> +					  xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>  
>  		return empty;
>  	} else {
> @@ -597,7 +598,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  
>  		pte = __xe_pte_encode(is_null ? 0 :
>  				      xe_res_dma(curs) + xe_walk->dma_offset,
> -				      xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
> +				      xe_pat_get_index(tile_to_xe(xe_walk->tile),
> +						       xe_walk->cache),
> +				      xe_walk->vm, xe_walk->vma, level);
>  		pte |= xe_walk->default_pte;
>  
>  		/*
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 0e66436d707d..6d10823fca9b 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>  
>  u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>  
> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>  		  u32 pt_level);
> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>  		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>  
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index c53d5c1580df..28e6429488ee 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1196,10 +1196,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
>  static void vm_destroy_work_func(struct work_struct *w);
>  
>  static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> -						enum xe_cache_level cache)
> +				    u16 pat_index)
>  {
> -	u32 pat_index = xe_pat_get_index(xe, cache);
> -
>  	if (pat_index & BIT(0))
>  		pte_pat |= BIT(3);
>  
> @@ -1217,10 +1215,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>  }
>  
>  static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
> -						enum xe_cache_level cache)
> +				     u16 pat_index)
>  {
> -	u32 pat_index = xe_pat_get_index(xe, cache);
> -
>  	if (pat_index & BIT(0))
>  		pte_pat |= BIT(3);
>  
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index c4d866840d6a..685c2179e533 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -340,8 +340,7 @@ struct xe_vm {
>  	struct xe_file *xef;
>  
>  	struct {
> -		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
> -				  enum xe_cache_level cache);
> +		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
>  	} pat_encode;
>  };
>  
> -- 
> 2.41.0
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode
  2023-09-25 18:56   ` Souza, Jose
@ 2023-09-26  8:21     ` Matthew Auld
  0 siblings, 0 replies; 16+ messages in thread
From: Matthew Auld @ 2023-09-26  8:21 UTC (permalink / raw)
  To: Souza, Jose, intel-xe

On 25/09/2023 19:56, Souza, Jose wrote:
> On Mon, 2023-09-25 at 14:21 +0100, Matthew Auld wrote:
>> From: Pallavi Mishra <pallavi.mishra@intel.com>
>>
>> Allow userspace to specify the CPU caching mode to use for system memory
>> in addition to coherency modes during object creation. Modify gem create
>> handler and introduce xe_bo_create_user to replace xe_bo_create. In a
>> later patch we will support setting the pat_index as part of vm_bind,
>> where expectation is that the coherency mode extracted from the
>> pat_index must match the one set at object creation.
>>
>> v2
>>    - s/smem_caching/smem_cpu_caching/ and
>>      s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
>>    - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
>>      just cares that zeroing/swap-in can't be bypassed with the given
>>      smem_caching mode. (Matt Roper)
>>    - Fix broken range check for coh_mode and smem_cpu_caching and also
>>      don't use constant value, but the already defined macros. (José)
>>    - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
>>    - Add note in kernel-doc for dgpu and coherency modes for system
>>      memory. (José)
>> v3 (José):
>>    - Make sure to reject coh_mode == 0 for VRAM-only.
>>    - Also make sure to actually pass along the (start, end) for
>>      __xe_bo_create_locked.
>>
>> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
>> Co-authored-by: Matthew Auld <matthew.auld@intel.com>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> Cc: José Roberto de Souza <jose.souza@intel.com>
>> Cc: Filip Hazubski <filip.hazubski@intel.com>
>> Cc: Carl Zhang <carl.zhang@intel.com>
>> Cc: Effie Yu <effie.yu@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_bo.c       | 105 ++++++++++++++++++++++++++-----
>>   drivers/gpu/drm/xe/xe_bo.h       |   3 +-
>>   drivers/gpu/drm/xe/xe_bo_types.h |  10 +++
>>   drivers/gpu/drm/xe/xe_dma_buf.c  |   5 +-
>>   include/uapi/drm/xe_drm.h        |  57 ++++++++++++++++-
>>   5 files changed, 158 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 047eae071d03..f1ba0374901f 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -326,7 +326,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>   	struct xe_device *xe = xe_bo_device(bo);
>>   	struct xe_ttm_tt *tt;
>>   	unsigned long extra_pages;
>> -	enum ttm_caching caching = ttm_cached;
>> +	enum ttm_caching caching;
>>   	int err;
>>   
>>   	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
>> @@ -340,13 +340,25 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
>>   		extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
>>   					   PAGE_SIZE);
>>   
>> +	switch (bo->smem_cpu_caching) {
>> +	case XE_GEM_CPU_CACHING_WC:
>> +		caching = ttm_write_combined;
>> +		break;
>> +	case XE_GEM_CPU_CACHING_UC:
>> +		caching = ttm_uncached;
>> +		break;
>> +	default:
>> +		caching = ttm_cached;
>> +		break;
>> +	}
>> +
>>   	/*
>>   	 * Display scanout is always non-coherent with the CPU cache.
>>   	 *
>>   	 * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
>>   	 * require a CPU:WC mapping.
>>   	 */
>> -	if (bo->flags & XE_BO_SCANOUT_BIT ||
>> +	if ((!bo->smem_cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
>>   	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
>>   		caching = ttm_write_combined;
>>   
>> @@ -1190,9 +1202,10 @@ void xe_bo_free(struct xe_bo *bo)
>>   	kfree(bo);
>>   }
>>   
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>   				    struct xe_tile *tile, struct dma_resv *resv,
>>   				    struct ttm_lru_bulk_move *bulk, size_t size,
>> +				    u16 smem_cpu_caching, u16 coh_mode,
>>   				    enum ttm_bo_type type, u32 flags)
>>   {
>>   	struct ttm_operation_ctx ctx = {
>> @@ -1230,6 +1243,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>   	bo->tile = tile;
>>   	bo->size = size;
>>   	bo->flags = flags;
>> +	bo->smem_cpu_caching = smem_cpu_caching;
>> +	bo->coh_mode = coh_mode;
>>   	bo->ttm.base.funcs = &xe_gem_object_funcs;
>>   	bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
>>   	bo->props.preferred_gt = XE_BO_PROPS_INVALID;
>> @@ -1315,10 +1330,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
>>   }
>>   
>>   struct xe_bo *
>> -xe_bo_create_locked_range(struct xe_device *xe,
>> -			  struct xe_tile *tile, struct xe_vm *vm,
>> -			  size_t size, u64 start, u64 end,
>> -			  enum ttm_bo_type type, u32 flags)
>> +__xe_bo_create_locked(struct xe_device *xe,
>> +		      struct xe_tile *tile, struct xe_vm *vm,
>> +		      size_t size, u64 start, u64 end,
>> +		      u16 smem_cpu_caching, u16 coh_mode,
>> +		      enum ttm_bo_type type, u32 flags)
>>   {
>>   	struct xe_bo *bo = NULL;
>>   	int err;
>> @@ -1339,10 +1355,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
>>   		}
>>   	}
>>   
>> -	bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>> +	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
>>   				   vm && !xe_vm_in_fault_mode(vm) &&
>>   				   flags & XE_BO_CREATE_USER_BIT ?
>>   				   &vm->lru_bulk_move : NULL, size,
>> +				   smem_cpu_caching, coh_mode,
>>   				   type, flags);
>>   	if (IS_ERR(bo))
>>   		return bo;
>> @@ -1376,11 +1393,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
>>   	return ERR_PTR(err);
>>   }
>>   
>> +struct xe_bo *
>> +xe_bo_create_locked_range(struct xe_device *xe,
>> +			  struct xe_tile *tile, struct xe_vm *vm,
>> +			  size_t size, u64 start, u64 end,
>> +			  enum ttm_bo_type type, u32 flags)
>> +{
>> +	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, 0, type, flags);
>> +}
>> +
>>   struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
>>   				  struct xe_vm *vm, size_t size,
>>   				  enum ttm_bo_type type, u32 flags)
>>   {
>> -	return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
>> +	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
>> +}
>> +
>> +static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
>> +				       struct xe_vm *vm, size_t size,
>> +				       u16 smem_cpu_caching, u16 coh_mode,
>> +				       enum ttm_bo_type type,
>> +				       u32 flags)
>> +{
>> +	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
>> +						 smem_cpu_caching, coh_mode, type,
>> +						 flags | XE_BO_CREATE_USER_BIT);
>> +	if (!IS_ERR(bo))
>> +		xe_bo_unlock_vm_held(bo);
>> +
>> +	return bo;
>>   }
>>   
>>   struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
>> @@ -1763,11 +1804,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   	struct drm_xe_gem_create *args = data;
>>   	struct xe_vm *vm = NULL;
>>   	struct xe_bo *bo;
>> -	unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
>> +	unsigned int bo_flags;
> 
> bo_flags is not initialized then bits are set with or on it.

Thanks for catching. Will fix.

> 
>>   	u32 handle;
>>   	int err;
>>   
>> -	if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
>> +	if (XE_IOCTL_DBG(xe, args->extensions) ||
>>   	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
>>   		return -EINVAL;
>>   
>> @@ -1809,6 +1850,32 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   		bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
>>   	}
>>   
>> +	if (XE_IOCTL_DBG(xe, !args->coh_mode))
>> +		return -EINVAL;
>> +
>> +	if (XE_IOCTL_DBG(xe, args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
>> +		return -EINVAL;
>> +
>> +	if (XE_IOCTL_DBG(xe, args->smem_cpu_caching > XE_GEM_CPU_CACHING_UC))
>> +		return -EINVAL;
>> +
>> +	if (bo_flags & XE_BO_CREATE_SYSTEM_BIT) {
>> +		if (XE_IOCTL_DBG(xe, !args->smem_cpu_caching))
>> +			return -EINVAL;
>> +
>> +		if (XE_IOCTL_DBG(xe, !IS_DGFX(xe) &&
>> +				 bo_flags & XE_BO_SCANOUT_BIT &&
>> +				 args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> +			return -EINVAL;
>> +
>> +		if (args->coh_mode == XE_GEM_COH_NONE) {
>> +			if (XE_IOCTL_DBG(xe, args->smem_cpu_caching == XE_GEM_CPU_CACHING_WB))
>> +				return -EINVAL;
>> +		}
>> +	} else if (XE_IOCTL_DBG(xe, args->smem_cpu_caching)) {
>> +		return -EINVAL;
>> +	}
> 
> still believe that smem_cpu_caching should != than 0 for lmem, please take a look at my reply in the previous version.
> 
>> +
>>   	if (args->vm_id) {
>>   		vm = xe_vm_lookup(xef, args->vm_id);
>>   		if (XE_IOCTL_DBG(xe, !vm))
>> @@ -1820,8 +1887,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
>>   		}
>>   	}
>>   
>> -	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
>> -			  bo_flags);
>> +	bo = xe_bo_create_user(xe, NULL, vm, args->size,
>> +			       args->smem_cpu_caching, args->coh_mode,
>> +			       ttm_bo_type_device,
>> +			       bo_flags);
>>   	if (IS_ERR(bo)) {
>>   		err = PTR_ERR(bo);
>>   		goto out_vm;
>> @@ -2113,10 +2182,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
>>   	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
>>   			   page_size);
>>   
>> -	bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
>> -			  XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> -			  XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> -			  XE_BO_NEEDS_CPU_ACCESS);
>> +	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
>> +			       XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
>> +			       ttm_bo_type_device,
>> +			       XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
>> +			       XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
>> +			       XE_BO_NEEDS_CPU_ACCESS);
>>   	if (IS_ERR(bo))
>>   		return PTR_ERR(bo);
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index e3c90d45e723..b3c7e33ed39a 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -83,9 +83,10 @@ struct sg_table;
>>   struct xe_bo *xe_bo_alloc(void);
>>   void xe_bo_free(struct xe_bo *bo);
>>   
>> -struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>> +struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
>>   				    struct xe_tile *tile, struct dma_resv *resv,
>>   				    struct ttm_lru_bulk_move *bulk, size_t size,
>> +				    u16 smem_cpu_caching, u16 coh_mode,
>>   				    enum ttm_bo_type type, u32 flags);
>>   struct xe_bo *
>>   xe_bo_create_locked_range(struct xe_device *xe,
>> diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
>> index 051fe990c133..d44ccacc683f 100644
>> --- a/drivers/gpu/drm/xe/xe_bo_types.h
>> +++ b/drivers/gpu/drm/xe/xe_bo_types.h
>> @@ -76,6 +76,16 @@ struct xe_bo {
>>   	struct llist_node freed;
>>   	/** @created: Whether the bo has passed initial creation */
>>   	bool created;
>> +	/**
>> +	 * @coh_mode: Coherency setting. Currently only used for userspace
>> +	 * objects.
>> +	 */
>> +	u16 coh_mode;
>> +	/**
>> +	 * @smem_cpu_caching: Caching mode for smem. Currently only used for
>> +	 * userspace objects.
>> +	 */
>> +	u16 smem_cpu_caching;
>>   };
>>   
>>   #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
>> diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
>> index cfde3be3b0dc..9da5cffeef13 100644
>> --- a/drivers/gpu/drm/xe/xe_dma_buf.c
>> +++ b/drivers/gpu/drm/xe/xe_dma_buf.c
>> @@ -214,8 +214,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
>>   	int ret;
>>   
>>   	dma_resv_lock(resv, NULL);
>> -	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> -				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>> +	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
>> +				    0, 0, /* Will require 1way or 2way for vm_bind */
>> +				    ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
>>   	if (IS_ERR(bo)) {
>>   		ret = PTR_ERR(bo);
>>   		goto error;
>> diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
>> index d48d8e3c898c..fc2016ebe102 100644
>> --- a/include/uapi/drm/xe_drm.h
>> +++ b/include/uapi/drm/xe_drm.h
>> @@ -456,8 +456,61 @@ struct drm_xe_gem_create {
>>   	 */
>>   	__u32 handle;
>>   
>> -	/** @pad: MBZ */
>> -	__u32 pad;
>> +	/**
>> +	 * @coh_mode: The coherency mode for this object. This will limit the
>> +	 * possible @smem_caching values.
>> +	 *
>> +	 * Supported values:
>> +	 *
>> +	 * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
>> +	 * CPU. CPU caches are not snooped.
>> +	 *
>> +	 * XE_GEM_COH_AT_LEAST_1WAY:
>> +	 *
>> +	 * CPU-GPU coherency must be at least 1WAY.
>> +	 *
>> +	 * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
>> +	 * until GPU acquires. The acquire by the GPU is not tracked by CPU
>> +	 * caches.
>> +	 *
>> +	 * If 2WAY then should be fully coherent between GPU and CPU.  Fully
>> +	 * tracked by CPU caches. Both CPU and GPU caches are snooped.
>> +	 *
>> +	 * Note: On dgpu the GPU device never caches system memory (outside of
>> +	 * the special system-memory-read-only cache, which is anyway flushed by
>> +	 * KMD when nuking TLBs for a given object so should be no concern to
>> +	 * userspace). The device should be thought of as always 1WAY coherent,
>> +	 * with the addition that the GPU never caches system memory. At least
>> +	 * on current dgpu HW there is no way to turn off snooping so likely the
>> +	 * different coherency modes of the pat_index make no difference for
>> +	 * system memory.
>> +	 */
>> +#define XE_GEM_COH_NONE			1
>> +#define XE_GEM_COH_AT_LEAST_1WAY	2
>> +	__u16 coh_mode;
>> +
>> +	/**
>> +	 * @smem_cpu_caching: The CPU caching mode to select for system memory.
>> +	 *
>> +	 * Supported values:
>> +	 *
>> +	 * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
>> +	 * On iGPU this can't be used for scanout surfaces. The @coh_mode must
>> +	 * be XE_GEM_COH_AT_LEAST_1WAY.
>> +	 *
>> +	 * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
>> +	 * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
>> +	 * use this.
>> +	 *
>> +	 * XE_GEM_CPU_CACHING_UC: Allocate the pages as uncached. Any @coh_mode
>> +	 * is permitted. Scanout surfaces are permitted to use this.
>> +	 *
>> +	 * MUST be left as zero for VRAM-only objects.
>> +	 */
>> +#define XE_GEM_CPU_CACHING_WB                      1
>> +#define XE_GEM_CPU_CACHING_WC                      2
>> +#define XE_GEM_CPU_CACHING_UC                      3
>> +	__u16 smem_cpu_caching;
>>   
>>   	/** @reserved: Reserved */
>>   	__u64 reserved[2];
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support
  2023-09-25 19:47 ` [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Souza, Jose
@ 2023-09-26  8:23   ` Matthew Auld
  2023-09-26 18:03     ` Souza, Jose
  0 siblings, 1 reply; 16+ messages in thread
From: Matthew Auld @ 2023-09-26  8:23 UTC (permalink / raw)
  To: Souza, Jose, intel-xe

On 25/09/2023 20:47, Souza, Jose wrote:
> On Mon, 2023-09-25 at 14:21 +0100, Matthew Auld wrote:
>> Branch available here (lightly tested):
>> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
>>
>> Series still needs some more testing. Also note that the series directly depends
>> on the WIP patch here: https://patchwork.freedesktop.org/series/122708/
>>
>> Goal here is to allow userspace to directly control the pat_index when mapping
>> memory via the ppGTT, in addtion to the CPU caching mode for system memory. This
>> is very much needed on newer igpu platforms which allow incoherent GT access,
>> where the choice over the cache level and expected coherency is best left to
>> userspace depending on their usecase.  In the future there may also be other
>> stuff encoded in the pat_index, so giving userspace direct control will also be
>> needed there.
>>
>> To support this we added new gem_create uAPI for selecting the CPU cache
>> mode to use for system memory, including the expected GPU coherency mode. There
>> are various restrictions here for the selected coherency mode and compatible CPU
>> cache modes.  With that in place the actual pat_index can now be provided as
>> part of vm_bind. The only restriction is that the coherency mode of the
>> pat_index must be at least as coherent as the gem_create coherency mode. There
>> are also some special cases like with userptr and dma-buf.
>>
>> v2:
>>    - Loads of improvements/tweaks. Main changes are to now allow
>>      gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>>      exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>>      only consider COH_NONE and COH_AT_LEAST_1WAY.
>> v3:
>>    - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
>>      fixes.
>>
> 
> Thanks for the fixes, display is now working in TGL and DG2 but getting a new crash in MTL:

Is the MTL bug present on the same base branch, i.e. if you drop all the
patches in this series?

> 
> 
> [  259.478814] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A]  blocks   16,  97,  97, 129, 129, 161,   0,   0,  30,  33,   47 ->   62,
> 93,  93, 123, 123, 154,   0,   0, 137,  62,  137
> [  259.478936] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLANE:31:plane 1A] min_ddb   19, 108, 108, 143, 143, 179,   0,   0,  31,  38,   48 ->  123,
> 184, 184, 184, 184, 245,   0,   0, 138, 123,  138
> [  259.479089] ------------[ cut here ]------------
> [  259.479093] WARNING: CPU: 2 PID: 2057 at drivers/gpu/drm/xe/display/xe_fb_pin.c:199 __xe_pin_fb_vma+0x3dc/0x840 [xe]
> [  259.479239] Modules linked in: xe drm_ttm_helper drm_exec gpu_sched drm_suballoc_helper i2c_algo_bit drm_buddy ttm drm_display_helper
> x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul wmi_bmof pmt_telemetry pmt_class ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg
> snd_hda_codec kvm_intel snd_hwdep snd_hda_core e1000e mei_me ptp snd_pcm i2c_i801 mei i2c_smbus pps_core intel_vsec video wmi pinctrl_meteorlake fuse
> [  259.479327] CPU: 2 PID: 2057 Comm: gnome-shell Tainted: G        W          6.5.0-rc7+zeh-xe+ #1109
> [  259.479333] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-M LP5x CONF1 RVP, BIOS MTLMFWI1.R00.3323.D84.2308220916 08/22/2023
> [  259.479337] RIP: 0010:__xe_pin_fb_vma+0x3dc/0x840 [xe]
> [  259.479498] Code: 4d 89 f4 48 8b 44 24 08 49 8d b4 24 28 03 00 00 b9 16 00 00 00 4c 89 60 08 48 8d 78 10 f3 48 a5 4c 8b 6c 24 08 e9 2c fd ff ff
> <0f> 0b 49 c7 c5 ed ff ff ff e9 14 fd ff ff 48 8b 7c 24 28 89 14 24
> [  259.479503] RSP: 0018:ffffc9000604bb88 EFLAGS: 00010246
> [  259.479509] RAX: ffff888196c9f190 RBX: ffff8881a222dc00 RCX: 0000000000000001
> [  259.479513] RDX: 0000000000000000 RSI: ffffffff826a896e RDI: ffffffff826ac710
> [  259.479517] RBP: ffff888183823800 R08: 0000000000000128 R09: ffff8881b2eff4d8
> [  259.479521] R10: ffffc9000604bac8 R11: 0000000000000002 R12: ffff8881a222dc00
> [  259.479526] R13: ffff888102ab0000 R14: 0000000000000000 R15: 0000563e9575fa00
> [  259.479530] FS:  00007f4dbdf5f5c0(0000) GS:ffff88846e100000(0000) knlGS:0000000000000000
> [  259.479535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  259.479540] CR2: 00000e17254bc000 CR3: 0000000117bb8005 CR4: 0000000000770ee0
> [  259.479545] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  259.479549] DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400
> [  259.479552] PKRU: 55555554
> [  259.479556] Call Trace:
> [  259.479560]  <TASK>
> [  259.479564]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> [  259.479708]  ? __warn+0x7c/0x170
> [  259.479716]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> [  259.479855]  ? report_bug+0x18d/0x1c0
> [  259.479865]  ? handle_bug+0x3a/0x70
> [  259.479873]  ? exc_invalid_op+0x13/0x60
> [  259.479880]  ? asm_exc_invalid_op+0x16/0x20
> [  259.479894]  ? __xe_pin_fb_vma+0x3dc/0x840 [xe]
> [  259.480030]  ? __xe_pin_fb_vma+0x34/0x840 [xe]
> [  259.480160]  ? lock_acquire+0xd3/0x2d0
> [  259.480170]  ? find_held_lock+0x2b/0x80
> [  259.480179]  intel_plane_pin_fb+0x34/0x90 [xe]
> [  259.480314]  intel_prepare_plane_fb+0x2c/0x70 [xe]
> [  259.480469]  drm_atomic_helper_prepare_planes+0x6b/0x210
> [  259.480481]  intel_atomic_commit+0x4d/0x360 [xe]
> [  259.480666]  drm_mode_atomic_ioctl+0x7c7/0xbd0
> [  259.480688]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> [  259.480696]  drm_ioctl_kernel+0xc0/0x170
> [  259.480705]  drm_ioctl+0x212/0x470
> [  259.480711]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
> [  259.480729]  __x64_sys_ioctl+0x8d/0xb0
> [  259.480739]  do_syscall_64+0x38/0x90
> [  259.480746]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  259.480752] RIP: 0033:0x7f4dc211aaff
> [  259.480756] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05
> <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
> [  259.480761] RSP: 002b:00007ffd6d2b7940 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  259.480767] RAX: ffffffffffffffda RBX: 00007ffd6d2b79e0 RCX: 00007f4dc211aaff
> [  259.480771] RDX: 00007ffd6d2b79e0 RSI: 00000000c03864bc RDI: 0000000000000009
> [  259.480774] RBP: 00000000c03864bc R08: 0000000000000000 R09: 0000000000000000
> [  259.480777] R10: 00007f4dc221a2f0 R11: 0000000000000246 R12: 0000563e988b4590
> [  259.480781] R13: 0000000000000009 R14: 0000563e988b4650 R15: 0000563e9814f060
> [  259.480794]  </TASK>
> [  259.480797] irq event stamp: 2049057
> [  259.480800] hardirqs last  enabled at (2049063): [<ffffffff811e2369>] __up_console_sem+0x59/0x80
> [  259.480808] hardirqs last disabled at (2049068): [<ffffffff811e234e>] __up_console_sem+0x3e/0x80
> [  259.480815] softirqs last  enabled at (2048430): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
> [  259.480821] softirqs last disabled at (2048423): [<ffffffff8114f3aa>] irq_exit_rcu+0x8a/0xe0
> [  259.480826] ---[ end trace 0000000000000000 ]---
> [  259.494838] xe 0000:00:02.0: [drm:drm_mode_addfb2] [FB:219]
> [  259.494943] xe 0000:00:02.0: [drm:skl_compute_wm [xe]] [PLAN
> 
> That is: __xe_pin_fb_vma()
> if (XE_WARN_ON(view->type == I915_GTT_VIEW_REMAPPED)) {
> 
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support
  2023-09-26  8:23   ` Matthew Auld
@ 2023-09-26 18:03     ` Souza, Jose
  0 siblings, 0 replies; 16+ messages in thread
From: Souza, Jose @ 2023-09-26 18:03 UTC (permalink / raw)
  To: intel-xe, Auld,  Matthew

On Tue, 2023-09-26 at 09:23 +0100, Matthew Auld wrote:
> On 25/09/2023 20:47, Souza, Jose wrote:
> > Thanks for the fixes, display is now working in TGL and DG2 but getting a new crash in MTL:
> 
> Is the MTL bug present on the same base branch, i.e. if you drop all the
> patches in this series?

It also happens without your patches.
I found a CI bug with the same signature: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/606



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode
  2023-09-25 22:08   ` Matt Roper
@ 2023-09-26 19:29     ` Lucas De Marchi
  0 siblings, 0 replies; 16+ messages in thread
From: Lucas De Marchi @ 2023-09-26 19:29 UTC (permalink / raw)
  To: Matt Roper; +Cc: Matthew Auld, intel-xe

On Mon, Sep 25, 2023 at 03:08:01PM -0700, Matt Roper wrote:
>On Mon, Sep 25, 2023 at 02:21:19PM +0100, Matthew Auld wrote:
>> In the next patch userspace will be able to directly set the pat_index
>> as part of vm_bind. To support this we need to get away from using
>> xe_cache_level in the low level routines and rather just use the
>> pat_index directly.
>
>It might be worth adding a kunit test (in a separate patch) that ensures
>every platform's PAT table can provide a valid index for XE_CACHE_WB and
>XE_CACHE_UC, which I assume are the two the kernel might want to perform
>lookups on in general code.
>
>At some point we may want to replace xe_pat_get_index() completely and
>just cache the indices for the "important" PAT settings at init that the
>kernel will want to use elsewhere in various places (e.g., something
>like gt->pat.cached and gt->pat.uncached).

Already done in my series.
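
For illustration, a rough sketch of the kind of kunit check being suggested
above; the fixture wiring (test->priv), the xe->info.pat_table_n_entries
field and the header paths are assumptions, and suite registration is
omitted:

#include <kunit/test.h>

#include "xe_device.h"	/* assumed header for struct xe_device */
#include "xe_pat.h"

static void pat_index_for_kernel_levels(struct kunit *test)
{
	struct xe_device *xe = test->priv;	/* assumed test fixture */
	static const enum xe_cache_level levels[] = {
		XE_CACHE_WB,
		XE_CACHE_UC,
	};
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(levels); i++) {
		u16 idx = xe_pat_get_index(xe, levels[i]);

		/* the device's PAT table must give an in-range index */
		KUNIT_EXPECT_LT(test, idx, xe->info.pat_table_n_entries);
	}
}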

>
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>  drivers/gpu/drm/xe/tests/xe_migrate.c |  2 +-
>>  drivers/gpu/drm/xe/xe_ggtt.c          |  7 +++----
>>  drivers/gpu/drm/xe/xe_ggtt_types.h    |  2 +-
>>  drivers/gpu/drm/xe/xe_migrate.c       | 13 ++++++++-----
>>  drivers/gpu/drm/xe/xe_pt.c            | 15 +++++++++------
>>  drivers/gpu/drm/xe/xe_pt.h            |  4 ++--
>>  drivers/gpu/drm/xe/xe_vm.c            |  8 ++------
>>  drivers/gpu/drm/xe/xe_vm_types.h      |  3 +--
>>  8 files changed, 27 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> index 6b4388bfbb31..d3bf4751a2d7 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
>> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>>  	/* First part of the test, are we updating our pagetable bo with a new entry? */
>>  	xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
>>  		  0xdeaddeadbeefbeef);
>> -	expected = xe_pte_encode(m->q->vm, pt, 0, XE_CACHE_WB, 0);
>> +	expected = xe_pte_encode(m->q->vm, pt, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>>  	if (m->q->vm->flags & XE_VM_FLAG_64K)
>>  		expected |= XE_PTE_PS64;
>>  	if (xe_bo_is_vram(pt))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>> index 2002b6597dbf..1c447f1a4fbb 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>> @@ -41,7 +41,8 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
>>  		pte |= XE_GGTT_PTE_DM;
>>
>>  	if ((ggtt->pat_encode).pte_encode)
>> -		pte = (ggtt->pat_encode).pte_encode(xe, pte, XE_CACHE_WB_1_WAY);
>> +		pte = (ggtt->pat_encode).pte_encode(xe, pte,
>> +						    xe_pat_get_index(xe, XE_CACHE_WB_1_WAY));
>
>Hmm, looks like this series is still based on top of Ravi's old series?
>I believe his series was going to get reworked to remove stuff like
>XE_CACHE_WB_1_WAY, so some of the underlying code here is likely to
>change.  I think Lucas was going to help with reworking that, so he
>might be able to help reconcile the updates to Ravi's series with the
>changes on top that your series is making.

Yeah, the new one is https://patchwork.freedesktop.org/series/124225/
I will submit a v2 today.

Lucas De Marchi

>
>
>Matt
>
>>
>>  	return pte;
>>  }
>> @@ -102,10 +103,8 @@ static void primelockdep(struct xe_ggtt *ggtt)
>>  }
>>
>>  static u64 xelpg_ggtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> -						enum xe_cache_level cache)
>> +				     u16 pat_index)
>>  {
>> -	u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>>  	pte_pat &= ~(XELPG_GGTT_PTE_PAT_MASK);
>>
>>  	if (pat_index & BIT(0))
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> index 7e55fac1a8a9..7981075bb228 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> @@ -31,7 +31,7 @@ struct xe_ggtt {
>>
>>  	struct {
>>  		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> -						enum xe_cache_level cache);
>> +				  u16 pat_index);
>>  	} pat_encode;
>>  };
>>
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index ebae0117f577..9d201bdb9f25 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -25,6 +25,7 @@
>>  #include "xe_lrc.h"
>>  #include "xe_map.h"
>>  #include "xe_mocs.h"
>> +#include "xe_pat.h"
>>  #include "xe_pt.h"
>>  #include "xe_res_cursor.h"
>>  #include "xe_sched_job.h"
>> @@ -162,6 +163,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>  	u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
>>  	u32 map_ofs, level, i;
>>  	struct xe_bo *bo, *batch = tile->mem.kernel_bb_pool->bo;
>> +	u16 pat_index = xe_pat_get_index(xe, XE_CACHE_WB);
>>  	u64 entry;
>>  	int ret;
>>
>> @@ -196,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>
>>  	/* Map the entire BO in our level 0 pt */
>>  	for (i = 0, level = 0; i < num_entries; level++) {
>> -		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
>> +		entry = xe_pte_encode(vm, bo, i * XE_PAGE_SIZE, pat_index, 0);
>>
>>  		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>>
>> @@ -214,7 +216,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>  		for (i = 0; i < batch->size;
>>  		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>>  		     XE_PAGE_SIZE) {
>> -			entry = xe_pte_encode(vm, batch, i, XE_CACHE_WB, 0);
>> +			entry = xe_pte_encode(vm, batch, i, pat_index, 0);
>>
>>  			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>>  				  entry);
>> @@ -259,7 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>>
>>  		flags = XE_PPGTT_PTE_DM;
>> -		flags = __xe_pte_encode(flags, XE_CACHE_WB, vm, NULL, level);
>> +		flags = __xe_pte_encode(flags, pat_index, vm, NULL, level);
>>
>>  		/*
>>  		 * Use 1GB pages, it shouldn't matter the physical amount of
>> @@ -454,6 +456,7 @@ static void emit_pte(struct xe_migrate *m,
>>  		     struct xe_res_cursor *cur,
>>  		     u32 size, struct xe_bo *bo)
>>  {
>> +	u16 pat_index = xe_pat_get_index(m->tile->xe, XE_CACHE_WB);
>>  	u32 ptes;
>>  	u64 ofs = at_pt * XE_PAGE_SIZE;
>>  	u64 cur_ofs;
>> @@ -494,7 +497,7 @@ static void emit_pte(struct xe_migrate *m,
>>  				addr += vram_region_gpu_offset(bo->ttm.resource);
>>  				addr |= XE_PPGTT_PTE_DM;
>>  			}
>> -			addr = __xe_pte_encode(addr, XE_CACHE_WB, m->q->vm, NULL, 0);
>> +			addr = __xe_pte_encode(addr, pat_index, m->q->vm, NULL, 0);
>>  			bb->cs[bb->len++] = lower_32_bits(addr);
>>  			bb->cs[bb->len++] = upper_32_bits(addr);
>>
>> @@ -1254,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>>
>>  			xe_tile_assert(tile, pt_bo->size == SZ_4K);
>>
>> -			addr = xe_pte_encode(vm, pt_bo, 0, XE_CACHE_WB, 0);
>> +			addr = xe_pte_encode(vm, pt_bo, 0, xe_pat_get_index(xe, XE_CACHE_WB), 0);
>>  			bb->cs[bb->len++] = lower_32_bits(addr);
>>  			bb->cs[bb->len++] = upper_32_bits(addr);
>>  		}
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index ddb4d9c33181..5686ed9be175 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -11,6 +11,7 @@
>>  #include "xe_gt.h"
>>  #include "xe_gt_tlb_invalidation.h"
>>  #include "xe_migrate.h"
>> +#include "xe_pat.h"
>>  #include "xe_pt_types.h"
>>  #include "xe_pt_walk.h"
>>  #include "xe_res_cursor.h"
>> @@ -68,7 +69,7 @@ u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset)
>>  	return pde;
>>  }
>>
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>>  		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level)
>>  {
>>  	struct xe_device *xe = vm->xe;
>> @@ -86,7 +87,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>>  	else if (pt_level == 2)
>>  		pte |= XE_PDPE_PS_1G;
>>
>> -	pte = vm->pat_encode.pte_encode(xe, pte, cache);
>> +	pte = vm->pat_encode.pte_encode(xe, pte, pat_index);
>>
>>  	/* XXX: Does hw support 1 GiB pages? */
>>  	XE_WARN_ON(pt_level > 2);
>> @@ -104,7 +105,7 @@ u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>>   *
>>   * Return: An encoded page-table entry. No errors.
>>   */
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>>  		  u32 pt_level)
>>  {
>>  	u64 pte;
>> @@ -113,7 +114,7 @@ u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_
>>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>>  		pte |= XE_PPGTT_PTE_DM;
>>
>> -	return __xe_pte_encode(pte, cache, vm, NULL, pt_level);
>> +	return __xe_pte_encode(pte, pat_index, vm, NULL, pt_level);
>>  }
>>
>>  static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>> @@ -126,7 +127,7 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>>
>>  	if (level == 0) {
>>  		u64 empty = xe_pte_encode(vm, vm->scratch_bo[id], 0,
>> -					  XE_CACHE_WB, 0);
>> +					  xe_pat_get_index(vm->xe, XE_CACHE_WB), 0);
>>
>>  		return empty;
>>  	} else {
>> @@ -597,7 +598,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>
>>  		pte = __xe_pte_encode(is_null ? 0 :
>>  				      xe_res_dma(curs) + xe_walk->dma_offset,
>> -				      xe_walk->cache, xe_walk->vm, xe_walk->vma, level);
>> +				      xe_pat_get_index(tile_to_xe(xe_walk->tile),
>> +						       xe_walk->cache),
>> +				      xe_walk->vm, xe_walk->vma, level);
>>  		pte |= xe_walk->default_pte;
>>
>>  		/*
>> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
>> index 0e66436d707d..6d10823fca9b 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.h
>> +++ b/drivers/gpu/drm/xe/xe_pt.h
>> @@ -47,9 +47,9 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>>
>>  u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset);
>>
>> -u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_vm *vm, struct xe_bo *bo, u64 offset, u16 pat_index,
>>  		  u32 pt_level);
>> -u64 __xe_pte_encode(u64 pte, enum xe_cache_level cache,
>> +u64 __xe_pte_encode(u64 pte, u16 pat_index,
>>  		    struct xe_vm *vm, struct xe_vma *vma, u32 pt_level);
>>
>>  #endif
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index c53d5c1580df..28e6429488ee 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1196,10 +1196,8 @@ static void xe_vma_op_work_func(struct work_struct *w);
>>  static void vm_destroy_work_func(struct work_struct *w);
>>
>>  static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> -						enum xe_cache_level cache)
>> +				    u16 pat_index)
>>  {
>> -	u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>>  	if (pat_index & BIT(0))
>>  		pte_pat |= BIT(3);
>>
>> @@ -1217,10 +1215,8 @@ static u64 xe2_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>>  }
>>
>>  static u64 xelp_ppgtt_pte_encode_pat(struct xe_device *xe, u64 pte_pat,
>> -						enum xe_cache_level cache)
>> +				     u16 pat_index)
>>  {
>> -	u32 pat_index = xe_pat_get_index(xe, cache);
>> -
>>  	if (pat_index & BIT(0))
>>  		pte_pat |= BIT(3);
>>
>> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
>> index c4d866840d6a..685c2179e533 100644
>> --- a/drivers/gpu/drm/xe/xe_vm_types.h
>> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
>> @@ -340,8 +340,7 @@ struct xe_vm {
>>  	struct xe_file *xef;
>>
>>  	struct {
>> -		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat,
>> -				  enum xe_cache_level cache);
>> +		u64 (*pte_encode)(struct xe_device *xe, u64 pte_pat, u16 pat_index);
>>  	} pat_encode;
>>  };
>>
>> --
>> 2.41.0
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-09-26 19:29 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-25 13:21 [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 1/7] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
2023-09-25 18:56   ` Souza, Jose
2023-09-26  8:21     ` Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 2/7] drm/xe: move pat_table into device info Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 3/7] drm/xe/pat: trim the tgl PAT table Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 4/7] drm/xe/pat: annotate pat_index with coherency mode Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 5/7] drm/xe/migrate: rather use pte_encode helpers Matthew Auld
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 6/7] drm/xe: directly use pat_index for pte_encode Matthew Auld
2023-09-25 22:08   ` Matt Roper
2023-09-26 19:29     ` Lucas De Marchi
2023-09-25 13:21 ` [Intel-xe] [PATCH v3 7/7] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
2023-09-25 13:24 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev4) Patchwork
2023-09-25 19:47 ` [Intel-xe] [PATCH v3 0/7] PAT and cache coherency support Souza, Jose
2023-09-26  8:23   ` Matthew Auld
2023-09-26 18:03     ` Souza, Jose
