* [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support
@ 2023-09-27 11:00 Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 1/5] drm/xe/pat: trim the xelp PAT table Matthew Auld
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe

Branch available here:
https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads

Series directly depends on the patches here:
https://patchwork.freedesktop.org/series/124225/

The goal here is to allow userspace to directly control the pat_index when
mapping memory via the ppGTT, in addition to the CPU caching mode. This is very
much needed on newer igpu platforms which allow incoherent GT access, where the
choice over the cache level and expected coherency is best left to userspace,
depending on the use case. In the future there may also be other attributes
encoded in the pat_index, so giving userspace direct control will also be
needed there.

To support this we add new gem_create uAPI for selecting the CPU caching
mode to use for system memory, including the expected GPU coherency mode. There
are various restrictions on which CPU caching modes are compatible with the
selected coherency mode. With that in place the actual pat_index can now be
provided as part of vm_bind. The only restriction is that the coherency mode of
the pat_index must be at least as coherent as the gem_create coherency mode.
There are also some special cases, such as userptr and dma-buf.
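
As a rough illustration of the intended flow (this sketch is not part of the
series; it uses the uapi names from patches 3 and 5 below, while the handle
plumbing, sync arguments and error handling are elided, and bo_size, vm_id,
gpu_addr and pat_index are placeholders):

    struct drm_xe_gem_create create = {
            .size = bo_size,
            .vm_id = vm_id,
            /* CPU mmaps of this object will be write-back cached... */
            .cpu_caching = XE_GEM_CPU_CACHING_WB,
            /* ...so GPU access must be at least 1way coherent. */
            .coh_mode = XE_GEM_COH_AT_LEAST_1WAY,
    };
    /* the gem_create ioctl returns the new object in create.handle */

    struct drm_xe_vm_bind_op bind = {
            .obj = create.handle,
            .addr = gpu_addr,
            .range = bo_size,
            /*
             * Platform specific index from the Bspec/PRM. Its coherency
             * mode must be at least coh_mode above, so here a 1way or
             * 2way entry.
             */
            .pat_index = pat_index,
    };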

v2:
  - Loads of improvements/tweaks. Main changes are to now allow
    gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
    exactly. This simplifies the dma-buf policy from userspace pov. Also we now
    only consider COH_NONE and COH_AT_LEAST_1WAY.
v3:
  - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
    fixes.
v4:
  - Rebase on Lucas' new series.
  - Drop UC cache mode.
  - s/smem_cpu_caching/cpu_caching/. Idea is to make VRAM WC explicit in the
    uapi, plus make it more future proof.

-- 
2.41.0



* [Intel-xe] [PATCH v4 1/5] drm/xe/pat: trim the xelp PAT table
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
@ 2023-09-27 11:00 ` Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode Matthew Auld
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matt Roper

We don't seem to use PAT indexes 4-7, even though they are defined by
the HW. In a later patch userspace will be able to directly set the
pat_index as part of vm_bind, and we don't want to allow setting 4-7.
Simplest is to just drop them from the table here.

Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_pat.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 6efa44556689..2c759ffa35ef 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -42,10 +42,6 @@ static const u32 xelp_pat_table[] = {
 	[1] = XELP_PAT_WC,
 	[2] = XELP_PAT_WT,
 	[3] = XELP_PAT_UC,
-	[4] = XELP_PAT_WB,
-	[5] = XELP_PAT_WB,
-	[6] = XELP_PAT_WB,
-	[7] = XELP_PAT_WB,
 };
 
 static const u32 xehpc_pat_table[] = {
-- 
2.41.0



* [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 1/5] drm/xe/pat: trim the xelp PAT table Matthew Auld
@ 2023-09-27 11:00 ` Matthew Auld
  2023-09-28  4:41   ` Niranjana Vishwanathapura
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 3/5] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe; +Cc: Matt Roper, Lucas De Marchi

In a later patch userspace will be able to directly set the pat_index
as part of vm_bind. To support this we need to move away from using
xe_cache_level in the low-level routines and instead use the pat_index
directly.
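
The conversion for existing kernel-internal callers is mechanical: resolve the
pat_index from the per-device table once and hand the raw index to the encode
functions. Roughly (simplified sketch of the pattern used in the hunks below;
"offset" stands in for whatever offset the caller computes):

    /* before */
    entry = vm->pt_ops->pde_encode_bo(bo, offset, XE_CACHE_WB);

    /* after: translate the cache level into the device's pat_index up front */
    u16 pat_index = xe->pat.idx[XE_CACHE_WB];

    entry = vm->pt_ops->pde_encode_bo(bo, offset, pat_index);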

v2: Rebase

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c       |  7 +++----
 drivers/gpu/drm/xe/xe_ggtt_types.h |  3 +--
 drivers/gpu/drm/xe/xe_migrate.c    | 19 +++++++++++--------
 drivers/gpu/drm/xe/xe_pt.c         | 11 ++++++-----
 drivers/gpu/drm/xe/xe_pt_types.h   |  8 ++++----
 drivers/gpu/drm/xe/xe_vm.c         | 24 +++++++++++-------------
 6 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 99b54794917e..2334c47c19cc 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -27,7 +27,7 @@
 #define GUC_GGTT_TOP	0xFEE00000
 
 static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
-				   enum xe_cache_level cache)
+				   u16 pat_index)
 {
 	u64 pte;
 
@@ -41,13 +41,12 @@ static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
 }
 
 static u64 xelpg_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
-				    enum xe_cache_level cache)
+				    u16 pat_index)
 {
 	struct xe_device *xe = xe_bo_device(bo);
-	u32 pat_index = xe->pat.idx[cache];
 	u64 pte;
 
-	pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, cache);
+	pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, pat_index);
 
 	xe_assert(xe, pat_index <= 3);
 
diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
index 486016ea5b67..d8c584d9a8c3 100644
--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
@@ -14,8 +14,7 @@ struct xe_bo;
 struct xe_gt;
 
 struct xe_ggtt_pt_ops {
-	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
-			     enum xe_cache_level cache);
+	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset, u16 pat_index);
 };
 
 struct xe_ggtt {
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 258c2269c916..90a1ff1aca9b 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -158,6 +158,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 				 struct xe_vm *vm)
 {
 	struct xe_device *xe = tile_to_xe(tile);
+	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
 	u8 id = tile->id;
 	u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
 	u32 map_ofs, level, i;
@@ -189,7 +190,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		return ret;
 	}
 
-	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
+	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, pat_index);
 	xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
 
 	map_ofs = (num_entries - num_level) * XE_PAGE_SIZE;
@@ -197,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 	/* Map the entire BO in our level 0 pt */
 	for (i = 0, level = 0; i < num_entries; level++) {
 		entry = vm->pt_ops->pte_encode_bo(bo, i * XE_PAGE_SIZE,
-						  XE_CACHE_WB, 0);
+						  pat_index, 0);
 
 		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
 
@@ -216,7 +217,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
 		     XE_PAGE_SIZE) {
 			entry = vm->pt_ops->pte_encode_bo(batch, i,
-							  XE_CACHE_WB, 0);
+							  pat_index, 0);
 
 			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
 				  entry);
@@ -241,7 +242,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 			flags = XE_PDE_64K;
 
 		entry = vm->pt_ops->pde_encode_bo(bo, map_ofs + (level - 1) *
-						  XE_PAGE_SIZE, XE_CACHE_WB);
+						  XE_PAGE_SIZE, pat_index);
 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE * level, u64,
 			  entry | flags);
 	}
@@ -249,7 +250,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 	/* Write PDE's that point to our BO. */
 	for (i = 0; i < num_entries - num_level; i++) {
 		entry = vm->pt_ops->pde_encode_bo(bo, i * XE_PAGE_SIZE,
-						  XE_CACHE_WB);
+						  pat_index);
 
 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE +
 			  (i + 1) * 8, u64, entry);
@@ -261,7 +262,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 
 		level = 2;
 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
-		flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
+		flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level,
 						    true, 0);
 
 		/*
@@ -457,6 +458,7 @@ static void emit_pte(struct xe_migrate *m,
 		     struct xe_res_cursor *cur,
 		     u32 size, struct xe_bo *bo)
 {
+	u16 pat_index = m->tile->xe->pat.idx[XE_CACHE_WB];
 	u32 ptes;
 	u64 ofs = at_pt * XE_PAGE_SIZE;
 	u64 cur_ofs;
@@ -500,7 +502,7 @@ static void emit_pte(struct xe_migrate *m,
 			}
 
 			addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
-								 addr, XE_CACHE_WB,
+								 addr, pat_index,
 								 0, devmem, flags);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
@@ -1190,6 +1192,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 	bool first_munmap_rebind = vma &&
 		vma->gpuva.flags & XE_VMA_FIRST_REBIND;
 	struct xe_exec_queue *q_override = !q ? m->q : q;
+	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
 
 	/* Use the CPU if no in syncs and engine is idle */
 	if (no_in_syncs(syncs, num_syncs) && xe_exec_queue_is_idle(q_override)) {
@@ -1261,7 +1264,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 
 			xe_tile_assert(tile, pt_bo->size == SZ_4K);
 
-			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, XE_CACHE_WB, 0);
+			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, pat_index, 0);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 		}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 4d4c6a4c305e..92b512641b4a 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -50,6 +50,7 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
 			     unsigned int level)
 {
+	u16 pat_index = tile_to_xe(tile)->pat.idx[XE_CACHE_WB];
 	u8 id = tile->id;
 
 	if (!vm->scratch_bo[id])
@@ -57,9 +58,9 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
 
 	if (level > 0)
 		return vm->pt_ops->pde_encode_bo(vm->scratch_pt[id][level - 1]->bo,
-						 0, XE_CACHE_WB);
+						 0, pat_index);
 
-	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, XE_CACHE_WB, 0);
+	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, pat_index, 0);
 }
 
 /**
@@ -510,6 +511,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 {
 	struct xe_pt_stage_bind_walk *xe_walk =
 		container_of(walk, typeof(*xe_walk), base);
+	u16 pat_index = tile_to_xe(xe_walk->tile)->pat.idx[xe_walk->cache];
 	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
 	struct xe_vm *vm = xe_walk->vm;
 	struct xe_pt *xe_child;
@@ -526,7 +528,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
 						 xe_res_dma(curs) + xe_walk->dma_offset,
-						 xe_walk->vma, xe_walk->cache, level);
+						 xe_walk->vma, pat_index, level);
 		pte |= xe_walk->default_pte;
 
 		/*
@@ -591,8 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 			xe_child->is_compact = true;
 		}
 
-		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0,
-						xe_walk->cache) | flags;
+		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0, pat_index) | flags;
 		ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, xe_child,
 					 pte);
 	}
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index bd6645295fe6..355fa8f014e9 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -38,14 +38,14 @@ struct xe_pt {
 
 struct xe_pt_ops {
 	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
-			     enum xe_cache_level cache, u32 pt_level);
+			     u16 pat_index, u32 pt_level);
 	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
-			      enum xe_cache_level cache, u32 pt_level);
+			      u16 pat_index, u32 pt_level);
 	u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
-			       enum xe_cache_level cache,
+			       u16 pat_index,
 			       u32 pt_level, bool devmem, u64 flags);
 	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
-			     const enum xe_cache_level cache);
+			     const u16 pat_index);
 };
 
 struct xe_pt_entry {
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index beffbb1039d3..962bfd2b0179 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1191,9 +1191,8 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
 	.op_alloc = xe_vm_op_alloc,
 };
 
-static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
+static u64 pde_encode_pat_index(struct xe_device *xe, u16 pat_index)
 {
-	u32 pat_index = xe->pat.idx[cache];
 	u64 pte = 0;
 
 	if (pat_index & BIT(0))
@@ -1205,9 +1204,8 @@ static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
 	return pte;
 }
 
-static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
+static u64 pte_encode_pat_index(struct xe_device *xe, u16 pat_index)
 {
-	u32 pat_index = xe->pat.idx[cache];
 	u64 pte = 0;
 
 	if (pat_index & BIT(0))
@@ -1238,27 +1236,27 @@ static u64 pte_encode_ps(u32 pt_level)
 }
 
 static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
-			      const enum xe_cache_level cache)
+			      const u16 pat_index)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	u64 pde;
 
 	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
 	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pde |= pde_encode_cache(xe, cache);
+	pde |= pde_encode_pat_index(xe, pat_index);
 
 	return pde;
 }
 
 static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
-			      enum xe_cache_level cache, u32 pt_level)
+			      u16 pat_index, u32 pt_level)
 {
 	struct xe_device *xe = xe_bo_device(bo);
 	u64 pte;
 
 	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pte |= pte_encode_cache(xe, cache);
+	pte |= pte_encode_pat_index(xe, pat_index);
 	pte |= pte_encode_ps(pt_level);
 
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
@@ -1268,7 +1266,7 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
 }
 
 static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
-			       enum xe_cache_level cache, u32 pt_level)
+			       u16 pat_index, u32 pt_level)
 {
 	struct xe_device *xe = xe_vma_vm(vma)->xe;
 
@@ -1277,7 +1275,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 	if (likely(!xe_vma_read_only(vma)))
 		pte |= XE_PAGE_RW;
 
-	pte |= pte_encode_cache(xe, cache);
+	pte |= pte_encode_pat_index(xe, pat_index);
 	pte |= pte_encode_ps(pt_level);
 
 	if (unlikely(xe_vma_is_null(vma)))
@@ -1287,7 +1285,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 }
 
 static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
-				enum xe_cache_level cache,
+				u16 pat_index,
 				u32 pt_level, bool devmem, u64 flags)
 {
 	u64 pte;
@@ -1297,7 +1295,7 @@ static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
 
 	pte = addr;
 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pte |= pte_encode_cache(xe, cache);
+	pte |= pte_encode_pat_index(xe, pat_index);
 	pte |= pte_encode_ps(pt_level);
 
 	if (devmem)
@@ -1701,7 +1699,7 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id)
 u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
 {
 	return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0,
-					 XE_CACHE_WB);
+					 tile->xe->pat.idx[XE_CACHE_WB]);
 }
 
 static struct dma_fence *
-- 
2.41.0



* [Intel-xe] [PATCH v4 3/5] drm/xe/uapi: Add support for cache and coherency mode
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 1/5] drm/xe/pat: trim the xelp PAT table Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode Matthew Auld
@ 2023-09-27 11:00 ` Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 4/5] drm/xe/pat: annotate pat_index with " Matthew Auld
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper

From: Pallavi Mishra <pallavi.mishra@intel.com>

Allow userspace to specify the CPU caching mode to use, in addition to the
coherency mode, during object creation. Modify the gem_create handler and
introduce xe_bo_create_user to replace xe_bo_create. In a later patch we
will support setting the pat_index as part of vm_bind, where the expectation
is that the coherency mode extracted from the pat_index must be at least as
coherent as the one set at object creation.
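
For reference, the compatibility rules enforced in xe_gem_create_ioctl() (see
the hunk below) amount to roughly the following; this helper is purely
illustrative and not part of the patch:

    static bool cpu_caching_is_compatible(u16 coh_mode, u16 cpu_caching,
                                          bool vram, bool scanout)
    {
            if (!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)
                    return false; /* coh_mode must be provided and valid */
            if (!cpu_caching || cpu_caching > XE_GEM_CPU_CACHING_WC)
                    return false; /* cpu_caching must be provided and valid */
            if (vram && cpu_caching != XE_GEM_CPU_CACHING_WC)
                    return false; /* possible VRAM placement requires WC */
            if (scanout && cpu_caching == XE_GEM_CPU_CACHING_WB)
                    return false; /* scanout can't be CPU:WB */
            if (coh_mode == XE_GEM_COH_NONE &&
                cpu_caching == XE_GEM_CPU_CACHING_WB)
                    return false; /* CPU:WB needs at least 1way coherency */
            return true;
    }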

v2
  - s/smem_caching/smem_cpu_caching/ and
    s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
  - Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
    just cares that zeroing/swap-in can't be bypassed with the given
    smem_caching mode. (Matt Roper)
  - Fix broken range check for coh_mode and smem_cpu_caching and also
    don't use constant value, but the already defined macros. (José)
  - Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
  - Add note in kernel-doc for dgpu and coherency modes for system
    memory. (José)
v3 (José):
  - Make sure to reject coh_mode == 0 for VRAM-only.
  - Also make sure to actually pass along the (start, end) for
    __xe_bo_create_locked.
v4
  - Drop UC caching mode. Can be added back if we need it. (Matt Roper)
  - s/smem_cpu_caching/cpu_caching. Idea is that VRAM is always WC, but
    that is currently implicit and KMD controlled. Make it explicit in
    the uapi with the limitation that it currently must be WC. For VRAM
    + SYS objects userspace must now select WC. (José)
  - Make sure to initialize bo_flags. (José)

Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Co-authored-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.c       | 97 ++++++++++++++++++++++++++------
 drivers/gpu/drm/xe/xe_bo.h       |  3 +-
 drivers/gpu/drm/xe/xe_bo_types.h | 10 ++++
 drivers/gpu/drm/xe/xe_dma_buf.c  |  5 +-
 include/uapi/drm/xe_drm.h        | 50 +++++++++++++++-
 5 files changed, 143 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 61789c0e88fb..0de463907428 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -326,7 +326,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 	struct xe_device *xe = xe_bo_device(bo);
 	struct xe_ttm_tt *tt;
 	unsigned long extra_pages;
-	enum ttm_caching caching = ttm_cached;
+	enum ttm_caching caching;
 	int err;
 
 	tt = kzalloc(sizeof(*tt), GFP_KERNEL);
@@ -340,13 +340,22 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo,
 		extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size),
 					   PAGE_SIZE);
 
+	switch (bo->cpu_caching) {
+	case XE_GEM_CPU_CACHING_WC:
+		caching = ttm_write_combined;
+		break;
+	default:
+		caching = ttm_cached;
+		break;
+	}
+
 	/*
 	 * Display scanout is always non-coherent with the CPU cache.
 	 *
 	 * For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and
 	 * require a CPU:WC mapping.
 	 */
-	if (bo->flags & XE_BO_SCANOUT_BIT ||
+	if ((!bo->cpu_caching && bo->flags & XE_BO_SCANOUT_BIT) ||
 	    (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_PAGETABLE))
 		caching = ttm_write_combined;
 
@@ -1190,9 +1199,10 @@ void xe_bo_free(struct xe_bo *bo)
 	kfree(bo);
 }
 
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_tile *tile, struct dma_resv *resv,
 				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    u16 cpu_caching, u16 coh_mode,
 				    enum ttm_bo_type type, u32 flags)
 {
 	struct ttm_operation_ctx ctx = {
@@ -1231,6 +1241,8 @@ struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 	bo->tile = tile;
 	bo->size = size;
 	bo->flags = flags;
+	bo->cpu_caching = cpu_caching;
+	bo->coh_mode = coh_mode;
 	bo->ttm.base.funcs = &xe_gem_object_funcs;
 	bo->props.preferred_mem_class = XE_BO_PROPS_INVALID;
 	bo->props.preferred_gt = XE_BO_PROPS_INVALID;
@@ -1316,10 +1328,11 @@ static int __xe_bo_fixed_placement(struct xe_device *xe,
 }
 
 struct xe_bo *
-xe_bo_create_locked_range(struct xe_device *xe,
-			  struct xe_tile *tile, struct xe_vm *vm,
-			  size_t size, u64 start, u64 end,
-			  enum ttm_bo_type type, u32 flags)
+__xe_bo_create_locked(struct xe_device *xe,
+		      struct xe_tile *tile, struct xe_vm *vm,
+		      size_t size, u64 start, u64 end,
+		      u16 cpu_caching, u16 coh_mode,
+		      enum ttm_bo_type type, u32 flags)
 {
 	struct xe_bo *bo = NULL;
 	int err;
@@ -1340,10 +1353,11 @@ xe_bo_create_locked_range(struct xe_device *xe,
 		}
 	}
 
-	bo = __xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
+	bo = ___xe_bo_create_locked(xe, bo, tile, vm ? &vm->resv : NULL,
 				   vm && !xe_vm_in_fault_mode(vm) &&
 				   flags & XE_BO_CREATE_USER_BIT ?
 				   &vm->lru_bulk_move : NULL, size,
+				   cpu_caching, coh_mode,
 				   type, flags);
 	if (IS_ERR(bo))
 		return bo;
@@ -1377,11 +1391,35 @@ xe_bo_create_locked_range(struct xe_device *xe,
 	return ERR_PTR(err);
 }
 
+struct xe_bo *
+xe_bo_create_locked_range(struct xe_device *xe,
+			  struct xe_tile *tile, struct xe_vm *vm,
+			  size_t size, u64 start, u64 end,
+			  enum ttm_bo_type type, u32 flags)
+{
+	return __xe_bo_create_locked(xe, tile, vm, size, start, end, 0, 0, type, flags);
+}
+
 struct xe_bo *xe_bo_create_locked(struct xe_device *xe, struct xe_tile *tile,
 				  struct xe_vm *vm, size_t size,
 				  enum ttm_bo_type type, u32 flags)
 {
-	return xe_bo_create_locked_range(xe, tile, vm, size, 0, ~0ULL, type, flags);
+	return __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL, 0, 0, type, flags);
+}
+
+static struct xe_bo *xe_bo_create_user(struct xe_device *xe, struct xe_tile *tile,
+				       struct xe_vm *vm, size_t size,
+				       u16 cpu_caching, u16 coh_mode,
+				       enum ttm_bo_type type,
+				       u32 flags)
+{
+	struct xe_bo *bo = __xe_bo_create_locked(xe, tile, vm, size, 0, ~0ULL,
+						 cpu_caching, coh_mode, type,
+						 flags | XE_BO_CREATE_USER_BIT);
+	if (!IS_ERR(bo))
+		xe_bo_unlock_vm_held(bo);
+
+	return bo;
 }
 
 struct xe_bo *xe_bo_create(struct xe_device *xe, struct xe_tile *tile,
@@ -1764,11 +1802,11 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_xe_gem_create *args = data;
 	struct xe_vm *vm = NULL;
 	struct xe_bo *bo;
-	unsigned int bo_flags = XE_BO_CREATE_USER_BIT;
+	unsigned int bo_flags;
 	u32 handle;
 	int err;
 
-	if (XE_IOCTL_DBG(xe, args->extensions) || XE_IOCTL_DBG(xe, args->pad) ||
+	if (XE_IOCTL_DBG(xe, args->extensions) ||
 	    XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
 		return -EINVAL;
 
@@ -1795,6 +1833,7 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 	if (XE_IOCTL_DBG(xe, args->size & ~PAGE_MASK))
 		return -EINVAL;
 
+	bo_flags = 0;
 	if (args->flags & XE_GEM_CREATE_FLAG_DEFER_BACKING)
 		bo_flags |= XE_BO_DEFER_BACKING;
 
@@ -1810,6 +1849,26 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 		bo_flags |= XE_BO_NEEDS_CPU_ACCESS;
 	}
 
+	if (XE_IOCTL_DBG(xe, !args->coh_mode ||
+			 args->coh_mode > XE_GEM_COH_AT_LEAST_1WAY))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, !args->cpu_caching ||
+			 args->cpu_caching > XE_GEM_CPU_CACHING_WC))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_CREATE_VRAM_MASK &&
+			 args->cpu_caching != XE_GEM_CPU_CACHING_WC))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, bo_flags & XE_BO_SCANOUT_BIT &&
+			 args->cpu_caching == XE_GEM_CPU_CACHING_WB))
+		return -EINVAL;
+
+	if (XE_IOCTL_DBG(xe, args->coh_mode == XE_GEM_COH_NONE &&
+			 args->cpu_caching == XE_GEM_CPU_CACHING_WB))
+		return -EINVAL;
+
 	if (args->vm_id) {
 		vm = xe_vm_lookup(xef, args->vm_id);
 		if (XE_IOCTL_DBG(xe, !vm))
@@ -1821,8 +1880,10 @@ int xe_gem_create_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	bo = xe_bo_create(xe, NULL, vm, args->size, ttm_bo_type_device,
-			  bo_flags);
+	bo = xe_bo_create_user(xe, NULL, vm, args->size,
+			       args->cpu_caching, args->coh_mode,
+			       ttm_bo_type_device,
+			       bo_flags);
 	if (IS_ERR(bo)) {
 		err = PTR_ERR(bo);
 		goto out_vm;
@@ -2114,10 +2175,12 @@ int xe_bo_dumb_create(struct drm_file *file_priv,
 	args->size = ALIGN(mul_u32_u32(args->pitch, args->height),
 			   page_size);
 
-	bo = xe_bo_create(xe, NULL, NULL, args->size, ttm_bo_type_device,
-			  XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-			  XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
-			  XE_BO_NEEDS_CPU_ACCESS);
+	bo = xe_bo_create_user(xe, NULL, NULL, args->size,
+			       XE_GEM_CPU_CACHING_WC, XE_GEM_COH_NONE,
+			       ttm_bo_type_device,
+			       XE_BO_CREATE_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+			       XE_BO_CREATE_USER_BIT | XE_BO_SCANOUT_BIT |
+			       XE_BO_NEEDS_CPU_ACCESS);
 	if (IS_ERR(bo))
 		return PTR_ERR(bo);
 
diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index 5090bdd1e462..eccfa7d187f8 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -83,9 +83,10 @@ struct sg_table;
 struct xe_bo *xe_bo_alloc(void);
 void xe_bo_free(struct xe_bo *bo);
 
-struct xe_bo *__xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
+struct xe_bo *___xe_bo_create_locked(struct xe_device *xe, struct xe_bo *bo,
 				    struct xe_tile *tile, struct dma_resv *resv,
 				    struct ttm_lru_bulk_move *bulk, size_t size,
+				    u16 cpu_caching, u16 coh_mode,
 				    enum ttm_bo_type type, u32 flags);
 struct xe_bo *
 xe_bo_create_locked_range(struct xe_device *xe,
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 051fe990c133..56f7f9a4975f 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -76,6 +76,16 @@ struct xe_bo {
 	struct llist_node freed;
 	/** @created: Whether the bo has passed initial creation */
 	bool created;
+	/**
+	 * @coh_mode: Coherency setting. Currently only used for userspace
+	 * objects.
+	 */
+	u16 coh_mode;
+	/**
+	 * @cpu_caching: CPU caching mode. Currently only used for userspace
+	 * objects.
+	 */
+	u16 cpu_caching;
 };
 
 #define intel_bo_to_drm_bo(bo) (&(bo)->ttm.base)
diff --git a/drivers/gpu/drm/xe/xe_dma_buf.c b/drivers/gpu/drm/xe/xe_dma_buf.c
index cfde3be3b0dc..9da5cffeef13 100644
--- a/drivers/gpu/drm/xe/xe_dma_buf.c
+++ b/drivers/gpu/drm/xe/xe_dma_buf.c
@@ -214,8 +214,9 @@ xe_dma_buf_init_obj(struct drm_device *dev, struct xe_bo *storage,
 	int ret;
 
 	dma_resv_lock(resv, NULL);
-	bo = __xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
-				   ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
+	bo = ___xe_bo_create_locked(xe, storage, NULL, resv, NULL, dma_buf->size,
+				    0, 0, /* Will require 1way or 2way for vm_bind */
+				    ttm_bo_type_sg, XE_BO_CREATE_SYSTEM_BIT);
 	if (IS_ERR(bo)) {
 		ret = PTR_ERR(bo);
 		goto error;
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index d48d8e3c898c..260417b60c41 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -456,8 +456,54 @@ struct drm_xe_gem_create {
 	 */
 	__u32 handle;
 
-	/** @pad: MBZ */
-	__u32 pad;
+	/**
+	 * @coh_mode: The coherency mode for this object. This will limit the
+	 * possible @cpu_caching values.
+	 *
+	 * Supported values:
+	 *
+	 * XE_GEM_COH_NONE: GPU access is assumed to be not coherent with
+	 * CPU. CPU caches are not snooped.
+	 *
+	 * XE_GEM_COH_AT_LEAST_1WAY:
+	 *
+	 * CPU-GPU coherency must be at least 1WAY.
+	 *
+	 * If 1WAY then GPU access is coherent with CPU (CPU caches are snooped)
+	 * until GPU acquires. The acquire by the GPU is not tracked by CPU
+	 * caches.
+	 *
+	 * If 2WAY then should be fully coherent between GPU and CPU.  Fully
+	 * tracked by CPU caches. Both CPU and GPU caches are snooped.
+	 *
+	 * Note: On dgpu the GPU device never caches system memory, so the
+	 * device should be thought of as always at least 1WAY coherent. On
+	 * current dgpu HW there is also no way to turn off snooping, so the
+	 * different coherency modes of the pat_index likely make no
+	 * difference for system memory.
+	 */
+#define XE_GEM_COH_NONE			1
+#define XE_GEM_COH_AT_LEAST_1WAY	2
+	__u16 coh_mode;
+
+	/**
+	 * @cpu_caching: The CPU caching mode to select for this object. If
+	 * mmapping the object, the mode selected here will also be used.
+	 *
+	 * Supported values:
+	 *
+	 * XE_GEM_CPU_CACHING_WB: Allocate the pages with write-back caching.
+	 * On iGPU this can't be used for scanout surfaces. The @coh_mode must
+	 * be XE_GEM_COH_AT_LEAST_1WAY. Currently not allowed for objects placed
+	 * in VRAM.
+	 *
+	 * XE_GEM_CPU_CACHING_WC: Allocate the pages as write-combined. This is
+	 * uncached. Any @coh_mode is permitted. Scanout surfaces should likely
+	 * use this. All objects that can be placed in VRAM must use this.
+	 */
+#define XE_GEM_CPU_CACHING_WB                      1
+#define XE_GEM_CPU_CACHING_WC                      2
+	__u16 cpu_caching;
 
 	/** @reserved: Reserved */
 	__u64 reserved[2];
-- 
2.41.0



* [Intel-xe] [PATCH v4 4/5] drm/xe/pat: annotate pat_index with coherency mode
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
                   ` (2 preceding siblings ...)
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 3/5] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
@ 2023-09-27 11:00 ` Matthew Auld
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 5/5] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper

Future uapi needs to give userspace the ability to select the pat_index
for a given vm_bind. However, we need to be able to extract the coherency
mode from the provided pat_index to ensure it is compatible with the
coherency mode set at object creation. There are various security reasons
why this matters. The pat_index itself is very platform specific, so it
seems reasonable to annotate each platform definition of the PAT table.
On some older platforms there is no explicit coherency mode, so we just
pick whatever makes sense.
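
The annotation is what allows the bind path added in the next patch to
translate a userspace-provided pat_index back into a coherency mode before
accepting it; roughly (sketch only, see patch 5 for the real checks):

    u16 coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);

    /* the pat_index must be at least as coherent as the object itself */
    if (bo->coh_mode && coh_mode < bo->coh_mode)
            return -EINVAL;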

v2:
  - Simplify with COH_AT_LEAST_1_WAY
  - Add some kernel-doc
v3 (Matt Roper):
  - Some small tweaks
v4:
  - Rebase

Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  2 +-
 drivers/gpu/drm/xe/xe_pat.c          | 62 ++++++++++++++++------------
 drivers/gpu/drm/xe/xe_pat.h          | 28 +++++++++++++
 3 files changed, 65 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 7d0f2109c23a..18af9e29f42f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -340,7 +340,7 @@ struct xe_device {
 		/** Internal operations to abstract platforms */
 		const struct xe_pat_ops *ops;
 		/** PAT table to program in the HW */
-		const u32 *table;
+		const struct xe_pat_table_entry *table;
 		/** Number of PAT entries */
 		int n_entries;
 		u32 idx[__XE_CACHE_LEVEL_COUNT];
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 2c759ffa35ef..5943798b1f02 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -5,6 +5,8 @@
 
 #include "xe_pat.h"
 
+#include <drm/xe_drm.h>
+
 #include "regs/xe_reg_defs.h"
 #include "xe_gt.h"
 #include "xe_gt_mcr.h"
@@ -33,51 +35,58 @@
 #define XELP_PAT_UC				REG_FIELD_PREP(XELP_MEM_TYPE_MASK, 0)
 
 struct xe_pat_ops {
-	void (*program_graphics)(struct xe_gt *gt, const u32 table[], int n_entries);
-	void (*program_media)(struct xe_gt *gt, const u32 table[], int n_entries);
+	void (*program_graphics)(struct xe_gt *gt, const struct xe_pat_table_entry table[], int n_entries);
+	void (*program_media)(struct xe_gt *gt, const struct xe_pat_table_entry table[], int n_entries);
 };
 
-static const u32 xelp_pat_table[] = {
-	[0] = XELP_PAT_WB,
-	[1] = XELP_PAT_WC,
-	[2] = XELP_PAT_WT,
-	[3] = XELP_PAT_UC,
+static const struct xe_pat_table_entry xelp_pat_table[] = {
+	[0] = { XELP_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[1] = { XELP_PAT_WC, XE_GEM_COH_NONE },
+	[2] = { XELP_PAT_WT, XE_GEM_COH_NONE },
+	[3] = { XELP_PAT_UC, XE_GEM_COH_NONE },
 };
 
-static const u32 xehpc_pat_table[] = {
-	[0] = XELP_PAT_UC,
-	[1] = XELP_PAT_WC,
-	[2] = XELP_PAT_WT,
-	[3] = XELP_PAT_WB,
-	[4] = XEHPC_PAT_CLOS(1) | XELP_PAT_WT,
-	[5] = XEHPC_PAT_CLOS(1) | XELP_PAT_WB,
-	[6] = XEHPC_PAT_CLOS(2) | XELP_PAT_WT,
-	[7] = XEHPC_PAT_CLOS(2) | XELP_PAT_WB,
+static const struct xe_pat_table_entry xehpc_pat_table[] = {
+	[0] = { XELP_PAT_UC, XE_GEM_COH_NONE },
+	[1] = { XELP_PAT_WC, XE_GEM_COH_NONE },
+	[2] = { XELP_PAT_WT, XE_GEM_COH_NONE },
+	[3] = { XELP_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[4] = { XEHPC_PAT_CLOS(1) | XELP_PAT_WT, XE_GEM_COH_NONE },
+	[5] = { XEHPC_PAT_CLOS(1) | XELP_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
+	[6] = { XEHPC_PAT_CLOS(2) | XELP_PAT_WT, XE_GEM_COH_NONE },
+	[7] = { XEHPC_PAT_CLOS(2) | XELP_PAT_WB, XE_GEM_COH_AT_LEAST_1WAY },
 };
 
-static const u32 xelpg_pat_table[] = {
-	[0] = XELPG_PAT_0_WB,
-	[1] = XELPG_PAT_1_WT,
-	[2] = XELPG_PAT_3_UC,
-	[3] = XELPG_PAT_0_WB | XELPG_2_COH_1W,
-	[4] = XELPG_PAT_0_WB | XELPG_3_COH_2W,
+static const struct xe_pat_table_entry xelpg_pat_table[] = {
+	[0] = { XELPG_PAT_0_WB, XE_GEM_COH_NONE },
+	[1] = { XELPG_PAT_1_WT, XE_GEM_COH_NONE },
+	[2] = { XELPG_PAT_3_UC, XE_GEM_COH_NONE },
+	[3] = { XELPG_PAT_0_WB | XELPG_2_COH_1W, XE_GEM_COH_AT_LEAST_1WAY },
+	[4] = { XELPG_PAT_0_WB | XELPG_3_COH_2W, XE_GEM_COH_AT_LEAST_1WAY },
 };
 
-static void program_pat(struct xe_gt *gt, const u32 table[], int n_entries)
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index)
+{
+	WARN_ON(pat_index >= xe->pat.n_entries);
+	return xe->pat.table[pat_index].coh_mode;
+}
+
+static void program_pat(struct xe_gt *gt, const struct xe_pat_table_entry table[], int n_entries)
 {
 	for (int i = 0; i < n_entries; i++) {
 		struct xe_reg reg = XE_REG(_PAT_INDEX(i));
 
-		xe_mmio_write32(gt, reg, table[i]);
+		xe_mmio_write32(gt, reg, table[i].value);
 	}
 }
 
-static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
+static void program_pat_mcr(struct xe_gt *gt, const struct xe_pat_table_entry table[],
+			    int n_entries)
 {
 	for (int i = 0; i < n_entries; i++) {
 		struct xe_reg_mcr reg_mcr = XE_REG_MCR(_PAT_INDEX(i));
 
-		xe_gt_mcr_multicast_write(gt, reg_mcr, table[i]);
+		xe_gt_mcr_multicast_write(gt, reg_mcr, table[i].value);
 	}
 }
 
@@ -125,6 +134,7 @@ void xe_pat_init_early(struct xe_device *xe)
 		xe->pat.idx[XE_CACHE_WT] = 2;
 		xe->pat.idx[XE_CACHE_WB] = 0;
 	} else if (GRAPHICS_VERx100(xe) <= 1210) {
+		WARN_ON_ONCE(!IS_DGFX(xe) && !xe->info.has_llc);
 		xe->pat.ops = &xelp_pat_ops;
 		xe->pat.table = xelp_pat_table;
 		xe->pat.n_entries = ARRAY_SIZE(xelp_pat_table);
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 744318cab69b..4032c0ef975c 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -6,9 +6,29 @@
 #ifndef _XE_PAT_H_
 #define _XE_PAT_H_
 
+#include <linux/types.h>
+
 struct xe_gt;
 struct xe_device;
 
+/**
+ * struct xe_pat_table_entry - The pat_index encoding and other meta information.
+ */
+struct xe_pat_table_entry {
+	/**
+	 * @value: The platform specific value encoding the various memory
+	 * attributes (this maps to some fixed pat_index). So things like
+	 * caching, coherency, compression etc can be encoded here.
+	 */
+	u32 value;
+
+	/**
+	 * @coh_mode: The GPU coherency mode that @value maps to. Either
+	 * XE_GEM_COH_NONE or XE_GEM_COH_AT_LEAST_1WAY.
+	 */
+	u16 coh_mode;
+};
+
 /**
  * xe_pat_init_early - SW initialization, setting up data based on device
  * @xe: xe device
@@ -21,4 +41,12 @@ void xe_pat_init_early(struct xe_device *xe);
  */
 void xe_pat_init(struct xe_gt *gt);
 
+/**
+ * xe_pat_index_get_coh_mode - Extract the coherency mode for the given
+ * pat_index.
+ * @xe: xe device
+ * @pat_index: The pat_index to query
+ */
+u16 xe_pat_index_get_coh_mode(struct xe_device *xe, u16 pat_index);
+
 #endif
-- 
2.41.0



* [Intel-xe] [PATCH v4 5/5] drm/xe/uapi: support pat_index selection with vm_bind
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
                   ` (3 preceding siblings ...)
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 4/5] drm/xe/pat: annotate pat_index with " Matthew Auld
@ 2023-09-27 11:00 ` Matthew Auld
  2023-09-27 11:31 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev5) Patchwork
  2023-09-27 16:21 ` [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Souza, Jose
  6 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-27 11:00 UTC (permalink / raw)
  To: intel-xe
  Cc: Filip Hazubski, Lucas De Marchi, Carl Zhang, Effie Yu, Matt Roper

Allow userspace to directly control the pat_index for a given vm
binding. This allows direct control over the coherency, the caching and
potentially other attributes in the future for the ppGTT binding.

The exact meaning behind the pat_index is very platform specific (see
the Bspec or PRMs) but it effectively maps to some predefined memory
attributes. From the KMD point of view we only care about the coherency
provided by the pat_index, which falls into either NONE, 1WAY or 2WAY.
The vm_bind coherency mode for the given pat_index needs to be at least
as coherent as the coh_mode that was set at object creation. For
platforms that lack an explicit coherency mode, we treat UC/WT/WC as
NONE and WB as AT_LEAST_1WAY.

For userptr mappings we lack a corresponding gem object, so the expected
coherency mode is instead implicit and must fall into either 1WAY or
2WAY. Trying to use NONE will be rejected by the kernel. For imported
dma-buf (from a different device) the coherency mode is also implicit
and must likewise be either 1WAY or 2WAY, i.e. AT_LEAST_1WAY.
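
At ioctl time the above reduces to roughly the following (sketch; the real
checks are split between vm_bind_ioctl_check_args() and xe_vm_bind_ioctl()
below):

    coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);

    /* userptr: no gem object, so the mapping must be at least 1way coherent */
    if (coh_mode == XE_GEM_COH_NONE &&
        VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR)
            return -EINVAL;

    if (bos[i]->coh_mode) {
            /* normal gem object: never weaken the coherency set at creation */
            if (coh_mode < bos[i]->coh_mode)
                    return -EINVAL;
    } else if (coh_mode == XE_GEM_COH_NONE) {
            /* imported dma-buf: CPU caching unknown, assume it may be cached */
            return -EINVAL;
    }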

v2:
  - Undefined coh_mode(pat_index) can now be treated as programmer
    error. (Matt Roper)
  - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than
    having to match exactly. This ensures imported dma-buf can always
    just use 1way (or even 2way), now that we also bundle 1way/2way into
    at_least_1way. We still require 1way/2way for external dma-buf, but
    the policy can now be the same for self-import, if desired.
  - Use u16 for pat_index in uapi. u32 is massive overkill. (José)
  - Move as much of the pat_index validation as we can into
    vm_bind_ioctl_check_args. (José)
v3 (Matt Roper):
  - Split the pte_encode() refactoring into separate patch.
v4:
  - Rebase

Bspec: 45101, 44235 #xe
Bspec: 70552, 71582, 59400 #xe2
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Pallavi Mishra <pallavi.mishra@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       | 11 ++----
 drivers/gpu/drm/xe/xe_vm.c       | 61 +++++++++++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_vm_types.h |  7 ++++
 include/uapi/drm/xe_drm.h        | 43 +++++++++++++++++++++-
 4 files changed, 104 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 92b512641b4a..f9f9010dca10 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -290,8 +290,6 @@ struct xe_pt_stage_bind_walk {
 	struct xe_vm *vm;
 	/** @tile: The tile we're building for. */
 	struct xe_tile *tile;
-	/** @cache: Desired cache level for the ptes */
-	enum xe_cache_level cache;
 	/** @default_pte: PTE flag only template. No address is associated */
 	u64 default_pte;
 	/** @dma_offset: DMA offset to add to the PTE. */
@@ -511,7 +509,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 {
 	struct xe_pt_stage_bind_walk *xe_walk =
 		container_of(walk, typeof(*xe_walk), base);
-	u16 pat_index = tile_to_xe(xe_walk->tile)->pat.idx[xe_walk->cache];
+	u16 pat_index = xe_walk->vma->pat_index;
 	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
 	struct xe_vm *vm = xe_walk->vm;
 	struct xe_pt *xe_child;
@@ -654,13 +652,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 		if (vma && vma->gpuva.flags & XE_VMA_ATOMIC_PTE_BIT)
 			xe_walk.default_pte |= XE_USM_PPGTT_PTE_AE;
 		xe_walk.dma_offset = vram_region_gpu_offset(bo->ttm.resource);
-		xe_walk.cache = XE_CACHE_WB;
-	} else {
-		if (!xe_vma_has_no_bo(vma) && bo->flags & XE_BO_SCANOUT_BIT)
-			xe_walk.cache = XE_CACHE_WT;
-		else
-			xe_walk.cache = XE_CACHE_WB;
 	}
+
 	if (!xe_vma_has_no_bo(vma) && xe_bo_is_stolen(bo))
 		xe_walk.dma_offset = xe_ttm_stolen_gpu_offset(xe_bo_device(bo));
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 962bfd2b0179..d9f43ad05969 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -6,6 +6,7 @@
 #include "xe_vm.h"
 
 #include <linux/dma-fence-array.h>
+#include <linux/nospec.h>
 
 #include <drm/drm_exec.h>
 #include <drm/drm_print.h>
@@ -25,6 +26,7 @@
 #include "xe_gt_pagefault.h"
 #include "xe_gt_tlb_invalidation.h"
 #include "xe_migrate.h"
+#include "xe_pat.h"
 #include "xe_pm.h"
 #include "xe_preempt_fence.h"
 #include "xe_pt.h"
@@ -858,7 +860,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 				    u64 start, u64 end,
 				    bool read_only,
 				    bool is_null,
-				    u8 tile_mask)
+				    u8 tile_mask,
+				    u16 pat_index)
 {
 	struct xe_vma *vma;
 	struct xe_tile *tile;
@@ -897,6 +900,8 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 			vma->tile_mask |= 0x1 << id;
 	}
 
+	vma->pat_index = pat_index;
+
 	if (vm->xe->info.platform == XE_PVC)
 		vma->gpuva.flags |= XE_VMA_ATOMIC_PTE_BIT;
 
@@ -2389,7 +2394,7 @@ static void print_op(struct xe_device *xe, struct drm_gpuva_op *op)
 static struct drm_gpuva_ops *
 vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			 u64 bo_offset_or_userptr, u64 addr, u64 range,
-			 u32 operation, u8 tile_mask, u32 region)
+			 u32 operation, u8 tile_mask, u32 region, u16 pat_index)
 {
 	struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
 	struct drm_gpuva_ops *ops;
@@ -2416,6 +2421,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
 			op->tile_mask = tile_mask;
+			op->pat_index = pat_index;
 			op->map.immediate =
 				operation & XE_VM_BIND_FLAG_IMMEDIATE;
 			op->map.read_only =
@@ -2443,6 +2449,7 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 			struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
 			op->tile_mask = tile_mask;
+			op->pat_index = pat_index;
 			op->prefetch.region = region;
 		}
 		break;
@@ -2485,7 +2492,8 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
 }
 
 static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
-			      u8 tile_mask, bool read_only, bool is_null)
+			      u8 tile_mask, bool read_only, bool is_null,
+			      u16 pat_index)
 {
 	struct xe_bo *bo = op->gem.obj ? gem_to_xe_bo(op->gem.obj) : NULL;
 	struct xe_vma *vma;
@@ -2501,7 +2509,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
 	vma = xe_vma_create(vm, bo, op->gem.offset,
 			    op->va.addr, op->va.addr +
 			    op->va.range - 1, read_only, is_null,
-			    tile_mask);
+			    tile_mask, pat_index);
 	if (bo)
 		xe_bo_unlock(bo);
 
@@ -2658,7 +2666,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 			vma = new_vma(vm, &op->base.map,
 				      op->tile_mask, op->map.read_only,
-				      op->map.is_null);
+				      op->map.is_null, op->pat_index);
 			if (IS_ERR(vma)) {
 				err = PTR_ERR(vma);
 				goto free_fence;
@@ -2686,7 +2694,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 				vma = new_vma(vm, op->base.remap.prev,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, op->pat_index);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -2722,7 +2730,7 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
 
 				vma = new_vma(vm, op->base.remap.next,
 					      op->tile_mask, read_only,
-					      is_null);
+					      is_null, op->pat_index);
 				if (IS_ERR(vma)) {
 					err = PTR_ERR(vma);
 					goto free_fence;
@@ -3235,7 +3243,22 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 		u32 obj = (*bind_ops)[i].obj;
 		u64 obj_offset = (*bind_ops)[i].obj_offset;
 		u32 region = (*bind_ops)[i].region;
+		u16 pat_index = (*bind_ops)[i].pat_index;
 		bool is_null = op & XE_VM_BIND_FLAG_NULL;
+		u16 coh_mode;
+
+		if (XE_IOCTL_DBG(xe, pat_index >= xe->pat.n_entries)) {
+			err = -EINVAL;
+			goto free_bind_ops;
+		}
+
+		pat_index = array_index_nospec(pat_index, xe->pat.n_entries);
+		(*bind_ops)[i].pat_index = pat_index;
+		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+		if (XE_WARN_ON(!coh_mode || coh_mode > XE_GEM_COH_AT_LEAST_1WAY)) {
+			err = -EINVAL;
+			goto free_bind_ops;
+		}
 
 		if (i == 0) {
 			*async = !!(op & XE_VM_BIND_FLAG_ASYNC);
@@ -3277,6 +3300,8 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_UNMAP_ALL) ||
 		    XE_IOCTL_DBG(xe, obj &&
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
+		    XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE &&
+				 VM_BIND_OP(op) == XE_VM_BIND_OP_MAP_USERPTR) ||
 		    XE_IOCTL_DBG(xe, obj &&
 				 VM_BIND_OP(op) == XE_VM_BIND_OP_PREFETCH) ||
 		    XE_IOCTL_DBG(xe, region &&
@@ -3425,6 +3450,8 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 addr = bind_ops[i].addr;
 		u32 obj = bind_ops[i].obj;
 		u64 obj_offset = bind_ops[i].obj_offset;
+		u16 pat_index = bind_ops[i].pat_index;
+		u16 coh_mode;
 
 		if (!obj)
 			continue;
@@ -3452,6 +3479,23 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 				goto put_obj;
 			}
 		}
+
+		coh_mode = xe_pat_index_get_coh_mode(xe, pat_index);
+		if (bos[i]->coh_mode) {
+			if (XE_IOCTL_DBG(xe, coh_mode < bos[i]->coh_mode)) {
+				err = -EINVAL;
+				goto put_obj;
+			}
+		} else if (XE_IOCTL_DBG(xe, coh_mode == XE_GEM_COH_NONE)) {
+			/*
+			 * Imported dma-buf from a different device should
+			 * require 1way or 2way coherency since we don't know
+			 * how it was mapped on the CPU. Just assume it is
+			 * potentially cached on the CPU side.
+			 */
+			err = -EINVAL;
+			goto put_obj;
+		}
 	}
 
 	if (args->num_syncs) {
@@ -3489,10 +3533,11 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 		u64 obj_offset = bind_ops[i].obj_offset;
 		u8 tile_mask = bind_ops[i].tile_mask;
 		u32 region = bind_ops[i].region;
+		u16 pat_index = bind_ops[i].pat_index;
 
 		ops[i] = vm_bind_ioctl_ops_create(vm, bos[i], obj_offset,
 						  addr, range, op, tile_mask,
-						  region);
+						  region, pat_index);
 		if (IS_ERR(ops[i])) {
 			err = PTR_ERR(ops[i]);
 			ops[i] = NULL;
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 1c5553b842d7..692e1cecb64f 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -111,6 +111,11 @@ struct xe_vma {
 	 */
 	u8 tile_present;
 
+	/**
+	 * @pat_index: The pat index to use when encoding the PTEs for this vma.
+	 */
+	u16 pat_index;
+
 	struct {
 		struct list_head rebind_link;
 	} notifier;
@@ -418,6 +423,8 @@ struct xe_vma_op {
 	struct async_op_fence *fence;
 	/** @tile_mask: gt mask for this operation */
 	u8 tile_mask;
+	/** @pat_index: The pat index to use for this operation. */
+	u16 pat_index;
 	/** @flags: operation flags */
 	enum xe_vma_op_flags flags;
 
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 260417b60c41..61012b194f74 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -598,8 +598,49 @@ struct drm_xe_vm_bind_op {
 	 */
 	__u32 obj;
 
+	/**
+	 * @pat_index: The platform defined @pat_index to use for this mapping.
+	 * The index basically maps to some predefined memory attributes,
+	 * including things like caching, coherency, compression etc.  The exact
+	 * meaning of the pat_index is platform specific and defined in the
+	 * Bspec and PRMs.  When the KMD sets up the binding the index here is
+	 * encoded into the ppGTT PTE.
+	 *
+	 * For coherency the @pat_index needs to be at least as coherent as
+	 * drm_xe_gem_create.coh_mode, i.e. coh_mode(pat_index) >=
+	 * drm_xe_gem_create.coh_mode. The KMD will extract the coherency mode
+	 * from the @pat_index and reject if there is a mismatch (see note below
+	 * for pre-MTL platforms).
+	 *
+	 * Note: On pre-MTL platforms there is only a caching mode and no
+	 * explicit coherency mode, but on such hardware there is always a
+	 * shared-LLC (or it is dgpu), so all GT memory accesses are coherent with
+	 * CPU caches even with the caching mode set as uncached.  It's only the
+	 * display engine that is incoherent (on dgpu it must be in VRAM which
+	 * is always mapped as WC on the CPU). However to keep the uapi somewhat
+	 * consistent with newer platforms the KMD groups the different cache
+	 * levels into the following coherency buckets on all pre-MTL platforms:
+	 *
+	 *	ppGTT UC -> XE_GEM_COH_NONE
+	 *	ppGTT WC -> XE_GEM_COH_NONE
+	 *	ppGTT WT -> XE_GEM_COH_NONE
+	 *	ppGTT WB -> XE_GEM_COH_AT_LEAST_1WAY
+	 *
+	 * In practice UC/WC/WT should only ever be used for scanout surfaces on
+	 * such platforms (or perhaps in general for dma-buf if shared with
+	 * another device) since it is only the display engine that is actually
+	 * incoherent.  Everything else should typically use WB given that we
+	 * have a shared-LLC.  On MTL+ this completely changes and the HW
+	 * defines the coherency mode as part of the @pat_index, where
+	 * incoherent GT access is possible.
+	 *
+	 * Note: For userptr and externally imported dma-buf the kernel expects
+	 * either 1WAY or 2WAY for the @pat_index.
+	 */
+	__u16 pat_index;
+
 	/** @pad: MBZ */
-	__u32 pad;
+	__u16 pad;
 
 	union {
 		/**
-- 
2.41.0



* [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev5)
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
                   ` (4 preceding siblings ...)
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 5/5] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
@ 2023-09-27 11:31 ` Patchwork
  2023-09-27 16:21 ` [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Souza, Jose
  6 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-09-27 11:31 UTC (permalink / raw)
  To: Souza, Jose; +Cc: intel-xe

== Series Details ==

Series: PAT and cache coherency support (rev5)
URL   : https://patchwork.freedesktop.org/series/123027/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: fc8ec3c56 drm/xe: Add Wa_18028616096
=== git am output follows ===
error: patch failed: drivers/gpu/drm/xe/xe_pat.c:42
error: drivers/gpu/drm/xe/xe_pat.c: patch does not apply
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: drm/xe/pat: trim the xelp PAT table
Patch failed at 0001 drm/xe/pat: trim the xelp PAT table
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".




* Re: [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support
  2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
                   ` (5 preceding siblings ...)
  2023-09-27 11:31 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev5) Patchwork
@ 2023-09-27 16:21 ` Souza, Jose
  2023-09-28  7:53   ` Matthew Auld
  6 siblings, 1 reply; 11+ messages in thread
From: Souza, Jose @ 2023-09-27 16:21 UTC (permalink / raw)
  To: intel-xe, Auld,  Matthew

On Wed, 2023-09-27 at 12:00 +0100, Matthew Auld wrote:
> Branch available here:
> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
> 
> Series directly depends on the patches here:
> https://patchwork.freedesktop.org/series/124225/
> 
> The goal here is to allow userspace to directly control the pat_index when
> mapping memory via the ppGTT, in addition to the CPU caching mode. This is very
> much needed on newer igpu platforms which allow incoherent GT access, where the
> choice over the cache level and expected coherency is best left to userspace,
> depending on the use case. In the future there may also be other attributes
> encoded in the pat_index, so giving userspace direct control will also be
> needed there.
> 
> To support this we add new gem_create uAPI for selecting the CPU caching
> mode to use for system memory, including the expected GPU coherency mode. There
> are various restrictions on which CPU caching modes are compatible with the
> selected coherency mode. With that in place the actual pat_index can now be
> provided as part of vm_bind. The only restriction is that the coherency mode of
> the pat_index must be at least as coherent as the gem_create coherency mode.
> There are also some special cases, such as userptr and dma-buf.
> 
> v2:
>   - Loads of improvements/tweaks. Main changes are to now allow
>     gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>     exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>     only consider COH_NONE and COH_AT_LEAST_1WAY.
> v3:
>   - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
>     fixes.
> v4:
>   - Rebase on Lucas' new series.
>   - Drop UC cache mode.
>   - s/smem_cpu_caching/cpu_caching/. Idea is to make VRAM WC explicit in the
>     uapi, plus make it more future proof.
> 

Thanks for the smem_cpu_caching to cpu_caching change.

This latest version is causing a GuC fw load failure on MTL; I have bisected it to "drm/xe: directly use pat_index for pte_encode".

[  173.995308] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[12] = 0x00000000
[  173.995388] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[13] = 0x00000000
[  173.995467] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] WOPCM: 4096K
[  173.995609] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] GuC WOPCM is already locked [2048K, 832K)
[  174.234667] xe 0000:00:02.0: [drm] GuC load failed: status = 0x80007134
[  174.234681] xe 0000:00:02.0: [drm] GuC load failed: status: Reset = 0, BootROM = 0x1A, UKernel = 0x71, MIA = 0x00, Auth = 0x02
[  174.234690] xe 0000:00:02.0: [drm] 0xcabba9e6 0xdeadfeed 0x00000000 0x00000078
[  174.234697] xe 0000:00:02.0: [drm] 0x00010000 0x00000000 0x0000fff0 0x00000000
[  174.234703] xe 0000:00:02.0: [drm] 0x00000002 0xcabba9e6 0x8086dead 0x00000000
[  174.234709] xe 0000:00:02.0: [drm] 0x00000000 0x00002000 0x00000000 0x00002000
[  174.234714] xe 0000:00:02.0: [drm] 0x00000000 0x00000002 0xcabba9f6 0xbeeffeed
[  174.234719] xe 0000:00:02.0: [drm] 0x00000000 0x00000000 0x00004000 0x00000000
[  174.234724] xe 0000:00:02.0: [drm] 0x00004000 0x00000000 0x00000002 0x8086900d
[  174.234730] xe 0000:00:02.0: [drm] 0x00010000 0x00000006 0x00010001 0x00460606
[  174.234735] xe 0000:00:02.0: [drm] 0x00020001 0x00004050 0x00030001 0x00004b00
[  174.234741] xe 0000:00:02.0: [drm] 0x00000000 0x00000000 0x00000000 0x00000000


uAPI-wise it needs some renames to align with the uAPI alignment series (https://patchwork.freedesktop.org/series/124271/); take a look at:
https://patchwork.freedesktop.org/patch/559576/?series=124271&rev=1
https://patchwork.freedesktop.org/patch/559577/?series=124271&rev=1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode
  2023-09-27 11:00 ` [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode Matthew Auld
@ 2023-09-28  4:41   ` Niranjana Vishwanathapura
  2023-09-28  7:25     ` Matthew Auld
  0 siblings, 1 reply; 11+ messages in thread
From: Niranjana Vishwanathapura @ 2023-09-28  4:41 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Lucas De Marchi, Matt Roper, intel-xe

On Wed, Sep 27, 2023 at 12:00:08PM +0100, Matthew Auld wrote:
>In the next patch userspace will be able to directly set the pat_index
>as part of vm_bind. To support this we need to get away from using
>xe_cache_level in the low level routines and rather just use the
>pat_index directly.
>
>v2: Rebase
>
>Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>Cc: Matt Roper <matthew.d.roper@intel.com>
>---
> drivers/gpu/drm/xe/xe_ggtt.c       |  7 +++----
> drivers/gpu/drm/xe/xe_ggtt_types.h |  3 +--
> drivers/gpu/drm/xe/xe_migrate.c    | 19 +++++++++++--------
> drivers/gpu/drm/xe/xe_pt.c         | 11 ++++++-----
> drivers/gpu/drm/xe/xe_pt_types.h   |  8 ++++----
> drivers/gpu/drm/xe/xe_vm.c         | 24 +++++++++++-------------
> 6 files changed, 36 insertions(+), 36 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>index 99b54794917e..2334c47c19cc 100644
>--- a/drivers/gpu/drm/xe/xe_ggtt.c
>+++ b/drivers/gpu/drm/xe/xe_ggtt.c
>@@ -27,7 +27,7 @@
> #define GUC_GGTT_TOP	0xFEE00000
>
> static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>-				   enum xe_cache_level cache)
>+				   u16 pat_index)
> {
> 	u64 pte;
>
>@@ -41,13 +41,12 @@ static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
> }
>
> static u64 xelpg_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>-				    enum xe_cache_level cache)
>+				    u16 pat_index)
> {
> 	struct xe_device *xe = xe_bo_device(bo);
>-	u32 pat_index = xe->pat.idx[cache];
> 	u64 pte;
>
>-	pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, cache);
>+	pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, pat_index);
>
> 	xe_assert(xe, pat_index <= 3);
>

Looks like this file has a couple of instances of pte_encode_bo() calls which
need to be updated to use pat_index instead of cache level.
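
Something like this for the remaining call sites, I'd assume (just a sketch,
the exact call sites in xe_ggtt.c may look different):

	/* before: callers passed an xe_cache_level */
	pte = ggtt->pt_ops->pte_encode_bo(bo, offset, XE_CACHE_WB);

	/* after: callers resolve the default pat_index themselves */
	u16 pat_index = xe->pat.idx[XE_CACHE_WB]; /* assuming an xe_device is in scope */

	pte = ggtt->pt_ops->pte_encode_bo(bo, offset, pat_index);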

>diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
>index 486016ea5b67..d8c584d9a8c3 100644
>--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
>+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
>@@ -14,8 +14,7 @@ struct xe_bo;
> struct xe_gt;
>
> struct xe_ggtt_pt_ops {
>-	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>-			     enum xe_cache_level cache);
>+	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset, u16 pat_index);
> };
>
> struct xe_ggtt {
>diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>index 258c2269c916..90a1ff1aca9b 100644
>--- a/drivers/gpu/drm/xe/xe_migrate.c
>+++ b/drivers/gpu/drm/xe/xe_migrate.c
>@@ -158,6 +158,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 				 struct xe_vm *vm)
> {
> 	struct xe_device *xe = tile_to_xe(tile);
>+	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
> 	u8 id = tile->id;
> 	u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
> 	u32 map_ofs, level, i;
>@@ -189,7 +190,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 		return ret;
> 	}
>
>-	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
>+	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, pat_index);
> 	xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
>
> 	map_ofs = (num_entries - num_level) * XE_PAGE_SIZE;
>@@ -197,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 	/* Map the entire BO in our level 0 pt */
> 	for (i = 0, level = 0; i < num_entries; level++) {
> 		entry = vm->pt_ops->pte_encode_bo(bo, i * XE_PAGE_SIZE,
>-						  XE_CACHE_WB, 0);
>+						  pat_index, 0);
>
> 		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>
>@@ -216,7 +217,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
> 		     XE_PAGE_SIZE) {
> 			entry = vm->pt_ops->pte_encode_bo(batch, i,
>-							  XE_CACHE_WB, 0);
>+							  pat_index, 0);
>
> 			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
> 				  entry);
>@@ -241,7 +242,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 			flags = XE_PDE_64K;
>
> 		entry = vm->pt_ops->pde_encode_bo(bo, map_ofs + (level - 1) *
>-						  XE_PAGE_SIZE, XE_CACHE_WB);
>+						  XE_PAGE_SIZE, pat_index);
> 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE * level, u64,
> 			  entry | flags);
> 	}
>@@ -249,7 +250,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
> 	/* Write PDE's that point to our BO. */
> 	for (i = 0; i < num_entries - num_level; i++) {
> 		entry = vm->pt_ops->pde_encode_bo(bo, i * XE_PAGE_SIZE,
>-						  XE_CACHE_WB);
>+						  pat_index);
>
> 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE +
> 			  (i + 1) * 8, u64, entry);
>@@ -261,7 +262,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>
> 		level = 2;
> 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>-		flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
>+		flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level,
> 						    true, 0);
>
> 		/*
>@@ -457,6 +458,7 @@ static void emit_pte(struct xe_migrate *m,
> 		     struct xe_res_cursor *cur,
> 		     u32 size, struct xe_bo *bo)
> {
>+	u16 pat_index = m->tile->xe->pat.idx[XE_CACHE_WB];

NIT...probably use tile_to_xe() instead of tile->xe here and elsewhere
just to be consistent?
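
i.e. something like this (untested sketch):

	-	u16 pat_index = m->tile->xe->pat.idx[XE_CACHE_WB];
	+	u16 pat_index = tile_to_xe(m->tile)->pat.idx[XE_CACHE_WB];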

> 	u32 ptes;
> 	u64 ofs = at_pt * XE_PAGE_SIZE;
> 	u64 cur_ofs;
>@@ -500,7 +502,7 @@ static void emit_pte(struct xe_migrate *m,
> 			}
>
> 			addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
>-								 addr, XE_CACHE_WB,
>+								 addr, pat_index,
> 								 0, devmem, flags);
> 			bb->cs[bb->len++] = lower_32_bits(addr);
> 			bb->cs[bb->len++] = upper_32_bits(addr);
>@@ -1190,6 +1192,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
> 	bool first_munmap_rebind = vma &&
> 		vma->gpuva.flags & XE_VMA_FIRST_REBIND;
> 	struct xe_exec_queue *q_override = !q ? m->q : q;
>+	u16 pat_index = xe->pat.idx[XE_CACHE_WB];
>
> 	/* Use the CPU if no in syncs and engine is idle */
> 	if (no_in_syncs(syncs, num_syncs) && xe_exec_queue_is_idle(q_override)) {
>@@ -1261,7 +1264,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>
> 			xe_tile_assert(tile, pt_bo->size == SZ_4K);
>
>-			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, XE_CACHE_WB, 0);
>+			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, pat_index, 0);
> 			bb->cs[bb->len++] = lower_32_bits(addr);
> 			bb->cs[bb->len++] = upper_32_bits(addr);
> 		}
>diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>index 4d4c6a4c305e..92b512641b4a 100644
>--- a/drivers/gpu/drm/xe/xe_pt.c
>+++ b/drivers/gpu/drm/xe/xe_pt.c
>@@ -50,6 +50,7 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> 			     unsigned int level)
> {
>+	u16 pat_index = tile_to_xe(tile)->pat.idx[XE_CACHE_WB];
> 	u8 id = tile->id;
>
> 	if (!vm->scratch_bo[id])
>@@ -57,9 +58,9 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>
> 	if (level > 0)
> 		return vm->pt_ops->pde_encode_bo(vm->scratch_pt[id][level - 1]->bo,
>-						 0, XE_CACHE_WB);
>+						 0, pat_index);
>
>-	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, XE_CACHE_WB, 0);
>+	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, pat_index, 0);
> }
>
> /**
>@@ -510,6 +511,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
> {
> 	struct xe_pt_stage_bind_walk *xe_walk =
> 		container_of(walk, typeof(*xe_walk), base);
>+	u16 pat_index = tile_to_xe(xe_walk->tile)->pat.idx[xe_walk->cache];

why not change xe_walk->cache to a xe_walk->pat_index?

Niranjana

> 	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
> 	struct xe_vm *vm = xe_walk->vm;
> 	struct xe_pt *xe_child;
>@@ -526,7 +528,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>
> 		pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
> 						 xe_res_dma(curs) + xe_walk->dma_offset,
>-						 xe_walk->vma, xe_walk->cache, level);
>+						 xe_walk->vma, pat_index, level);
> 		pte |= xe_walk->default_pte;
>
> 		/*
>@@ -591,8 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
> 			xe_child->is_compact = true;
> 		}
>
>-		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0,
>-						xe_walk->cache) | flags;
>+		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0, pat_index) | flags;
> 		ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, xe_child,
> 					 pte);
> 	}
>diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
>index bd6645295fe6..355fa8f014e9 100644
>--- a/drivers/gpu/drm/xe/xe_pt_types.h
>+++ b/drivers/gpu/drm/xe/xe_pt_types.h
>@@ -38,14 +38,14 @@ struct xe_pt {
>
> struct xe_pt_ops {
> 	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>-			     enum xe_cache_level cache, u32 pt_level);
>+			     u16 pat_index, u32 pt_level);
> 	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
>-			      enum xe_cache_level cache, u32 pt_level);
>+			      u16 pat_index, u32 pt_level);
> 	u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
>-			       enum xe_cache_level cache,
>+			       u16 pat_index,
> 			       u32 pt_level, bool devmem, u64 flags);
> 	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>-			     const enum xe_cache_level cache);
>+			     const u16 pat_index);
> };
>
> struct xe_pt_entry {
>diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>index beffbb1039d3..962bfd2b0179 100644
>--- a/drivers/gpu/drm/xe/xe_vm.c
>+++ b/drivers/gpu/drm/xe/xe_vm.c
>@@ -1191,9 +1191,8 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
> 	.op_alloc = xe_vm_op_alloc,
> };
>
>-static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>+static u64 pde_encode_pat_index(struct xe_device *xe, u16 pat_index)
> {
>-	u32 pat_index = xe->pat.idx[cache];
> 	u64 pte = 0;
>
> 	if (pat_index & BIT(0))
>@@ -1205,9 +1204,8 @@ static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
> 	return pte;
> }
>
>-static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>+static u64 pte_encode_pat_index(struct xe_device *xe, u16 pat_index)
> {
>-	u32 pat_index = xe->pat.idx[cache];
> 	u64 pte = 0;
>
> 	if (pat_index & BIT(0))
>@@ -1238,27 +1236,27 @@ static u64 pte_encode_ps(u32 pt_level)
> }
>
> static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>-			      const enum xe_cache_level cache)
>+			      const u16 pat_index)
> {
> 	struct xe_device *xe = xe_bo_device(bo);
> 	u64 pde;
>
> 	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> 	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>-	pde |= pde_encode_cache(xe, cache);
>+	pde |= pde_encode_pat_index(xe, pat_index);
>
> 	return pde;
> }
>
> static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>-			      enum xe_cache_level cache, u32 pt_level)
>+			      u16 pat_index, u32 pt_level)
> {
> 	struct xe_device *xe = xe_bo_device(bo);
> 	u64 pte;
>
> 	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>-	pte |= pte_encode_cache(xe, cache);
>+	pte |= pte_encode_pat_index(xe, pat_index);
> 	pte |= pte_encode_ps(pt_level);
>
> 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>@@ -1268,7 +1266,7 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
> }
>
> static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>-			       enum xe_cache_level cache, u32 pt_level)
>+			       u16 pat_index, u32 pt_level)
> {
> 	struct xe_device *xe = xe_vma_vm(vma)->xe;
>
>@@ -1277,7 +1275,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
> 	if (likely(!xe_vma_read_only(vma)))
> 		pte |= XE_PAGE_RW;
>
>-	pte |= pte_encode_cache(xe, cache);
>+	pte |= pte_encode_pat_index(xe, pat_index);
> 	pte |= pte_encode_ps(pt_level);
>
> 	if (unlikely(xe_vma_is_null(vma)))
>@@ -1287,7 +1285,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
> }
>
> static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
>-				enum xe_cache_level cache,
>+				u16 pat_index,
> 				u32 pt_level, bool devmem, u64 flags)
> {
> 	u64 pte;
>@@ -1297,7 +1295,7 @@ static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
>
> 	pte = addr;
> 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>-	pte |= pte_encode_cache(xe, cache);
>+	pte |= pte_encode_pat_index(xe, pat_index);
> 	pte |= pte_encode_ps(pt_level);
>
> 	if (devmem)
>@@ -1701,7 +1699,7 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id)
> u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
> {
> 	return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0,
>-					 XE_CACHE_WB);
>+					 tile->xe->pat.idx[XE_CACHE_WB]);
> }
>
> static struct dma_fence *
>-- 
>2.41.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode
  2023-09-28  4:41   ` Niranjana Vishwanathapura
@ 2023-09-28  7:25     ` Matthew Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-28  7:25 UTC (permalink / raw)
  To: Niranjana Vishwanathapura; +Cc: Lucas De Marchi, Matt Roper, intel-xe

On 28/09/2023 05:41, Niranjana Vishwanathapura wrote:
> On Wed, Sep 27, 2023 at 12:00:08PM +0100, Matthew Auld wrote:
>> In the next patch userspace will be able to directly set the pat_index
>> as part of vm_bind. To support this we need to get away from using
>> xe_cache_level in the low level routines and rather just use the
>> pat_index directly.
>>
>> v2: Rebase
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Pallavi Mishra <pallavi.mishra@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_ggtt.c       |  7 +++----
>> drivers/gpu/drm/xe/xe_ggtt_types.h |  3 +--
>> drivers/gpu/drm/xe/xe_migrate.c    | 19 +++++++++++--------
>> drivers/gpu/drm/xe/xe_pt.c         | 11 ++++++-----
>> drivers/gpu/drm/xe/xe_pt_types.h   |  8 ++++----
>> drivers/gpu/drm/xe/xe_vm.c         | 24 +++++++++++-------------
>> 6 files changed, 36 insertions(+), 36 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
>> index 99b54794917e..2334c47c19cc 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt.c
>> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
>> @@ -27,7 +27,7 @@
>> #define GUC_GGTT_TOP    0xFEE00000
>>
>> static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>> -                   enum xe_cache_level cache)
>> +                   u16 pat_index)
>> {
>>     u64 pte;
>>
>> @@ -41,13 +41,12 @@ static u64 xelp_ggtt_pte_encode_bo(struct xe_bo 
>> *bo, u64 bo_offset,
>> }
>>
>> static u64 xelpg_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>> -                    enum xe_cache_level cache)
>> +                    u16 pat_index)
>> {
>>     struct xe_device *xe = xe_bo_device(bo);
>> -    u32 pat_index = xe->pat.idx[cache];
>>     u64 pte;
>>
>> -    pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, cache);
>> +    pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, pat_index);
>>
>>     xe_assert(xe, pat_index <= 3);
>>
> 
> Looks like this file has a couple of instances of pte_encode_bo() calls which
> need to be updated to use pat_index instead of cache level.

Indeed, I missed a few it seems. Thanks for catching that.

> 
>> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h 
>> b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> index 486016ea5b67..d8c584d9a8c3 100644
>> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
>> @@ -14,8 +14,7 @@ struct xe_bo;
>> struct xe_gt;
>>
>> struct xe_ggtt_pt_ops {
>> -    u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>> -                 enum xe_cache_level cache);
>> +    u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset, u16 
>> pat_index);
>> };
>>
>> struct xe_ggtt {
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c 
>> b/drivers/gpu/drm/xe/xe_migrate.c
>> index 258c2269c916..90a1ff1aca9b 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -158,6 +158,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>                  struct xe_vm *vm)
>> {
>>     struct xe_device *xe = tile_to_xe(tile);
>> +    u16 pat_index = xe->pat.idx[XE_CACHE_WB];
>>     u8 id = tile->id;
>>     u32 num_entries = NUM_PT_SLOTS, num_level = vm->pt_root[id]->level;
>>     u32 map_ofs, level, i;
>> @@ -189,7 +190,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>         return ret;
>>     }
>>
>> -    entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, 
>> XE_CACHE_WB);
>> +    entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, 
>> pat_index);
>>     xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
>>
>>     map_ofs = (num_entries - num_level) * XE_PAGE_SIZE;
>> @@ -197,7 +198,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>     /* Map the entire BO in our level 0 pt */
>>     for (i = 0, level = 0; i < num_entries; level++) {
>>         entry = vm->pt_ops->pte_encode_bo(bo, i * XE_PAGE_SIZE,
>> -                          XE_CACHE_WB, 0);
>> +                          pat_index, 0);
>>
>>         xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>>
>> @@ -216,7 +217,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>              i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>>              XE_PAGE_SIZE) {
>>             entry = vm->pt_ops->pte_encode_bo(batch, i,
>> -                              XE_CACHE_WB, 0);
>> +                              pat_index, 0);
>>
>>             xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>>                   entry);
>> @@ -241,7 +242,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>             flags = XE_PDE_64K;
>>
>>         entry = vm->pt_ops->pde_encode_bo(bo, map_ofs + (level - 1) *
>> -                          XE_PAGE_SIZE, XE_CACHE_WB);
>> +                          XE_PAGE_SIZE, pat_index);
>>         xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE * level, u64,
>>               entry | flags);
>>     }
>> @@ -249,7 +250,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>     /* Write PDE's that point to our BO. */
>>     for (i = 0; i < num_entries - num_level; i++) {
>>         entry = vm->pt_ops->pde_encode_bo(bo, i * XE_PAGE_SIZE,
>> -                          XE_CACHE_WB);
>> +                          pat_index);
>>
>>         xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE +
>>               (i + 1) * 8, u64, entry);
>> @@ -261,7 +262,7 @@ static int xe_migrate_prepare_vm(struct xe_tile 
>> *tile, struct xe_migrate *m,
>>
>>         level = 2;
>>         ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>> -        flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
>> +        flags = vm->pt_ops->pte_encode_addr(xe, 0, pat_index, level,
>>                             true, 0);
>>
>>         /*
>> @@ -457,6 +458,7 @@ static void emit_pte(struct xe_migrate *m,
>>              struct xe_res_cursor *cur,
>>              u32 size, struct xe_bo *bo)
>> {
>> +    u16 pat_index = m->tile->xe->pat.idx[XE_CACHE_WB];
> 
> NIT...probably use tile_to_xe() instead of tile->xe here and elsewhere
> just to be consistent?

Ok, will fix.

> 
>>     u32 ptes;
>>     u64 ofs = at_pt * XE_PAGE_SIZE;
>>     u64 cur_ofs;
>> @@ -500,7 +502,7 @@ static void emit_pte(struct xe_migrate *m,
>>             }
>>
>>             addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
>> -                                 addr, XE_CACHE_WB,
>> +                                 addr, pat_index,
>>                                  0, devmem, flags);
>>             bb->cs[bb->len++] = lower_32_bits(addr);
>>             bb->cs[bb->len++] = upper_32_bits(addr);
>> @@ -1190,6 +1192,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>>     bool first_munmap_rebind = vma &&
>>         vma->gpuva.flags & XE_VMA_FIRST_REBIND;
>>     struct xe_exec_queue *q_override = !q ? m->q : q;
>> +    u16 pat_index = xe->pat.idx[XE_CACHE_WB];
>>
>>     /* Use the CPU if no in syncs and engine is idle */
>>     if (no_in_syncs(syncs, num_syncs) && 
>> xe_exec_queue_is_idle(q_override)) {
>> @@ -1261,7 +1264,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>>
>>             xe_tile_assert(tile, pt_bo->size == SZ_4K);
>>
>> -            addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, XE_CACHE_WB, 0);
>> +            addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, pat_index, 0);
>>             bb->cs[bb->len++] = lower_32_bits(addr);
>>             bb->cs[bb->len++] = upper_32_bits(addr);
>>         }
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index 4d4c6a4c305e..92b512641b4a 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -50,6 +50,7 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir 
>> *pt_dir, unsigned int index)
>> static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>>                  unsigned int level)
>> {
>> +    u16 pat_index = tile_to_xe(tile)->pat.idx[XE_CACHE_WB];
>>     u8 id = tile->id;
>>
>>     if (!vm->scratch_bo[id])
>> @@ -57,9 +58,9 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, 
>> struct xe_vm *vm,
>>
>>     if (level > 0)
>>         return vm->pt_ops->pde_encode_bo(vm->scratch_pt[id][level - 
>> 1]->bo,
>> -                         0, XE_CACHE_WB);
>> +                         0, pat_index);
>>
>> -    return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, 
>> XE_CACHE_WB, 0);
>> +    return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, 
>> pat_index, 0);
>> }
>>
>> /**
>> @@ -510,6 +511,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, 
>> pgoff_t offset,
>> {
>>     struct xe_pt_stage_bind_walk *xe_walk =
>>         container_of(walk, typeof(*xe_walk), base);
>> +    u16 pat_index = tile_to_xe(xe_walk->tile)->pat.idx[xe_walk->cache];
> 
> why not change xe_walk->cache to a xe_walk->pat_index?

In the later vm_bind patch xe_walk->cache/pat_index is removed anyway, 
with the value instead extracted directly from the vma, so I figured 
there was not much point. The above line just becomes 
xe_walk->vma->pat_index.
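
Roughly (sketch of what the later patch ends up with, exact details may
differ slightly):

	u16 pat_index = xe_walk->vma->pat_index;

	pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
					 xe_res_dma(curs) + xe_walk->dma_offset,
					 xe_walk->vma, pat_index, level);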

> 
> Niranjana
> 
>>     struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), 
>> base);
>>     struct xe_vm *vm = xe_walk->vm;
>>     struct xe_pt *xe_child;
>> @@ -526,7 +528,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, 
>> pgoff_t offset,
>>
>>         pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
>>                          xe_res_dma(curs) + xe_walk->dma_offset,
>> -                         xe_walk->vma, xe_walk->cache, level);
>> +                         xe_walk->vma, pat_index, level);
>>         pte |= xe_walk->default_pte;
>>
>>         /*
>> @@ -591,8 +593,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, 
>> pgoff_t offset,
>>             xe_child->is_compact = true;
>>         }
>>
>> -        pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0,
>> -                        xe_walk->cache) | flags;
>> +        pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0, pat_index) | 
>> flags;
>>         ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, xe_child,
>>                      pte);
>>     }
>> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h 
>> b/drivers/gpu/drm/xe/xe_pt_types.h
>> index bd6645295fe6..355fa8f014e9 100644
>> --- a/drivers/gpu/drm/xe/xe_pt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
>> @@ -38,14 +38,14 @@ struct xe_pt {
>>
>> struct xe_pt_ops {
>>     u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>> -                 enum xe_cache_level cache, u32 pt_level);
>> +                 u16 pat_index, u32 pt_level);
>>     u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
>> -                  enum xe_cache_level cache, u32 pt_level);
>> +                  u16 pat_index, u32 pt_level);
>>     u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
>> -                   enum xe_cache_level cache,
>> +                   u16 pat_index,
>>                    u32 pt_level, bool devmem, u64 flags);
>>     u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>> -                 const enum xe_cache_level cache);
>> +                 const u16 pat_index);
>> };
>>
>> struct xe_pt_entry {
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index beffbb1039d3..962bfd2b0179 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1191,9 +1191,8 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
>>     .op_alloc = xe_vm_op_alloc,
>> };
>>
>> -static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level 
>> cache)
>> +static u64 pde_encode_pat_index(struct xe_device *xe, u16 pat_index)
>> {
>> -    u32 pat_index = xe->pat.idx[cache];
>>     u64 pte = 0;
>>
>>     if (pat_index & BIT(0))
>> @@ -1205,9 +1204,8 @@ static u64 pde_encode_cache(struct xe_device 
>> *xe, enum xe_cache_level cache)
>>     return pte;
>> }
>>
>> -static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level 
>> cache)
>> +static u64 pte_encode_pat_index(struct xe_device *xe, u16 pat_index)
>> {
>> -    u32 pat_index = xe->pat.idx[cache];
>>     u64 pte = 0;
>>
>>     if (pat_index & BIT(0))
>> @@ -1238,27 +1236,27 @@ static u64 pte_encode_ps(u32 pt_level)
>> }
>>
>> static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>> -                  const enum xe_cache_level cache)
>> +                  const u16 pat_index)
>> {
>>     struct xe_device *xe = xe_bo_device(bo);
>>     u64 pde;
>>
>>     pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>>     pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -    pde |= pde_encode_cache(xe, cache);
>> +    pde |= pde_encode_pat_index(xe, pat_index);
>>
>>     return pde;
>> }
>>
>> static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>> -                  enum xe_cache_level cache, u32 pt_level)
>> +                  u16 pat_index, u32 pt_level)
>> {
>>     struct xe_device *xe = xe_bo_device(bo);
>>     u64 pte;
>>
>>     pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>>     pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -    pte |= pte_encode_cache(xe, cache);
>> +    pte |= pte_encode_pat_index(xe, pat_index);
>>     pte |= pte_encode_ps(pt_level);
>>
>>     if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>> @@ -1268,7 +1266,7 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, 
>> u64 bo_offset,
>> }
>>
>> static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>> -                   enum xe_cache_level cache, u32 pt_level)
>> +                   u16 pat_index, u32 pt_level)
>> {
>>     struct xe_device *xe = xe_vma_vm(vma)->xe;
>>
>> @@ -1277,7 +1275,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct 
>> xe_vma *vma,
>>     if (likely(!xe_vma_read_only(vma)))
>>         pte |= XE_PAGE_RW;
>>
>> -    pte |= pte_encode_cache(xe, cache);
>> +    pte |= pte_encode_pat_index(xe, pat_index);
>>     pte |= pte_encode_ps(pt_level);
>>
>>     if (unlikely(xe_vma_is_null(vma)))
>> @@ -1287,7 +1285,7 @@ static u64 xelp_pte_encode_vma(u64 pte, struct 
>> xe_vma *vma,
>> }
>>
>> static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
>> -                enum xe_cache_level cache,
>> +                u16 pat_index,
>>                 u32 pt_level, bool devmem, u64 flags)
>> {
>>     u64 pte;
>> @@ -1297,7 +1295,7 @@ static u64 xelp_pte_encode_addr(struct xe_device 
>> *xe, u64 addr,
>>
>>     pte = addr;
>>     pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -    pte |= pte_encode_cache(xe, cache);
>> +    pte |= pte_encode_pat_index(xe, pat_index);
>>     pte |= pte_encode_ps(pt_level);
>>
>>     if (devmem)
>> @@ -1701,7 +1699,7 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, 
>> u32 id)
>> u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
>> {
>>     return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0,
>> -                     XE_CACHE_WB);
>> +                     tile->xe->pat.idx[XE_CACHE_WB]);
>> }
>>
>> static struct dma_fence *
>> -- 
>> 2.41.0
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support
  2023-09-27 16:21 ` [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Souza, Jose
@ 2023-09-28  7:53   ` Matthew Auld
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Auld @ 2023-09-28  7:53 UTC (permalink / raw)
  To: Souza, Jose, intel-xe

On 27/09/2023 17:21, Souza, Jose wrote:
> On Wed, 2023-09-27 at 12:00 +0100, Matthew Auld wrote:
>> Branch available here:
>> https://gitlab.freedesktop.org/mwa/kernel/-/tree/xe-pat-index?ref_type=heads
>>
>> Series directly depends on the patches here:
>> https://patchwork.freedesktop.org/series/124225/
>>
>> Goal here is to allow userspace to directly control the pat_index when mapping
>> memory via the ppGTT, in addtion to the CPU caching mode. This is very much
>> needed on newer igpu platforms which allow incoherent GT access, where the
>> choice over the cache level and expected coherency is best left to userspace
>> depending on their usecase.  In the future there may also be other stuff encoded
>> in the pat_index, so giving userspace direct control will also be needed there.
>>
>> To support this we added new gem_create uAPI for selecting the CPU cache
>> mode to use for system memory, including the expected GPU coherency mode. There
>> are various restrictions here for the selected coherency mode and compatible CPU
>> cache modes.  With that in place the actual pat_index can now be provided as
>> part of vm_bind. The only restriction is that the coherency mode of the
>> pat_index must be at least as coherent as the gem_create coherency mode. There
>> are also some special cases like with userptr and dma-buf.
>>
>> v2:
>>    - Loads of improvements/tweaks. Main changes are to now allow
>>      gem_create.coh_mode <= coh_mode(pat_index), rather than it needing to match
>>      exactly. This simplifies the dma-buf policy from userspace pov. Also we now
>>      only consider COH_NONE and COH_AT_LEAST_1WAY.
>> v3:
>>    - Rebase. Split the pte_encode() refactoring, plus various smaller tweaks and
>>      fixes.
>> v4:
>>    - Rebase on Lucas' new series.
>>    - Drop UC cache mode.
>>    - s/smem_cpu_caching/cpu_caching/. Idea is to make VRAM WC explicit in the
>>      uapi, plus make it more future proof.
>>
> 
> Thanks for the smem_cpu_caching to cpu_caching change.
> 
> This latest version is causing a GuC fw load failure on MTL; I have bisected it to "drm/xe: directly use pat_index for pte_encode".

Ok, there are some fixes in the latest version around that patch, 
perhaps that resolves it.

> 
> [  173.995308] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[12] = 0x00000000
> [  173.995388] xe 0000:00:02.0: [drm:xe_guc_init [xe]] GuC param[13] = 0x00000000
> [  173.995467] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] WOPCM: 4096K
> [  173.995609] xe 0000:00:02.0: [drm:xe_wopcm_init [xe]] GuC WOPCM is already locked [2048K, 832K)
> [  174.234667] xe 0000:00:02.0: [drm] GuC load failed: status = 0x80007134
> [  174.234681] xe 0000:00:02.0: [drm] GuC load failed: status: Reset = 0, BootROM = 0x1A, UKernel = 0x71, MIA = 0x00, Auth = 0x02
> [  174.234690] xe 0000:00:02.0: [drm] 0xcabba9e6 0xdeadfeed 0x00000000 0x00000078
> [  174.234697] xe 0000:00:02.0: [drm] 0x00010000 0x00000000 0x0000fff0 0x00000000
> [  174.234703] xe 0000:00:02.0: [drm] 0x00000002 0xcabba9e6 0x8086dead 0x00000000
> [  174.234709] xe 0000:00:02.0: [drm] 0x00000000 0x00002000 0x00000000 0x00002000
> [  174.234714] xe 0000:00:02.0: [drm] 0x00000000 0x00000002 0xcabba9f6 0xbeeffeed
> [  174.234719] xe 0000:00:02.0: [drm] 0x00000000 0x00000000 0x00004000 0x00000000
> [  174.234724] xe 0000:00:02.0: [drm] 0x00004000 0x00000000 0x00000002 0x8086900d
> [  174.234730] xe 0000:00:02.0: [drm] 0x00010000 0x00000006 0x00010001 0x00460606
> [  174.234735] xe 0000:00:02.0: [drm] 0x00020001 0x00004050 0x00030001 0x00004b00
> [  174.234741] xe 0000:00:02.0: [drm] 0x00000000 0x00000000 0x00000000 0x00000000
> 
> 
> uAPI-wise it needs some renames to align with the uAPI alignment series (https://patchwork.freedesktop.org/series/124271/); take a look at:
> https://patchwork.freedesktop.org/patch/559576/?series=124271&rev=1

Ok, will align with that.

> https://patchwork.freedesktop.org/patch/559577/?series=124271&rev=1

I think we are only dealing with adding properties here and not so much 
flags, i.e. you can't combine them like flags: cpu_mapping = WC | WB.
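
For illustration only (made-up names, just to show the distinction):

	/* property: userspace picks exactly one value */
	create.cpu_caching = CPU_CACHING_WB;

	/* flags: single bits that can be OR'ed together */
	create.flags = FLAG_FOO | FLAG_BAR;

	/* so something like cpu_caching = CPU_CACHING_WB | CPU_CACHING_WC
	 * is not meaningful */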

> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-09-28  7:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-27 11:00 [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Matthew Auld
2023-09-27 11:00 ` [Intel-xe] [PATCH v4 1/5] drm/xe/pat: trim the xelp PAT table Matthew Auld
2023-09-27 11:00 ` [Intel-xe] [PATCH v4 2/5] drm/xe: directly use pat_index for pte_encode Matthew Auld
2023-09-28  4:41   ` Niranjana Vishwanathapura
2023-09-28  7:25     ` Matthew Auld
2023-09-27 11:00 ` [Intel-xe] [PATCH v4 3/5] drm/xe/uapi: Add support for cache and coherency mode Matthew Auld
2023-09-27 11:00 ` [Intel-xe] [PATCH v4 4/5] drm/xe/pat: annotate pat_index with " Matthew Auld
2023-09-27 11:00 ` [Intel-xe] [PATCH v4 5/5] drm/xe/uapi: support pat_index selection with vm_bind Matthew Auld
2023-09-27 11:31 ` [Intel-xe] ✗ CI.Patch_applied: failure for PAT and cache coherency support (rev5) Patchwork
2023-09-27 16:21 ` [Intel-xe] [PATCH v4 0/5] PAT and cache coherency support Souza, Jose
2023-09-28  7:53   ` Matthew Auld
