* [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT
@ 2023-09-25 22:10 Lucas De Marchi
  2023-09-25 22:10 ` [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding Lucas De Marchi
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

This reworks how we set PAT bits in both PPGTT and GGTT, with only the
kernel users in mind. Support for setting the PAT from userspace is being
added separately by
https://lore.kernel.org/all/20230829162840.73444-7-matthew.auld@intel.com/T/.
This can be seen as a follow-up to the WIP patch that series depends on,
https://patchwork.freedesktop.org/series/122708/

I'm using that patch as a starting point, but as noted in my review
comments there, I think the abstraction we had and the ones being added
were not optimal. This series is my attempt to translate those review
comments into the abstraction I had in mind.

Lightly tested on DG2.
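
For reference, the shape of the abstraction introduced in patch 3 looks
roughly like this (trimmed here for illustration; see the diffs below for
the real thing):

    struct xe_pt_ops {
            u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
                                 enum xe_cache_level cache, u32 pt_level);
            u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
                                  enum xe_cache_level cache, u32 pt_level);
            u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
                                 const enum xe_cache_level cache);
    };

    /* callers then go through the vm instead of the global helpers */
    entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);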

Lucas De Marchi (9):
  drm/xe: Normalize pte/pde encoding
  drm/xe: Remove check for vma == NULL
  drm/xe: Use vfunc for pte/pde ppgtt encoding
  drm/xe/migrate: Do not hand-encode pte
  drm/xe: Use vfunc to initialize PAT
  drm/xe/pat: Keep track of relevant indexes
  drm/xe: Use pat_index to encode pte
  drm/xe: Use vfunc for ggtt pte encoding
  fixup! drm/xe/display: Implement display support

 drivers/gpu/drm/xe/display/xe_fb_pin.c |  32 +++++--
 drivers/gpu/drm/xe/tests/xe_migrate.c  |   2 +-
 drivers/gpu/drm/xe/xe_bo.h             |  11 ++-
 drivers/gpu/drm/xe/xe_device.c         |   3 +
 drivers/gpu/drm/xe/xe_device_types.h   |  15 +++
 drivers/gpu/drm/xe/xe_ggtt.c           |  53 ++++++++---
 drivers/gpu/drm/xe/xe_ggtt.h           |   1 -
 drivers/gpu/drm/xe/xe_ggtt_types.h     |   9 ++
 drivers/gpu/drm/xe/xe_migrate.c        |  34 ++++---
 drivers/gpu/drm/xe/xe_pat.c            |  68 +++++++++++---
 drivers/gpu/drm/xe/xe_pat.h            |  11 +++
 drivers/gpu/drm/xe/xe_pt.c             | 109 ++--------------------
 drivers/gpu/drm/xe/xe_pt.h             |   6 --
 drivers/gpu/drm/xe/xe_pt_types.h       |  19 ++++
 drivers/gpu/drm/xe/xe_vm.c             | 124 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm_types.h       |   2 +
 16 files changed, 337 insertions(+), 162 deletions(-)

-- 
2.40.1



* [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 22:30   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 2/9] drm/xe: Remove check for vma == NULL Lucas De Marchi
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Split out functions that each do only part of the pde/pte encoding and
can be called from the different places. This normalizes how pde/pte are
encoded so they can be moved elsewhere in a subsequent change.

xe_pte_encode() was calling __pte_encode() with a NULL vma, which is the
opposite of what xe_pt_stage_bind_entry() does. Stop passing a NULL vma
and instead split out another function that deals with a vma rather than
a bo.
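
As an illustration, with this split a level-0 pte for a write-back cached
vram bo ends up composed from the new helpers roughly as (matching the
xe_pte_encode() below):

    pte  = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
    pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
    pte |= pte_encode_cache(XE_CACHE_WB);   /* -> PPAT_CACHED */
    pte |= pte_encode_ps(0);                /* level 0: no PS bit */
    pte |= XE_PPGTT_PTE_DM;                 /* bo is in vram */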

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 119 +++++++++++++++++++++----------------
 1 file changed, 67 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index aec469842471..afadd399684c 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -47,91 +47,106 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
 	return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
 }
 
-/**
- * xe_pde_encode() - Encode a page-table directory entry pointing to
- * another page-table.
- * @bo: The page-table bo of the page-table to point to.
- * @bo_offset: Offset in the page-table bo to point to.
- * @level: The cache level indicating the caching of @bo.
- *
- * TODO: Rename.
- *
- * Return: An encoded page directory entry. No errors.
- */
-u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
-		  const enum xe_cache_level level)
+static u64 pde_encode_cache(enum xe_cache_level cache)
 {
-	u64 pde;
-
-	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
-	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
-
 	/* FIXME: I don't think the PPAT handling is correct for MTL */
 
-	if (level != XE_CACHE_NONE)
-		pde |= PPAT_CACHED_PDE;
-	else
-		pde |= PPAT_UNCACHED;
+	if (cache != XE_CACHE_NONE)
+		return PPAT_CACHED_PDE;
 
-	return pde;
+	return PPAT_UNCACHED;
 }
 
-static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
-			struct xe_vma *vma, u32 pt_level)
+static u64 pte_encode_cache(enum xe_cache_level cache)
 {
-	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-
-	if (unlikely(vma && xe_vma_read_only(vma)))
-		pte &= ~XE_PAGE_RW;
-
-	if (unlikely(vma && xe_vma_is_null(vma)))
-		pte |= XE_PTE_NULL;
-
 	/* FIXME: I don't think the PPAT handling is correct for MTL */
-
 	switch (cache) {
 	case XE_CACHE_NONE:
-		pte |= PPAT_UNCACHED;
-		break;
+		return PPAT_UNCACHED;
 	case XE_CACHE_WT:
-		pte |= PPAT_DISPLAY_ELLC;
-		break;
+		return PPAT_DISPLAY_ELLC;
 	default:
-		pte |= PPAT_CACHED;
-		break;
+		return PPAT_CACHED;
 	}
+}
+
+static u64 pte_encode_ps(u32 pt_level)
+{
+	/* XXX: Does hw support 1 GiB pages? */
+	XE_WARN_ON(pt_level > 2);
 
 	if (pt_level == 1)
-		pte |= XE_PDE_PS_2M;
+		return XE_PDE_PS_2M;
 	else if (pt_level == 2)
-		pte |= XE_PDPE_PS_1G;
+		return XE_PDPE_PS_1G;
 
-	/* XXX: Does hw support 1 GiB pages? */
-	XE_WARN_ON(pt_level > 2);
+	return 0;
+}
 
-	return pte;
+/**
+ * xe_pde_encode() - Encode a page-table directory entry pointing to
+ * another page-table.
+ * @bo: The page-table bo of the page-table to point to.
+ * @bo_offset: Offset in the page-table bo to point to.
+ * @cache: The cache level indicating the caching of @bo.
+ *
+ * TODO: Rename.
+ *
+ * Return: An encoded page directory entry. No errors.
+ */
+u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
+		  const enum xe_cache_level cache)
+{
+	u64 pde;
+
+	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
+	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pde |= pde_encode_cache(cache);
+
+	return pde;
 }
 
 /**
  * xe_pte_encode() - Encode a page-table entry pointing to memory.
  * @bo: The BO representing the memory to point to.
- * @offset: The offset into @bo.
+ * @bo_offset: The offset into @bo.
  * @cache: The cache level indicating
  * @pt_level: The page-table level of the page-table into which the entry
  * is to be inserted.
  *
  * Return: An encoded page-table entry. No errors.
  */
-u64 xe_pte_encode(struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
+u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
 		  u32 pt_level)
 {
 	u64 pte;
 
-	pte = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
+	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
+	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_ps(pt_level);
+
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
 		pte |= XE_PPGTT_PTE_DM;
 
-	return __pte_encode(pte, cache, NULL, pt_level);
+	return pte;
+}
+
+/* Like xe_pte_encode(), but with a vma and a partially-encoded pte */
+static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
+			    enum xe_cache_level cache, u32 pt_level)
+{
+	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_ps(pt_level);
+
+	if (unlikely(vma && xe_vma_read_only(vma)))
+		pte &= ~XE_PAGE_RW;
+
+	if (unlikely(vma && xe_vma_is_null(vma)))
+		pte |= XE_PTE_NULL;
+
+	return pte;
 }
 
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -614,9 +629,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
-		pte = __pte_encode(is_null ? 0 :
-				   xe_res_dma(curs) + xe_walk->dma_offset,
-				   xe_walk->cache, xe_walk->vma, level);
+		pte = __vma_pte_encode(is_null ? 0 :
+				       xe_res_dma(curs) + xe_walk->dma_offset,
+				       xe_walk->vma, xe_walk->cache, level);
 		pte |= xe_walk->default_pte;
 
 		/*
-- 
2.40.1



* [Intel-xe] [PATCH 2/9] drm/xe: Remove check for vma == NULL
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
  2023-09-25 22:10 ` [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 22:31   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 3/9] drm/xe: Use vfunc for pte/pde ppgtt encoding Lucas De Marchi
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

vma at this point can never be NULL, as otherwise it would have crashed
earlier in the only caller, xe_pt_stage_bind_entry(). Remove the extra
checks and avoid first setting and then clearing bits in the pte.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index afadd399684c..0b8a45609e83 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -136,14 +136,15 @@ u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
 static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
 			    enum xe_cache_level cache, u32 pt_level)
 {
-	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pte |= XE_PAGE_PRESENT;
+
+	if (likely(!xe_vma_read_only(vma)))
+		pte |= XE_PAGE_RW;
+
 	pte |= pte_encode_cache(cache);
 	pte |= pte_encode_ps(pt_level);
 
-	if (unlikely(vma && xe_vma_read_only(vma)))
-		pte &= ~XE_PAGE_RW;
-
-	if (unlikely(vma && xe_vma_is_null(vma)))
+	if (unlikely(xe_vma_is_null(vma)))
 		pte |= XE_PTE_NULL;
 
 	return pte;
-- 
2.40.1



* [Intel-xe] [PATCH 3/9] drm/xe: Use vfunc for pte/pde ppgtt encoding
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
  2023-09-25 22:10 ` [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding Lucas De Marchi
  2023-09-25 22:10 ` [Intel-xe] [PATCH 2/9] drm/xe: Remove check for vma == NULL Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 22:46   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 4/9] drm/xe/migrate: Do not hand-encode pte Lucas De Marchi
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Move the functions to encode pte/pde to vfuncs inside struct xe_vm.
This will make it easy to extend support to platforms that don't have a
compatible encoding.
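
As a sketch of what that extension could look like (the xe2 names below
are made up for illustration and not part of this series), a platform
with an incompatible encoding would only need to provide its own ops
table:

    /* hypothetical ops for a platform with a different pte layout */
    static const struct xe_pt_ops xe2_pt_ops = {
            .pte_encode_bo  = xe2_pte_encode_bo,
            .pte_encode_vma = xe2_pte_encode_vma,
            .pde_encode_bo  = xe2_pde_encode_bo,
    };

while all callers keep going through vm->pt_ops unchanged.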

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/tests/xe_migrate.c |   2 +-
 drivers/gpu/drm/xe/xe_migrate.c       |  18 ++--
 drivers/gpu/drm/xe/xe_pt.c            | 125 +++-----------------------
 drivers/gpu/drm/xe/xe_pt.h            |   6 --
 drivers/gpu/drm/xe/xe_pt_types.h      |  14 +++
 drivers/gpu/drm/xe/xe_vm.c            |  93 ++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm_types.h      |   2 +
 7 files changed, 128 insertions(+), 132 deletions(-)

diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
index f58cd1da1a34..ffb79e0b4981 100644
--- a/drivers/gpu/drm/xe/tests/xe_migrate.c
+++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
@@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
 	/* First part of the test, are we updating our pagetable bo with a new entry? */
 	xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
 		  0xdeaddeadbeefbeef);
-	expected = xe_pte_encode(pt, 0, XE_CACHE_WB, 0);
+	expected = m->q->vm->pte_encode(pt, 0, XE_CACHE_WB, 0);
 	if (m->q->vm->flags & XE_VM_FLAG_64K)
 		expected |= XE_PTE_PS64;
 	if (xe_bo_is_vram(pt))
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 9438f609d18b..aa0396330903 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -189,14 +189,15 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		return ret;
 	}
 
-	entry = xe_pde_encode(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
+	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
 	xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
 
 	map_ofs = (num_entries - num_level) * XE_PAGE_SIZE;
 
 	/* Map the entire BO in our level 0 pt */
 	for (i = 0, level = 0; i < num_entries; level++) {
-		entry = xe_pte_encode(bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
+		entry = vm->pt_ops->pte_encode_bo(bo, i * XE_PAGE_SIZE,
+						  XE_CACHE_WB, 0);
 
 		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
 
@@ -214,7 +215,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		for (i = 0; i < batch->size;
 		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
 		     XE_PAGE_SIZE) {
-			entry = xe_pte_encode(batch, i, XE_CACHE_WB, 0);
+			entry = vm->pt_ops->pte_encode_bo(batch, i,
+							  XE_CACHE_WB, 0);
 
 			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
 				  entry);
@@ -238,16 +240,16 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 		if (vm->flags & XE_VM_FLAG_64K && level == 1)
 			flags = XE_PDE_64K;
 
-		entry = xe_pde_encode(bo, map_ofs + (level - 1) *
-					XE_PAGE_SIZE, XE_CACHE_WB);
+		entry = vm->pt_ops->pde_encode_bo(bo, map_ofs + (level - 1) *
+						  XE_PAGE_SIZE, XE_CACHE_WB);
 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE * level, u64,
 			  entry | flags);
 	}
 
 	/* Write PDE's that point to our BO. */
 	for (i = 0; i < num_entries - num_level; i++) {
-		entry = xe_pde_encode(bo, i * XE_PAGE_SIZE,
-				      XE_CACHE_WB);
+		entry = vm->pt_ops->pde_encode_bo(bo, i * XE_PAGE_SIZE,
+						  XE_CACHE_WB);
 
 		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE +
 			  (i + 1) * 8, u64, entry);
@@ -1255,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
 
 			xe_tile_assert(tile, pt_bo->size == SZ_4K);
 
-			addr = xe_pte_encode(pt_bo, 0, XE_CACHE_WB, 0);
+			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, XE_CACHE_WB, 0);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 		}
diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 0b8a45609e83..4d4c6a4c305e 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -47,109 +47,6 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
 	return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
 }
 
-static u64 pde_encode_cache(enum xe_cache_level cache)
-{
-	/* FIXME: I don't think the PPAT handling is correct for MTL */
-
-	if (cache != XE_CACHE_NONE)
-		return PPAT_CACHED_PDE;
-
-	return PPAT_UNCACHED;
-}
-
-static u64 pte_encode_cache(enum xe_cache_level cache)
-{
-	/* FIXME: I don't think the PPAT handling is correct for MTL */
-	switch (cache) {
-	case XE_CACHE_NONE:
-		return PPAT_UNCACHED;
-	case XE_CACHE_WT:
-		return PPAT_DISPLAY_ELLC;
-	default:
-		return PPAT_CACHED;
-	}
-}
-
-static u64 pte_encode_ps(u32 pt_level)
-{
-	/* XXX: Does hw support 1 GiB pages? */
-	XE_WARN_ON(pt_level > 2);
-
-	if (pt_level == 1)
-		return XE_PDE_PS_2M;
-	else if (pt_level == 2)
-		return XE_PDPE_PS_1G;
-
-	return 0;
-}
-
-/**
- * xe_pde_encode() - Encode a page-table directory entry pointing to
- * another page-table.
- * @bo: The page-table bo of the page-table to point to.
- * @bo_offset: Offset in the page-table bo to point to.
- * @cache: The cache level indicating the caching of @bo.
- *
- * TODO: Rename.
- *
- * Return: An encoded page directory entry. No errors.
- */
-u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
-		  const enum xe_cache_level cache)
-{
-	u64 pde;
-
-	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
-	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pde |= pde_encode_cache(cache);
-
-	return pde;
-}
-
-/**
- * xe_pte_encode() - Encode a page-table entry pointing to memory.
- * @bo: The BO representing the memory to point to.
- * @bo_offset: The offset into @bo.
- * @cache: The cache level indicating
- * @pt_level: The page-table level of the page-table into which the entry
- * is to be inserted.
- *
- * Return: An encoded page-table entry. No errors.
- */
-u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
-		  u32 pt_level)
-{
-	u64 pte;
-
-	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
-	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pte |= pte_encode_cache(cache);
-	pte |= pte_encode_ps(pt_level);
-
-	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
-		pte |= XE_PPGTT_PTE_DM;
-
-	return pte;
-}
-
-/* Like xe_pte_encode(), but with a vma and a partially-encoded pte */
-static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
-			    enum xe_cache_level cache, u32 pt_level)
-{
-	pte |= XE_PAGE_PRESENT;
-
-	if (likely(!xe_vma_read_only(vma)))
-		pte |= XE_PAGE_RW;
-
-	pte |= pte_encode_cache(cache);
-	pte |= pte_encode_ps(pt_level);
-
-	if (unlikely(xe_vma_is_null(vma)))
-		pte |= XE_PTE_NULL;
-
-	return pte;
-}
-
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
 			     unsigned int level)
 {
@@ -158,15 +55,11 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
 	if (!vm->scratch_bo[id])
 		return 0;
 
-	if (level == 0) {
-		u64 empty = xe_pte_encode(vm->scratch_bo[id], 0,
-					  XE_CACHE_WB, 0);
+	if (level > 0)
+		return vm->pt_ops->pde_encode_bo(vm->scratch_pt[id][level - 1]->bo,
+						 0, XE_CACHE_WB);
 
-		return empty;
-	} else {
-		return xe_pde_encode(vm->scratch_pt[id][level - 1]->bo, 0,
-				     XE_CACHE_WB);
-	}
+	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, XE_CACHE_WB, 0);
 }
 
 /**
@@ -618,6 +511,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 	struct xe_pt_stage_bind_walk *xe_walk =
 		container_of(walk, typeof(*xe_walk), base);
 	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
+	struct xe_vm *vm = xe_walk->vm;
 	struct xe_pt *xe_child;
 	bool covers;
 	int ret = 0;
@@ -630,9 +524,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 
 		XE_WARN_ON(xe_walk->va_curs_start != addr);
 
-		pte = __vma_pte_encode(is_null ? 0 :
-				       xe_res_dma(curs) + xe_walk->dma_offset,
-				       xe_walk->vma, xe_walk->cache, level);
+		pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
+						 xe_res_dma(curs) + xe_walk->dma_offset,
+						 xe_walk->vma, xe_walk->cache, level);
 		pte |= xe_walk->default_pte;
 
 		/*
@@ -697,7 +591,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
 			xe_child->is_compact = true;
 		}
 
-		pte = xe_pde_encode(xe_child->bo, 0, xe_walk->cache) | flags;
+		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0,
+						xe_walk->cache) | flags;
 		ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, xe_child,
 					 pte);
 	}
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 01be7ab08f87..d5460e58dbbf 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -45,10 +45,4 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 
 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
-u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
-		  const enum xe_cache_level level);
-
-u64 xe_pte_encode(struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
-		  u32 pt_level);
-
 #endif
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 2ed64c0a4485..c58f6926fabf 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -6,8 +6,13 @@
 #ifndef _XE_PT_TYPES_H_
 #define _XE_PT_TYPES_H_
 
+#include <linux/types.h>
+
 #include "xe_pt_walk.h"
 
+struct xe_bo;
+struct xe_vma;
+
 enum xe_cache_level {
 	XE_CACHE_NONE,
 	XE_CACHE_WT,
@@ -29,6 +34,15 @@ struct xe_pt {
 #endif
 };
 
+struct xe_pt_ops {
+	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
+			     enum xe_cache_level cache, u32 pt_level);
+	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
+			      enum xe_cache_level cache, u32 pt_level);
+	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
+			     const enum xe_cache_level cache);
+};
+
 struct xe_pt_entry {
 	struct xe_pt *pt;
 	u64 pte;
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 861d050871bb..2e1b4d46d9ea 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1191,6 +1191,93 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
 	.op_alloc = xe_vm_op_alloc,
 };
 
+static u64 pde_encode_cache(enum xe_cache_level cache)
+{
+	/* FIXME: I don't think the PPAT handling is correct for MTL */
+
+	if (cache != XE_CACHE_NONE)
+		return PPAT_CACHED_PDE;
+
+	return PPAT_UNCACHED;
+}
+
+static u64 pte_encode_cache(enum xe_cache_level cache)
+{
+	/* FIXME: I don't think the PPAT handling is correct for MTL */
+	switch (cache) {
+	case XE_CACHE_NONE:
+		return PPAT_UNCACHED;
+	case XE_CACHE_WT:
+		return PPAT_DISPLAY_ELLC;
+	default:
+		return PPAT_CACHED;
+	}
+}
+
+static u64 pte_encode_ps(u32 pt_level)
+{
+	/* XXX: Does hw support 1 GiB pages? */
+	XE_WARN_ON(pt_level > 2);
+
+	if (pt_level == 1)
+		return XE_PDE_PS_2M;
+	else if (pt_level == 2)
+		return XE_PDPE_PS_1G;
+
+	return 0;
+}
+
+static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
+			      const enum xe_cache_level cache)
+{
+	u64 pde;
+
+	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
+	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pde |= pde_encode_cache(cache);
+
+	return pde;
+}
+
+static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
+			      enum xe_cache_level cache, u32 pt_level)
+{
+	u64 pte;
+
+	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
+	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_ps(pt_level);
+
+	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
+		pte |= XE_PPGTT_PTE_DM;
+
+	return pte;
+}
+
+static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
+			       enum xe_cache_level cache, u32 pt_level)
+{
+	pte |= XE_PAGE_PRESENT;
+
+	if (likely(!xe_vma_read_only(vma)))
+		pte |= XE_PAGE_RW;
+
+	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_ps(pt_level);
+
+	if (unlikely(xe_vma_is_null(vma)))
+		pte |= XE_PTE_NULL;
+
+	return pte;
+}
+
+static const struct xe_pt_ops xelp_pt_ops = {
+	.pte_encode_bo = xelp_pte_encode_bo,
+	.pte_encode_vma = xelp_pte_encode_vma,
+	.pde_encode_bo = xelp_pde_encode_bo,
+};
+
 static void xe_vma_op_work_func(struct work_struct *w);
 static void vm_destroy_work_func(struct work_struct *w);
 
@@ -1239,6 +1326,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 
 	INIT_LIST_HEAD(&vm->extobj.list);
 
+	vm->pt_ops = &xelp_pt_ops;
+
 	if (!(flags & XE_VM_FLAG_MIGRATION))
 		xe_device_mem_access_get(xe);
 
@@ -1574,8 +1663,8 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id)
 
 u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
 {
-	return xe_pde_encode(vm->pt_root[tile->id]->bo, 0,
-			     XE_CACHE_WB);
+	return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0,
+					 XE_CACHE_WB);
 }
 
 static struct dma_fence *
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index af2ba4acf1f9..1c5553b842d7 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -249,6 +249,8 @@ struct xe_vm {
 		bool munmap_rebind_inflight;
 	} async_ops;
 
+	const struct xe_pt_ops *pt_ops;
+
 	/** @userptr: user pointer state */
 	struct {
 		/**
-- 
2.40.1



* [Intel-xe] [PATCH 4/9] drm/xe/migrate: Do not hand-encode pte
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (2 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 3/9] drm/xe: Use vfunc for pte/pde ppgtt encoding Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 22:55   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 5/9] drm/xe: Use vfunc to initialize PAT Lucas De Marchi
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Instead of hand-encoding the pte, call a new vfunc from xe_vm to handle
it. The encoding may not be the same on every platform, so keeping it in
one place makes the platforms easier to support.
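
Simplified from the diff below, emit_pte() then builds the pte through
the vfunc instead of OR'ing the raw bits itself:

    /* before: addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW; */
    addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
                                              0, devmem, flags);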

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_migrate.c  | 14 ++++++++------
 drivers/gpu/drm/xe/xe_pt_types.h |  2 ++
 drivers/gpu/drm/xe/xe_vm.c       | 23 ++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index aa0396330903..f22356d69ae3 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -261,8 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 
 		level = 2;
 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
-		flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
-			XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
+		flags = vm->pt_ops->pte_encode_addr(0, XE_CACHE_WB, level, true, 0);
 
 		/*
 		 * Use 1GB pages, it shouldn't matter the physical amount of
@@ -483,7 +482,8 @@ static void emit_pte(struct xe_migrate *m,
 		ptes -= chunk;
 
 		while (chunk--) {
-			u64 addr;
+			u64 addr, flags = 0;
+			bool devmem = false;
 
 			addr = xe_res_dma(cur) & PAGE_MASK;
 			if (is_vram) {
@@ -491,13 +491,15 @@ static void emit_pte(struct xe_migrate *m,
 				if ((m->q->vm->flags & XE_VM_FLAG_64K) &&
 				    !(cur_ofs & (16 * 8 - 1))) {
 					xe_tile_assert(m->tile, IS_ALIGNED(addr, SZ_64K));
-					addr |= XE_PTE_PS64;
+					flags |= XE_PTE_PS64;
 				}
 
 				addr += vram_region_gpu_offset(bo->ttm.resource);
-				addr |= XE_PPGTT_PTE_DM;
+				devmem = true;
 			}
-			addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
+
+			addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
+								  0, devmem, flags);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
 
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index c58f6926fabf..64e3921a0f46 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -39,6 +39,8 @@ struct xe_pt_ops {
 			     enum xe_cache_level cache, u32 pt_level);
 	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
 			      enum xe_cache_level cache, u32 pt_level);
+	u64 (*pte_encode_addr)(u64 addr, enum xe_cache_level cache,
+			       u32 pt_level, bool devmem, u64 flags);
 	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
 			     const enum xe_cache_level cache);
 };
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 2e1b4d46d9ea..23452b98d853 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1216,7 +1216,6 @@ static u64 pte_encode_cache(enum xe_cache_level cache)
 
 static u64 pte_encode_ps(u32 pt_level)
 {
-	/* XXX: Does hw support 1 GiB pages? */
 	XE_WARN_ON(pt_level > 2);
 
 	if (pt_level == 1)
@@ -1272,9 +1271,31 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 	return pte;
 }
 
+static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
+				u32 pt_level, bool devmem, u64 flags)
+{
+	u64 pte;
+
+	/* Avoid passing random bits directly as flags */
+	XE_WARN_ON(flags & ~XE_PTE_PS64);
+
+	pte = addr;
+	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
+	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_ps(pt_level);
+
+	if (devmem)
+		pte |= XE_PPGTT_PTE_DM;
+
+	pte |= flags;
+
+	return pte;
+}
+
 static const struct xe_pt_ops xelp_pt_ops = {
 	.pte_encode_bo = xelp_pte_encode_bo,
 	.pte_encode_vma = xelp_pte_encode_vma,
+	.pte_encode_addr = xelp_pte_encode_addr,
 	.pde_encode_bo = xelp_pde_encode_bo,
 };
 
-- 
2.40.1



* [Intel-xe] [PATCH 5/9] drm/xe: Use vfunc to initialize PAT
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (3 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 4/9] drm/xe/migrate: Do not hand-encode pte Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 23:08   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes Lucas De Marchi
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Split the PAT initialization between SW-only and HW parts. The _early()
variant only sets up the ops and data structures that are later used to
program the tables. This allows the PAT support to be easily extended to
other platforms.
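
The resulting call flow, as wired up in this patch:

    /* probe time, SW only: pick the ops and table for this platform */
    xe_pat_init_early(xe);  /* sets xe->pat.ops, .table and .n_entries */

    /* later, once per GT: program the HW through the chosen ops */
    xe_pat_init(gt);        /* -> xe->pat.ops->program_graphics()/_media() */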

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_device.c       |  3 ++
 drivers/gpu/drm/xe/xe_device_types.h | 13 ++++++
 drivers/gpu/drm/xe/xe_pat.c          | 59 +++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_pat.h          | 11 ++++++
 4 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index 687dc3d79a66..046b7a087451 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -26,6 +26,7 @@
 #include "xe_irq.h"
 #include "xe_mmio.h"
 #include "xe_module.h"
+#include "xe_pat.h"
 #include "xe_pcode.h"
 #include "xe_pm.h"
 #include "xe_query.h"
@@ -276,6 +277,8 @@ int xe_device_probe(struct xe_device *xe)
 	int err;
 	u8 id;
 
+	xe_pat_init_early(xe);
+
 	xe->info.mem_region_mask = 1;
 	err = xe_display_init_nommio(xe);
 	if (err)
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 32ab0fea04ee..98835ee058f5 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -25,6 +25,7 @@
 #endif
 
 struct xe_ggtt;
+struct xe_pat_ops;
 
 #define XE_BO_INVALID_OFFSET	LONG_MAX
 
@@ -331,6 +332,18 @@ struct xe_device {
 		atomic_t ref;
 	} mem_access;
 
+	/**
+	 * @pat: Encapsulate PAT related stuff
+	 */
+	struct {
+		/** Internal operations to abstract platforms */
+		const struct xe_pat_ops *ops;
+		/** PAT table to program in the HW */
+		const u32 *table;
+		/** Number of PAT entries */
+		int n_entries;
+	} pat;
+
 	/** @d3cold: Encapsulate d3cold related stuff */
 	struct {
 		/** capable: Indicates if root port is d3cold capable */
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 71e0e047fff3..86386633e206 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -32,6 +32,11 @@
 #define TGL_PAT_WC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 1)
 #define TGL_PAT_UC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 0)
 
+struct xe_pat_ops {
+	void (*program_graphics)(struct xe_gt *gt, const u32 table[], int n_entries);
+	void (*program_media)(struct xe_gt *gt, const u32 table[], int n_entries);
+};
+
 static const u32 tgl_pat_table[] = {
 	[0] = TGL_PAT_WB,
 	[1] = TGL_PAT_WC,
@@ -80,24 +85,37 @@ static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
 	}
 }
 
-void xe_pat_init(struct xe_gt *gt)
-{
-	struct xe_device *xe = gt_to_xe(gt);
+static const struct xe_pat_ops tgl_pat_ops = {
+	.program_graphics = program_pat,
+};
+
+static const struct xe_pat_ops pvc_pat_ops = {
+	.program_graphics = program_pat_mcr,
+};
+
+/*
+ * SAMedia register offsets are adjusted by the write methods and they target
+ * registers that are not MCR, while for normal GT they are MCR
+ */
+static const struct xe_pat_ops mtl_pat_ops = {
+	.program_graphics = program_pat,
+	.program_media = program_pat_mcr,
+};
 
+void xe_pat_init_early(struct xe_device *xe)
+{
 	if (xe->info.platform == XE_METEORLAKE) {
-		/*
-		 * SAMedia register offsets are adjusted by the write methods
-		 * and they target registers that are not MCR, while for normal
-		 * GT they are MCR
-		 */
-		if (xe_gt_is_media_type(gt))
-			program_pat(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
-		else
-			program_pat_mcr(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
+		xe->pat.ops = &mtl_pat_ops;
+		xe->pat.table = mtl_pat_table;
+		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
 	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
-		program_pat_mcr(gt, pvc_pat_table, ARRAY_SIZE(pvc_pat_table));
+		xe->pat.ops = &pvc_pat_ops;
+		xe->pat.table = pvc_pat_table;
+		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
 	} else if (GRAPHICS_VERx100(xe) <= 1210) {
-		program_pat(gt, tgl_pat_table, ARRAY_SIZE(tgl_pat_table));
+		xe->pat.ops = &tgl_pat_ops;
+		xe->pat.table = tgl_pat_table;
+		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
 	} else {
 		/*
 		 * Going forward we expect to need new PAT settings for most
@@ -111,3 +129,16 @@ void xe_pat_init(struct xe_gt *gt)
 			GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
 	}
 }
+
+void xe_pat_init(struct xe_gt *gt)
+{
+	struct xe_device *xe = gt_to_xe(gt);
+
+	if (!xe->pat.ops)
+		return;
+
+	if (xe_gt_is_media_type(gt))
+		xe->pat.ops->program_media(gt, xe->pat.table, xe->pat.n_entries);
+	else
+		xe->pat.ops->program_graphics(gt, xe->pat.table, xe->pat.n_entries);
+}
diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
index 659de4008131..744318cab69b 100644
--- a/drivers/gpu/drm/xe/xe_pat.h
+++ b/drivers/gpu/drm/xe/xe_pat.h
@@ -7,7 +7,18 @@
 #define _XE_PAT_H_
 
 struct xe_gt;
+struct xe_device;
 
+/**
+ * xe_pat_init_early - SW initialization, setting up data based on device
+ * @xe: xe device
+ */
+void xe_pat_init_early(struct xe_device *xe);
+
+/**
+ * xe_pat_init - Program HW PAT table
+ * @gt: GT structure
+ */
 void xe_pat_init(struct xe_gt *gt);
 
 #endif
-- 
2.40.1



* [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (4 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 5/9] drm/xe: Use vfunc to initialize PAT Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 23:14   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte Lucas De Marchi
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Some of the PAT entries are relevant for the driver's own use, and which
ones they are varies per platform. Let the PAT early initialization record
the indexes they map to so the rest of the driver can use them where
needed.
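
For example, with the tables below a write-back mapping resolves to a
different PAT entry depending on the platform:

    u32 pat_index = xe->pat.idx[XE_CACHE_WB];
    /* TGL: 0, PVC/DG2: 3, MTL: 3 */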

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h | 2 ++
 drivers/gpu/drm/xe/xe_pat.c          | 9 +++++++++
 drivers/gpu/drm/xe/xe_pt_types.h     | 1 +
 3 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 98835ee058f5..7d0f2109c23a 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -15,6 +15,7 @@
 #include "xe_devcoredump_types.h"
 #include "xe_gt_types.h"
 #include "xe_platform_types.h"
+#include "xe_pt_types.h"
 #include "xe_pmu.h"
 #include "xe_step_types.h"
 
@@ -342,6 +343,7 @@ struct xe_device {
 		const u32 *table;
 		/** Number of PAT entries */
 		int n_entries;
+		u32 idx[__XE_CACHE_LEVEL_COUNT];
 	} pat;
 
 	/** @d3cold: Encapsulate d3cold related stuff */
diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
index 86386633e206..966b63252735 100644
--- a/drivers/gpu/drm/xe/xe_pat.c
+++ b/drivers/gpu/drm/xe/xe_pat.c
@@ -108,14 +108,23 @@ void xe_pat_init_early(struct xe_device *xe)
 		xe->pat.ops = &mtl_pat_ops;
 		xe->pat.table = mtl_pat_table;
 		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
+		xe->pat.idx[XE_CACHE_NONE] = 2;
+		xe->pat.idx[XE_CACHE_WT] = 1;
+		xe->pat.idx[XE_CACHE_WB] = 3;
 	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
 		xe->pat.ops = &pvc_pat_ops;
 		xe->pat.table = pvc_pat_table;
 		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
+		xe->pat.idx[XE_CACHE_NONE] = 0;
+		xe->pat.idx[XE_CACHE_WT] = 2;
+		xe->pat.idx[XE_CACHE_WB] = 3;
 	} else if (GRAPHICS_VERx100(xe) <= 1210) {
 		xe->pat.ops = &tgl_pat_ops;
 		xe->pat.table = tgl_pat_table;
 		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
+		xe->pat.idx[XE_CACHE_NONE] = 3;
+		xe->pat.idx[XE_CACHE_WT] = 2;
+		xe->pat.idx[XE_CACHE_WB] = 0;
 	} else {
 		/*
 		 * Going forward we expect to need new PAT settings for most
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index 64e3921a0f46..bf5000499251 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -17,6 +17,7 @@ enum xe_cache_level {
 	XE_CACHE_NONE,
 	XE_CACHE_WT,
 	XE_CACHE_WB,
+	__XE_CACHE_LEVEL_COUNT,
 };
 
 #define XE_VM_MAX_LEVEL 4
-- 
2.40.1



* [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (5 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 23:27   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 8/9] drm/xe: Use vfunc for ggtt pte encoding Lucas De Marchi
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Change the xelp_pte_encode() function to use the platform-dependent
pat_index. The same function can be used for all platforms as they only
need to encode the pat_index bits in the same pte layout. Platforms that
don't have the most significant bit are fine as long as they don't return
a bogus index.

For consistency, also add an xe argument to pde_encode_cache(). This will
be needed for Xe2 later.
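
To illustrate the encoding: the pat_index bits are scattered into
non-contiguous pte bits, so e.g. pat_index 3 (0b0011) ends up setting
PAT0 (pte bit 3) and PAT1 (pte bit 4):

    /* worked example: pat_index = 3 -> PAT0 | PAT1 */
    pte |= XE_PPGTT_PTE_PAT0;       /* BIT(0) of pat_index, pte bit 3 */
    pte |= XE_PPGTT_PTE_PAT1;       /* BIT(1) of pat_index, pte bit 4 */
    /* PAT2 is pte bit 7; PAT3, Xe-LPG only, is pte bit 62 */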

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_bo.h       | 11 ++++---
 drivers/gpu/drm/xe/xe_migrate.c  |  6 ++--
 drivers/gpu/drm/xe/xe_pt_types.h |  4 ++-
 drivers/gpu/drm/xe/xe_vm.c       | 50 +++++++++++++++++++-------------
 4 files changed, 44 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
index e3c90d45e723..885412581158 100644
--- a/drivers/gpu/drm/xe/xe_bo.h
+++ b/drivers/gpu/drm/xe/xe_bo.h
@@ -39,10 +39,13 @@
 #define XE_BO_INTERNAL_TEST		BIT(30)
 #define XE_BO_INTERNAL_64K		BIT(31)
 
-#define PPAT_UNCACHED                   GENMASK_ULL(4, 3)
-#define PPAT_CACHED_PDE                 0
-#define PPAT_CACHED                     BIT_ULL(7)
-#define PPAT_DISPLAY_ELLC               BIT_ULL(4)
+#define XE_PPGTT_PDE_UNCACHED		GENMASK_ULL(4, 3)
+#define XE_PPGTT_PDE_CACHED		0
+
+#define XELPG_PPGTT_PTE_PAT3		BIT_ULL(62)
+#define XE_PPGTT_PTE_PAT2		BIT_ULL(7)
+#define XE_PPGTT_PTE_PAT1		BIT_ULL(4)
+#define XE_PPGTT_PTE_PAT0		BIT_ULL(3)
 
 #define XE_PTE_SHIFT			12
 #define XE_PAGE_SIZE			(1 << XE_PTE_SHIFT)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index f22356d69ae3..11dc3b0e0350 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -261,7 +261,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
 
 		level = 2;
 		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
-		flags = vm->pt_ops->pte_encode_addr(0, XE_CACHE_WB, level, true, 0);
+		flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
+						    true, 0);
 
 		/*
 		 * Use 1GB pages, it shouldn't matter the physical amount of
@@ -498,7 +499,8 @@ static void emit_pte(struct xe_migrate *m,
 				devmem = true;
 			}
 
-			addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
+			addr |= m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
+								  addr, XE_CACHE_WB,
 								  0, devmem, flags);
 			bb->cs[bb->len++] = lower_32_bits(addr);
 			bb->cs[bb->len++] = upper_32_bits(addr);
diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
index bf5000499251..bd6645295fe6 100644
--- a/drivers/gpu/drm/xe/xe_pt_types.h
+++ b/drivers/gpu/drm/xe/xe_pt_types.h
@@ -11,6 +11,7 @@
 #include "xe_pt_walk.h"
 
 struct xe_bo;
+struct xe_device;
 struct xe_vma;
 
 enum xe_cache_level {
@@ -40,7 +41,8 @@ struct xe_pt_ops {
 			     enum xe_cache_level cache, u32 pt_level);
 	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
 			      enum xe_cache_level cache, u32 pt_level);
-	u64 (*pte_encode_addr)(u64 addr, enum xe_cache_level cache,
+	u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
+			       enum xe_cache_level cache,
 			       u32 pt_level, bool devmem, u64 flags);
 	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
 			     const enum xe_cache_level cache);
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 23452b98d853..ab9cf1de566a 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1191,27 +1191,32 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
 	.op_alloc = xe_vm_op_alloc,
 };
 
-static u64 pde_encode_cache(enum xe_cache_level cache)
+static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
 {
-	/* FIXME: I don't think the PPAT handling is correct for MTL */
-
 	if (cache != XE_CACHE_NONE)
-		return PPAT_CACHED_PDE;
+		return XE_PPGTT_PDE_CACHED;
 
-	return PPAT_UNCACHED;
+	return XE_PPGTT_PDE_UNCACHED;
 }
 
-static u64 pte_encode_cache(enum xe_cache_level cache)
+static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
 {
-	/* FIXME: I don't think the PPAT handling is correct for MTL */
-	switch (cache) {
-	case XE_CACHE_NONE:
-		return PPAT_UNCACHED;
-	case XE_CACHE_WT:
-		return PPAT_DISPLAY_ELLC;
-	default:
-		return PPAT_CACHED;
-	}
+	u32 pat_index = xe->pat.idx[cache];
+	u64 pte = 0;
+
+	if (pat_index & BIT(0))
+		pte |= XE_PPGTT_PTE_PAT0;
+
+	if (pat_index & BIT(1))
+		pte |= XE_PPGTT_PTE_PAT1;
+
+	if (pat_index & BIT(2))
+		pte |= XE_PPGTT_PTE_PAT2;
+
+	if (pat_index & BIT(3))
+		pte |= XELPG_PPGTT_PTE_PAT3;
+
+	return pte;
 }
 
 static u64 pte_encode_ps(u32 pt_level)
@@ -1229,11 +1234,12 @@ static u64 pte_encode_ps(u32 pt_level)
 static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
 			      const enum xe_cache_level cache)
 {
+	struct xe_device *xe = xe_bo_device(bo);
 	u64 pde;
 
 	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
 	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pde |= pde_encode_cache(cache);
+	pde |= pde_encode_cache(xe, cache);
 
 	return pde;
 }
@@ -1241,11 +1247,12 @@ static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
 static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
 			      enum xe_cache_level cache, u32 pt_level)
 {
+	struct xe_device *xe = xe_bo_device(bo);
 	u64 pte;
 
 	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_cache(xe, cache);
 	pte |= pte_encode_ps(pt_level);
 
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
@@ -1257,12 +1264,14 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
 static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 			       enum xe_cache_level cache, u32 pt_level)
 {
+	struct xe_device *xe = xe_vma_vm(vma)->xe;
+
 	pte |= XE_PAGE_PRESENT;
 
 	if (likely(!xe_vma_read_only(vma)))
 		pte |= XE_PAGE_RW;
 
-	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_cache(xe, cache);
 	pte |= pte_encode_ps(pt_level);
 
 	if (unlikely(xe_vma_is_null(vma)))
@@ -1271,7 +1280,8 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
 	return pte;
 }
 
-static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
+static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
+				enum xe_cache_level cache,
 				u32 pt_level, bool devmem, u64 flags)
 {
 	u64 pte;
@@ -1281,7 +1291,7 @@ static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
 
 	pte = addr;
 	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
-	pte |= pte_encode_cache(cache);
+	pte |= pte_encode_cache(xe, cache);
 	pte |= pte_encode_ps(pt_level);
 
 	if (devmem)
-- 
2.40.1



* [Intel-xe] [PATCH 8/9] drm/xe: Use vfunc for ggtt pte encoding
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (6 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 23:37   ` Matt Roper
  2023-09-25 22:10 ` [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support Lucas De Marchi
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Use 2 different functions for encoding the ggtt's pte and assign them
during initialization. The main difference is that before Xe-LPG the pte
didn't have the cache bits.
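
Note the GGTT PAT bits live elsewhere than the ppgtt ones: for Xe-LPG the
pat_index goes into pte bits 52/53, so e.g. MTL's XE_CACHE_WB (pat_index
3) sets both:

    /* xelpg GGTT encoding for pat_index = 3 (MTL's XE_CACHE_WB) */
    pte |= XELPG_GGTT_PTE_PAT0;     /* BIT_ULL(52) */
    pte |= XELPG_GGTT_PTE_PAT1;     /* BIT_ULL(53) */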

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/xe_ggtt.c       | 53 +++++++++++++++++++++++-------
 drivers/gpu/drm/xe/xe_ggtt.h       |  1 -
 drivers/gpu/drm/xe/xe_ggtt_types.h |  9 +++++
 3 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 54baaffc7235..09c6bd46f097 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -20,16 +20,31 @@
 #include "xe_mmio.h"
 #include "xe_wopcm.h"
 
-/* FIXME: Common file, preferably auto-gen */
-#define MTL_GGTT_PTE_PAT0	BIT_ULL(52)
-#define MTL_GGTT_PTE_PAT1	BIT_ULL(53)
+#define XELPG_GGTT_PTE_PAT0	BIT_ULL(52)
+#define XELPG_GGTT_PTE_PAT1	BIT_ULL(53)
 
 /* GuC addresses above GUC_GGTT_TOP also don't map through the GTT */
 #define GUC_GGTT_TOP	0xFEE00000
 
-u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
+static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
+				   enum xe_cache_level cache)
+{
+	u64 pte;
+
+	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
+	pte |= XE_PAGE_PRESENT;
+
+	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
+		pte |= XE_GGTT_PTE_DM;
+
+	return pte;
+}
+
+static u64 xelpg_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
+				    enum xe_cache_level cache)
 {
 	struct xe_device *xe = xe_bo_device(bo);
+	u32 pat_index = xe->pat.idx[cache];
 	u64 pte;
 
 	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
@@ -38,11 +53,11 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
 	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
 		pte |= XE_GGTT_PTE_DM;
 
-	/* FIXME: vfunc + pass in caching rules */
-	if (xe->info.platform == XE_METEORLAKE) {
-		pte |= MTL_GGTT_PTE_PAT0;
-		pte |= MTL_GGTT_PTE_PAT1;
-	}
+	if (pat_index & BIT(0))
+		pte |= XELPG_GGTT_PTE_PAT0;
+
+	if (pat_index & BIT(1))
+		pte |= XELPG_GGTT_PTE_PAT1;
 
 	return pte;
 }
@@ -72,7 +87,8 @@ static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
 	xe_tile_assert(ggtt->tile, start < end);
 
 	if (ggtt->scratch)
-		scratch_pte = xe_ggtt_pte_encode(ggtt->scratch, 0);
+		scratch_pte = ggtt->pt_ops->pte_encode_bo(ggtt->scratch, 0,
+							  XE_CACHE_WB);
 	else
 		scratch_pte = 0;
 
@@ -102,6 +118,14 @@ static void primelockdep(struct xe_ggtt *ggtt)
 	fs_reclaim_release(GFP_KERNEL);
 }
 
+static const struct xe_ggtt_pt_ops xelp_pt_ops = {
+	.pte_encode_bo = xelp_ggtt_pte_encode_bo,
+};
+
+static const struct xe_ggtt_pt_ops xelpg_pt_ops = {
+	.pte_encode_bo = xelpg_ggtt_pte_encode_bo,
+};
+
 int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt)
 {
 	struct xe_device *xe = tile_to_xe(ggtt->tile);
@@ -146,6 +170,11 @@ int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt)
 	if (ggtt->size > GUC_GGTT_TOP)
 		ggtt->size = GUC_GGTT_TOP;
 
+	if (GRAPHICS_VERx100(xe) >= 1270)
+		ggtt->pt_ops = &xelpg_pt_ops;
+	else
+		ggtt->pt_ops = &xelp_pt_ops;
+
 	drm_mm_init(&ggtt->mm, xe_wopcm_size(xe),
 		    ggtt->size - xe_wopcm_size(xe));
 	mutex_init(&ggtt->lock);
@@ -260,7 +289,7 @@ void xe_ggtt_printk(struct xe_ggtt *ggtt, const char *prefix)
 {
 	u64 addr, scratch_pte;
 
-	scratch_pte = xe_ggtt_pte_encode(ggtt->scratch, 0);
+	scratch_pte = ggtt->pt_ops->pte_encode_bo(ggtt->scratch, 0, XE_CACHE_WB);
 
 	printk("%sGlobal GTT:", prefix);
 	for (addr = 0; addr < ggtt->size; addr += XE_PAGE_SIZE) {
@@ -301,7 +330,7 @@ void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
 	u64 offset, pte;
 
 	for (offset = 0; offset < bo->size; offset += XE_PAGE_SIZE) {
-		pte = xe_ggtt_pte_encode(bo, offset);
+		pte = ggtt->pt_ops->pte_encode_bo(bo, offset, XE_CACHE_WB);
 		xe_ggtt_set_pte(ggtt, start + offset, pte);
 	}
 
diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
index 205a6d058bbd..3faa3c6d0375 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.h
+++ b/drivers/gpu/drm/xe/xe_ggtt.h
@@ -10,7 +10,6 @@
 
 struct drm_printer;
 
-u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset);
 void xe_ggtt_set_pte(struct xe_ggtt *ggtt, u64 addr, u64 pte);
 void xe_ggtt_invalidate(struct xe_ggtt *ggtt);
 int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt);
diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
index d34b3e733945..486016ea5b67 100644
--- a/drivers/gpu/drm/xe/xe_ggtt_types.h
+++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
@@ -8,9 +8,16 @@
 
 #include <drm/drm_mm.h>
 
+#include "xe_pt_types.h"
+
 struct xe_bo;
 struct xe_gt;
 
+struct xe_ggtt_pt_ops {
+	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
+			     enum xe_cache_level cache);
+};
+
 struct xe_ggtt {
 	struct xe_tile *tile;
 
@@ -25,6 +32,8 @@ struct xe_ggtt {
 
 	u64 __iomem *gsm;
 
+	const struct xe_ggtt_pt_ops *pt_ops;
+
 	struct drm_mm mm;
 };
 
-- 
2.40.1



* [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (7 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 8/9] drm/xe: Use vfunc for ggtt pte encoding Lucas De Marchi
@ 2023-09-25 22:10 ` Lucas De Marchi
  2023-09-25 23:42   ` Matt Roper
  2023-09-25 22:14 ` [Intel-xe] ✓ CI.Patch_applied: success for Refactor kernel setting of PAT Patchwork
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-25 22:10 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi, Matthew Auld

Fix the build break due to the previous commit. To be squashed into
"drm/xe/display: Implement display support", which is to be moved above
the previous commit.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
 drivers/gpu/drm/xe/display/xe_fb_pin.c | 32 ++++++++++++++++++--------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
index 3422942a9951..b7a04fba3585 100644
--- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
+++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
@@ -17,7 +17,10 @@ static void
 write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_ofs,
 		  u32 width, u32 height, u32 src_stride, u32 dst_stride)
 {
+	struct xe_device *xe = xe_bo_device(bo);
+	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
 	u32 column, row;
+
 	/* TODO: Maybe rewrite so we can traverse the bo addresses sequentially,
 	 * by writing dpt/ggtt in a different order?
 	 */
@@ -26,8 +29,10 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
 		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
 
 		for (row = 0; row < height; row++) {
-			iosys_map_wr(map, *dpt_ofs, u64,
-				     xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
+			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
+							      XE_CACHE_WB);
+
+			iosys_map_wr(map, *dpt_ofs, u64, pte);
 			*dpt_ofs += 8;
 			src_idx -= src_stride;
 		}
@@ -46,6 +51,7 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
 {
 	struct xe_device *xe = to_xe_device(fb->base.dev);
 	struct xe_tile *tile0 = xe_device_get_root_tile(xe);
+	struct xe_ggtt *ggtt = tile0->mem.ggtt;
 	struct xe_bo *bo = intel_fb_obj(&fb->base), *dpt;
 	u32 dpt_size, size = bo->ttm.base.size;
 
@@ -76,9 +82,12 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
 	if (view->type == I915_GTT_VIEW_NORMAL) {
 		u32 x;
 
-		for (x = 0; x < size / XE_PAGE_SIZE; x++)
-			iosys_map_wr(&dpt->vmap, x * 8, u64,
-				     xe_ggtt_pte_encode(bo, x * XE_PAGE_SIZE));
+		for (x = 0; x < size / XE_PAGE_SIZE; x++) {
+			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x * XE_PAGE_SIZE,
+							      XE_CACHE_WB);
+
+			iosys_map_wr(&dpt->vmap, x * 8, u64, pte);
+		}
 	} else {
 		const struct intel_rotation_info *rot_info = &view->rotated;
 		u32 i, dpt_ofs = 0;
@@ -107,8 +116,10 @@ write_ggtt_rotated(struct xe_bo *bo, struct xe_ggtt *ggtt, u32 *ggtt_ofs, u32 bo
 		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
 
 		for (row = 0; row < height; row++) {
-			xe_ggtt_set_pte(ggtt, *ggtt_ofs,
-					xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
+			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
+							      XE_CACHE_WB);
+
+			xe_ggtt_set_pte(ggtt, *ggtt_ofs, pte);
 			*ggtt_ofs += XE_PAGE_SIZE;
 			src_idx -= src_stride;
 		}
@@ -150,8 +161,11 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
 		if (ret)
 			goto out_unlock;
 
-		for (x = 0; x < size; x += XE_PAGE_SIZE)
-			xe_ggtt_set_pte(ggtt, vma->node.start + x, xe_ggtt_pte_encode(bo, x));
+		for (x = 0; x < size; x += XE_PAGE_SIZE) {
+			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x, XE_CACHE_WB);
+
+			xe_ggtt_set_pte(ggtt, vma->node.start + x, pte);
+		}
 	} else {
 		u32 i, ggtt_ofs;
 		const struct intel_rotation_info *rot_info = &view->rotated;
-- 
2.40.1



* [Intel-xe] ✓ CI.Patch_applied: success for Refactor kernel setting of PAT
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (8 preceding siblings ...)
  2023-09-25 22:10 ` [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support Lucas De Marchi
@ 2023-09-25 22:14 ` Patchwork
  2023-09-25 22:14 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
  2023-09-25 22:15 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
  11 siblings, 0 replies; 29+ messages in thread
From: Patchwork @ 2023-09-25 22:14 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe

== Series Details ==

Series: Refactor kernel setting of PAT
URL   : https://patchwork.freedesktop.org/series/124225/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: fc8ec3c56 drm/xe: Add Wa_18028616096
=== git am output follows ===
Applying: drm/xe: Normalize pte/pde encoding
Applying: drm/xe: Remove check for vma == NULL
Applying: drm/xe: Use vfunc for pte/pde ppgtt encoding
Applying: drm/xe/migrate: Do not hand-encode pte
Applying: drm/xe: Use vfunc to initialize PAT
Applying: drm/xe/pat: Keep track of relevant indexes
Applying: drm/xe: Use pat_index to encode pte
Applying: drm/xe: Use vfunc for ggtt pte encoding
Applying: fixup! drm/xe/display: Implement display support




* [Intel-xe] ✓ CI.checkpatch: success for Refactor kernel setting of PAT
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (9 preceding siblings ...)
  2023-09-25 22:14 ` [Intel-xe] ✓ CI.Patch_applied: success for Refactor kernel setting of PAT Patchwork
@ 2023-09-25 22:14 ` Patchwork
  2023-09-25 22:15 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
  11 siblings, 0 replies; 29+ messages in thread
From: Patchwork @ 2023-09-25 22:14 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe

== Series Details ==

Series: Refactor kernel setting of PAT
URL   : https://patchwork.freedesktop.org/series/124225/
State : success

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
63c2b6b160bca2df6efc7bc4cea6f442097d7854
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 6bcdb926e71a755b73c93b3ca40ca4667466fde1
Author: Lucas De Marchi <lucas.demarchi@intel.com>
Date:   Mon Sep 25 15:10:49 2023 -0700

    fixup! drm/xe/display: Implement display support
    
    Fix build break due to previous commit. Squash on "Implement display
    support" to be moved above the previous commit.
    
    Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
+ /mt/dim checkpatch fc8ec3c56efa5c15b630ddc17c89100440fe03ef drm-intel
ac886e195 drm/xe: Normalize pte/pde encoding
b1d8a2a8e drm/xe: Remove check for vma == NULL
765d093f6 drm/xe: Use vfunc for pte/pde ppgtt encoding
e92116ef4 drm/xe/migrate: Do not hand-encode pte
c3ecdc091 drm/xe: Use vfunc to initialize PAT
84cb739c4 drm/xe/pat: Keep track of relevant indexes
723ffff0e drm/xe: Use pat_index to encode pte
7bf2ea52c drm/xe: Use vfunc for ggtt pte encoding
6bcdb926e fixup! drm/xe/display: Implement display support



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Intel-xe] ✗ CI.KUnit: failure for Refactor kernel setting of PAT
  2023-09-25 22:10 [Intel-xe] [PATCH 0/9] Refactor kernel setting of PAT Lucas De Marchi
                   ` (10 preceding siblings ...)
  2023-09-25 22:14 ` [Intel-xe] ✓ CI.checkpatch: " Patchwork
@ 2023-09-25 22:15 ` Patchwork
  11 siblings, 0 replies; 29+ messages in thread
From: Patchwork @ 2023-09-25 22:15 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe

== Series Details ==

Series: Refactor kernel setting of PAT
URL   : https://patchwork.freedesktop.org/series/124225/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:In file included from ../drivers/gpu/drm/xe/xe_migrate.c:1368:
../drivers/gpu/drm/xe/tests/xe_migrate.c: In function ‘xe_migrate_sanity_test’:
../drivers/gpu/drm/xe/tests/xe_migrate.c:304:21: error: ‘struct xe_vm’ has no member named ‘pte_encode’
  304 |  expected = m->q->vm->pte_encode(pt, 0, XE_CACHE_WB, 0);
      |                     ^~
make[7]: *** [../scripts/Makefile.build:243: drivers/gpu/drm/xe/xe_migrate.o] Error 1
make[7]: *** Waiting for unfinished jobs....
make[6]: *** [../scripts/Makefile.build:480: drivers/gpu/drm/xe] Error 2
make[5]: *** [../scripts/Makefile.build:480: drivers/gpu/drm] Error 2
make[4]: *** [../scripts/Makefile.build:480: drivers/gpu] Error 2
make[3]: *** [../scripts/Makefile.build:480: drivers] Error 2
make[2]: *** [/kernel/Makefile:2032: .] Error 2
make[1]: *** [/kernel/Makefile:234: __sub-make] Error 2
make: *** [Makefile:234: __sub-make] Error 2

[22:14:36] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[22:14:41] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding
  2023-09-25 22:10 ` [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding Lucas De Marchi
@ 2023-09-25 22:30   ` Matt Roper
  2023-09-26 13:43     ` Lucas De Marchi
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Roper @ 2023-09-25 22:30 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:41PM -0700, Lucas De Marchi wrote:
> Split functions that do only part of the pde/pte encoding and that can
> be called by the different places. This normalizes how pde/pte are
> encoded so they can be moved elsewhere in a subsequent change.
> 
> xe_pte_encode() was calling __pte_encode() with a NULL vma, which is the
> opposite of what xe_pt_stage_bind_entry() does. Stop passing a NULL vma
> and just split another function that deals with a vma rather than a bo.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_pt.c | 119 +++++++++++++++++++++----------------
>  1 file changed, 67 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index aec469842471..afadd399684c 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -47,91 +47,106 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
>  	return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
>  }
>  
> -/**
> - * xe_pde_encode() - Encode a page-table directory entry pointing to
> - * another page-table.
> - * @bo: The page-table bo of the page-table to point to.
> - * @bo_offset: Offset in the page-table bo to point to.
> - * @level: The cache level indicating the caching of @bo.
> - *
> - * TODO: Rename.
> - *
> - * Return: An encoded page directory entry. No errors.
> - */
> -u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
> -		  const enum xe_cache_level level)
> +static u64 pde_encode_cache(enum xe_cache_level cache)
>  {
> -	u64 pde;
> -
> -	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> -	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -
>  	/* FIXME: I don't think the PPAT handling is correct for MTL */
>  
> -	if (level != XE_CACHE_NONE)
> -		pde |= PPAT_CACHED_PDE;
> -	else
> -		pde |= PPAT_UNCACHED;
> +	if (cache != XE_CACHE_NONE)
> +		return PPAT_CACHED_PDE;
>  
> -	return pde;
> +	return PPAT_UNCACHED;
>  }
>  
> -static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
> -			struct xe_vma *vma, u32 pt_level)
> +static u64 pte_encode_cache(enum xe_cache_level cache)
>  {
> -	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -
> -	if (unlikely(vma && xe_vma_read_only(vma)))
> -		pte &= ~XE_PAGE_RW;
> -
> -	if (unlikely(vma && xe_vma_is_null(vma)))
> -		pte |= XE_PTE_NULL;
> -
>  	/* FIXME: I don't think the PPAT handling is correct for MTL */
> -
>  	switch (cache) {
>  	case XE_CACHE_NONE:
> -		pte |= PPAT_UNCACHED;
> -		break;
> +		return PPAT_UNCACHED;
>  	case XE_CACHE_WT:
> -		pte |= PPAT_DISPLAY_ELLC;
> -		break;
> +		return PPAT_DISPLAY_ELLC;
>  	default:
> -		pte |= PPAT_CACHED;
> -		break;
> +		return PPAT_CACHED;
>  	}
> +}
> +
> +static u64 pte_encode_ps(u32 pt_level)
> +{
> +	/* XXX: Does hw support 1 GiB pages? */
> +	XE_WARN_ON(pt_level > 2);
>  
>  	if (pt_level == 1)
> -		pte |= XE_PDE_PS_2M;
> +		return XE_PDE_PS_2M;
>  	else if (pt_level == 2)
> -		pte |= XE_PDPE_PS_1G;
> +		return XE_PDPE_PS_1G;
>  
> -	/* XXX: Does hw support 1 GiB pages? */
> -	XE_WARN_ON(pt_level > 2);
> +	return 0;
> +}
>  
> -	return pte;
> +/**
> + * xe_pde_encode() - Encode a page-table directory entry pointing to
> + * another page-table.
> + * @bo: The page-table bo of the page-table to point to.
> + * @bo_offset: Offset in the page-table bo to point to.
> + * @cache: The cache level indicating the caching of @bo.
> + *
> + * TODO: Rename.
> + *
> + * Return: An encoded page directory entry. No errors.
> + */
> +u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
> +		  const enum xe_cache_level cache)
> +{
> +	u64 pde;
> +
> +	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> +	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pde |= pde_encode_cache(cache);
> +
> +	return pde;
>  }
>  
>  /**
>   * xe_pte_encode() - Encode a page-table entry pointing to memory.
>   * @bo: The BO representing the memory to point to.
> - * @offset: The offset into @bo.
> + * @bo_offset: The offset into @bo.
>   * @cache: The cache level indicating
>   * @pt_level: The page-table level of the page-table into which the entry
>   * is to be inserted.
>   *
>   * Return: An encoded page-table entry. No errors.
>   */
> -u64 xe_pte_encode(struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> +u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
>  		  u32 pt_level)
>  {
>  	u64 pte;
>  
> -	pte = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
> +	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_ps(pt_level);
> +
>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>  		pte |= XE_PPGTT_PTE_DM;
>  
> -	return __pte_encode(pte, cache, NULL, pt_level);
> +	return pte;
> +}
> +
> +/* Like xe_pte_encode(), but with a vma and a partially-encoded pte */
> +static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
> +			    enum xe_cache_level cache, u32 pt_level)
> +{
> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_ps(pt_level);
> +
> +	if (unlikely(vma && xe_vma_read_only(vma)))

Do we need the non-NULL check part of this condition (ditto for the one
below)?  With your refactoring, the vma parameter passed to this
function should already be non-NULL, right?  In fact we've already
dereferenced it before the only callsite of this function...

Aside from that,

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> +		pte &= ~XE_PAGE_RW;
> +
> +	if (unlikely(vma && xe_vma_is_null(vma)))
> +		pte |= XE_PTE_NULL;
> +
> +	return pte;
>  }
>  
>  static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
> @@ -614,9 +629,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  
>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
> -		pte = __pte_encode(is_null ? 0 :
> -				   xe_res_dma(curs) + xe_walk->dma_offset,
> -				   xe_walk->cache, xe_walk->vma, level);
> +		pte = __vma_pte_encode(is_null ? 0 :
> +				       xe_res_dma(curs) + xe_walk->dma_offset,
> +				       xe_walk->vma, xe_walk->cache, level);
>  		pte |= xe_walk->default_pte;
>  
>  		/*
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 2/9] drm/xe: Remove check for vma == NULL
  2023-09-25 22:10 ` [Intel-xe] [PATCH 2/9] drm/xe: Remove check for vma == NULL Lucas De Marchi
@ 2023-09-25 22:31   ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-25 22:31 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:42PM -0700, Lucas De Marchi wrote:
> vma at this point can never be NULL as otherwise it would crash earlier
> in the only caller, xe_pt_stage_bind_entry(). Remove the extra check and
> avoid adding and removing the bits from the pte.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

Okay, this takes care of my comment on the previous patch too; I guess I
should have read ahead...

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

> ---
>  drivers/gpu/drm/xe/xe_pt.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index afadd399684c..0b8a45609e83 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -136,14 +136,15 @@ u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
>  static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
>  			    enum xe_cache_level cache, u32 pt_level)
>  {
> -	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pte |= XE_PAGE_PRESENT;
> +
> +	if (likely(!xe_vma_read_only(vma)))
> +		pte |= XE_PAGE_RW;
> +
>  	pte |= pte_encode_cache(cache);
>  	pte |= pte_encode_ps(pt_level);
>  
> -	if (unlikely(vma && xe_vma_read_only(vma)))
> -		pte &= ~XE_PAGE_RW;
> -
> -	if (unlikely(vma && xe_vma_is_null(vma)))
> +	if (unlikely(xe_vma_is_null(vma)))
>  		pte |= XE_PTE_NULL;
>  
>  	return pte;
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 3/9] drm/xe: Use vfunc for pte/pde ppgtt encoding
  2023-09-25 22:10 ` [Intel-xe] [PATCH 3/9] drm/xe: Use vfunc for pte/pde ppgtt encoding Lucas De Marchi
@ 2023-09-25 22:46   ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-25 22:46 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:43PM -0700, Lucas De Marchi wrote:
> Move the functions that encode pte/pde to be vfuncs inside struct xe_vm.
> This will make it easy to extend to platforms that don't have a
> compatible encoding.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/tests/xe_migrate.c |   2 +-
>  drivers/gpu/drm/xe/xe_migrate.c       |  18 ++--
>  drivers/gpu/drm/xe/xe_pt.c            | 125 +++-----------------------
>  drivers/gpu/drm/xe/xe_pt.h            |   6 --
>  drivers/gpu/drm/xe/xe_pt_types.h      |  14 +++
>  drivers/gpu/drm/xe/xe_vm.c            |  93 ++++++++++++++++++-
>  drivers/gpu/drm/xe/xe_vm_types.h      |   2 +
>  7 files changed, 128 insertions(+), 132 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/tests/xe_migrate.c b/drivers/gpu/drm/xe/tests/xe_migrate.c
> index f58cd1da1a34..ffb79e0b4981 100644
> --- a/drivers/gpu/drm/xe/tests/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/tests/xe_migrate.c
> @@ -301,7 +301,7 @@ static void xe_migrate_sanity_test(struct xe_migrate *m, struct kunit *test)
>  	/* First part of the test, are we updating our pagetable bo with a new entry? */
>  	xe_map_wr(xe, &bo->vmap, XE_PAGE_SIZE * (NUM_KERNEL_PDE - 1), u64,
>  		  0xdeaddeadbeefbeef);
> -	expected = xe_pte_encode(pt, 0, XE_CACHE_WB, 0);
> +	expected = m->q->vm->pte_encode(pt, 0, XE_CACHE_WB, 0);

Shouldn't this be pt_ops->pte_encode_bo?
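
Something like this on top (untested) would match the vfunc added in
xe_pt_types.h and fix the KUnit build failure CI reported above:

	expected = m->q->vm->pt_ops->pte_encode_bo(pt, 0, XE_CACHE_WB, 0);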


Matt

>  	if (m->q->vm->flags & XE_VM_FLAG_64K)
>  		expected |= XE_PTE_PS64;
>  	if (xe_bo_is_vram(pt))
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 9438f609d18b..aa0396330903 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -189,14 +189,15 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  		return ret;
>  	}
>  
> -	entry = xe_pde_encode(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
> +	entry = vm->pt_ops->pde_encode_bo(bo, bo->size - XE_PAGE_SIZE, XE_CACHE_WB);
>  	xe_pt_write(xe, &vm->pt_root[id]->bo->vmap, 0, entry);
>  
>  	map_ofs = (num_entries - num_level) * XE_PAGE_SIZE;
>  
>  	/* Map the entire BO in our level 0 pt */
>  	for (i = 0, level = 0; i < num_entries; level++) {
> -		entry = xe_pte_encode(bo, i * XE_PAGE_SIZE, XE_CACHE_WB, 0);
> +		entry = vm->pt_ops->pte_encode_bo(bo, i * XE_PAGE_SIZE,
> +						  XE_CACHE_WB, 0);
>  
>  		xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64, entry);
>  
> @@ -214,7 +215,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  		for (i = 0; i < batch->size;
>  		     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
>  		     XE_PAGE_SIZE) {
> -			entry = xe_pte_encode(batch, i, XE_CACHE_WB, 0);
> +			entry = vm->pt_ops->pte_encode_bo(batch, i,
> +							  XE_CACHE_WB, 0);
>  
>  			xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
>  				  entry);
> @@ -238,16 +240,16 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  		if (vm->flags & XE_VM_FLAG_64K && level == 1)
>  			flags = XE_PDE_64K;
>  
> -		entry = xe_pde_encode(bo, map_ofs + (level - 1) *
> -					XE_PAGE_SIZE, XE_CACHE_WB);
> +		entry = vm->pt_ops->pde_encode_bo(bo, map_ofs + (level - 1) *
> +						  XE_PAGE_SIZE, XE_CACHE_WB);
>  		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE * level, u64,
>  			  entry | flags);
>  	}
>  
>  	/* Write PDE's that point to our BO. */
>  	for (i = 0; i < num_entries - num_level; i++) {
> -		entry = xe_pde_encode(bo, i * XE_PAGE_SIZE,
> -				      XE_CACHE_WB);
> +		entry = vm->pt_ops->pde_encode_bo(bo, i * XE_PAGE_SIZE,
> +						  XE_CACHE_WB);
>  
>  		xe_map_wr(xe, &bo->vmap, map_ofs + XE_PAGE_SIZE +
>  			  (i + 1) * 8, u64, entry);
> @@ -1255,7 +1257,7 @@ xe_migrate_update_pgtables(struct xe_migrate *m,
>  
>  			xe_tile_assert(tile, pt_bo->size == SZ_4K);
>  
> -			addr = xe_pte_encode(pt_bo, 0, XE_CACHE_WB, 0);
> +			addr = vm->pt_ops->pte_encode_bo(pt_bo, 0, XE_CACHE_WB, 0);
>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
>  		}
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 0b8a45609e83..4d4c6a4c305e 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -47,109 +47,6 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
>  	return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
>  }
>  
> -static u64 pde_encode_cache(enum xe_cache_level cache)
> -{
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> -
> -	if (cache != XE_CACHE_NONE)
> -		return PPAT_CACHED_PDE;
> -
> -	return PPAT_UNCACHED;
> -}
> -
> -static u64 pte_encode_cache(enum xe_cache_level cache)
> -{
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> -	switch (cache) {
> -	case XE_CACHE_NONE:
> -		return PPAT_UNCACHED;
> -	case XE_CACHE_WT:
> -		return PPAT_DISPLAY_ELLC;
> -	default:
> -		return PPAT_CACHED;
> -	}
> -}
> -
> -static u64 pte_encode_ps(u32 pt_level)
> -{
> -	/* XXX: Does hw support 1 GiB pages? */
> -	XE_WARN_ON(pt_level > 2);
> -
> -	if (pt_level == 1)
> -		return XE_PDE_PS_2M;
> -	else if (pt_level == 2)
> -		return XE_PDPE_PS_1G;
> -
> -	return 0;
> -}
> -
> -/**
> - * xe_pde_encode() - Encode a page-table directory entry pointing to
> - * another page-table.
> - * @bo: The page-table bo of the page-table to point to.
> - * @bo_offset: Offset in the page-table bo to point to.
> - * @cache: The cache level indicating the caching of @bo.
> - *
> - * TODO: Rename.
> - *
> - * Return: An encoded page directory entry. No errors.
> - */
> -u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
> -		  const enum xe_cache_level cache)
> -{
> -	u64 pde;
> -
> -	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> -	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -	pde |= pde_encode_cache(cache);
> -
> -	return pde;
> -}
> -
> -/**
> - * xe_pte_encode() - Encode a page-table entry pointing to memory.
> - * @bo: The BO representing the memory to point to.
> - * @bo_offset: The offset into @bo.
> - * @cache: The cache level indicating
> - * @pt_level: The page-table level of the page-table into which the entry
> - * is to be inserted.
> - *
> - * Return: An encoded page-table entry. No errors.
> - */
> -u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
> -		  u32 pt_level)
> -{
> -	u64 pte;
> -
> -	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> -	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -	pte |= pte_encode_cache(cache);
> -	pte |= pte_encode_ps(pt_level);
> -
> -	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> -		pte |= XE_PPGTT_PTE_DM;
> -
> -	return pte;
> -}
> -
> -/* Like xe_pte_encode(), but with a vma and a partially-encoded pte */
> -static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
> -			    enum xe_cache_level cache, u32 pt_level)
> -{
> -	pte |= XE_PAGE_PRESENT;
> -
> -	if (likely(!xe_vma_read_only(vma)))
> -		pte |= XE_PAGE_RW;
> -
> -	pte |= pte_encode_cache(cache);
> -	pte |= pte_encode_ps(pt_level);
> -
> -	if (unlikely(xe_vma_is_null(vma)))
> -		pte |= XE_PTE_NULL;
> -
> -	return pte;
> -}
> -
>  static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>  			     unsigned int level)
>  {
> @@ -158,15 +55,11 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>  	if (!vm->scratch_bo[id])
>  		return 0;
>  
> -	if (level == 0) {
> -		u64 empty = xe_pte_encode(vm->scratch_bo[id], 0,
> -					  XE_CACHE_WB, 0);
> +	if (level > 0)
> +		return vm->pt_ops->pde_encode_bo(vm->scratch_pt[id][level - 1]->bo,
> +						 0, XE_CACHE_WB);
>  
> -		return empty;
> -	} else {
> -		return xe_pde_encode(vm->scratch_pt[id][level - 1]->bo, 0,
> -				     XE_CACHE_WB);
> -	}
> +	return vm->pt_ops->pte_encode_bo(vm->scratch_bo[id], 0, XE_CACHE_WB, 0);
>  }
>  
>  /**
> @@ -618,6 +511,7 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  	struct xe_pt_stage_bind_walk *xe_walk =
>  		container_of(walk, typeof(*xe_walk), base);
>  	struct xe_pt *xe_parent = container_of(parent, typeof(*xe_parent), base);
> +	struct xe_vm *vm = xe_walk->vm;
>  	struct xe_pt *xe_child;
>  	bool covers;
>  	int ret = 0;
> @@ -630,9 +524,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  
>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>  
> -		pte = __vma_pte_encode(is_null ? 0 :
> -				       xe_res_dma(curs) + xe_walk->dma_offset,
> -				       xe_walk->vma, xe_walk->cache, level);
> +		pte = vm->pt_ops->pte_encode_vma(is_null ? 0 :
> +						 xe_res_dma(curs) + xe_walk->dma_offset,
> +						 xe_walk->vma, xe_walk->cache, level);
>  		pte |= xe_walk->default_pte;
>  
>  		/*
> @@ -697,7 +591,8 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>  			xe_child->is_compact = true;
>  		}
>  
> -		pte = xe_pde_encode(xe_child->bo, 0, xe_walk->cache) | flags;
> +		pte = vm->pt_ops->pde_encode_bo(xe_child->bo, 0,
> +						xe_walk->cache) | flags;
>  		ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, xe_child,
>  					 pte);
>  	}
> diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
> index 01be7ab08f87..d5460e58dbbf 100644
> --- a/drivers/gpu/drm/xe/xe_pt.h
> +++ b/drivers/gpu/drm/xe/xe_pt.h
> @@ -45,10 +45,4 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
>  
>  bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
>  
> -u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
> -		  const enum xe_cache_level level);
> -
> -u64 xe_pte_encode(struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
> -		  u32 pt_level);
> -
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index 2ed64c0a4485..c58f6926fabf 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -6,8 +6,13 @@
>  #ifndef _XE_PT_TYPES_H_
>  #define _XE_PT_TYPES_H_
>  
> +#include <linux/types.h>
> +
>  #include "xe_pt_walk.h"
>  
> +struct xe_bo;
> +struct xe_vma;
> +
>  enum xe_cache_level {
>  	XE_CACHE_NONE,
>  	XE_CACHE_WT,
> @@ -29,6 +34,15 @@ struct xe_pt {
>  #endif
>  };
>  
> +struct xe_pt_ops {
> +	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
> +			     enum xe_cache_level cache, u32 pt_level);
> +	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
> +			      enum xe_cache_level cache, u32 pt_level);
> +	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
> +			     const enum xe_cache_level cache);
> +};
> +
>  struct xe_pt_entry {
>  	struct xe_pt *pt;
>  	u64 pte;
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 861d050871bb..2e1b4d46d9ea 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1191,6 +1191,93 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
>  	.op_alloc = xe_vm_op_alloc,
>  };
>  
> +static u64 pde_encode_cache(enum xe_cache_level cache)
> +{
> +	/* FIXME: I don't think the PPAT handling is correct for MTL */
> +
> +	if (cache != XE_CACHE_NONE)
> +		return PPAT_CACHED_PDE;
> +
> +	return PPAT_UNCACHED;
> +}
> +
> +static u64 pte_encode_cache(enum xe_cache_level cache)
> +{
> +	/* FIXME: I don't think the PPAT handling is correct for MTL */
> +	switch (cache) {
> +	case XE_CACHE_NONE:
> +		return PPAT_UNCACHED;
> +	case XE_CACHE_WT:
> +		return PPAT_DISPLAY_ELLC;
> +	default:
> +		return PPAT_CACHED;
> +	}
> +}
> +
> +static u64 pte_encode_ps(u32 pt_level)
> +{
> +	/* XXX: Does hw support 1 GiB pages? */
> +	XE_WARN_ON(pt_level > 2);
> +
> +	if (pt_level == 1)
> +		return XE_PDE_PS_2M;
> +	else if (pt_level == 2)
> +		return XE_PDPE_PS_1G;
> +
> +	return 0;
> +}
> +
> +static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
> +			      const enum xe_cache_level cache)
> +{
> +	u64 pde;
> +
> +	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> +	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pde |= pde_encode_cache(cache);
> +
> +	return pde;
> +}
> +
> +static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
> +			      enum xe_cache_level cache, u32 pt_level)
> +{
> +	u64 pte;
> +
> +	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_ps(pt_level);
> +
> +	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> +		pte |= XE_PPGTT_PTE_DM;
> +
> +	return pte;
> +}
> +
> +static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
> +			       enum xe_cache_level cache, u32 pt_level)
> +{
> +	pte |= XE_PAGE_PRESENT;
> +
> +	if (likely(!xe_vma_read_only(vma)))
> +		pte |= XE_PAGE_RW;
> +
> +	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_ps(pt_level);
> +
> +	if (unlikely(xe_vma_is_null(vma)))
> +		pte |= XE_PTE_NULL;
> +
> +	return pte;
> +}
> +
> +static const struct xe_pt_ops xelp_pt_ops = {
> +	.pte_encode_bo = xelp_pte_encode_bo,
> +	.pte_encode_vma = xelp_pte_encode_vma,
> +	.pde_encode_bo = xelp_pde_encode_bo,
> +};
> +
>  static void xe_vma_op_work_func(struct work_struct *w);
>  static void vm_destroy_work_func(struct work_struct *w);
>  
> @@ -1239,6 +1326,8 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
>  
>  	INIT_LIST_HEAD(&vm->extobj.list);
>  
> +	vm->pt_ops = &xelp_pt_ops;
> +
>  	if (!(flags & XE_VM_FLAG_MIGRATION))
>  		xe_device_mem_access_get(xe);
>  
> @@ -1574,8 +1663,8 @@ struct xe_vm *xe_vm_lookup(struct xe_file *xef, u32 id)
>  
>  u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile)
>  {
> -	return xe_pde_encode(vm->pt_root[tile->id]->bo, 0,
> -			     XE_CACHE_WB);
> +	return vm->pt_ops->pde_encode_bo(vm->pt_root[tile->id]->bo, 0,
> +					 XE_CACHE_WB);
>  }
>  
>  static struct dma_fence *
> diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
> index af2ba4acf1f9..1c5553b842d7 100644
> --- a/drivers/gpu/drm/xe/xe_vm_types.h
> +++ b/drivers/gpu/drm/xe/xe_vm_types.h
> @@ -249,6 +249,8 @@ struct xe_vm {
>  		bool munmap_rebind_inflight;
>  	} async_ops;
>  
> +	const struct xe_pt_ops *pt_ops;
> +
>  	/** @userptr: user pointer state */
>  	struct {
>  		/**
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 4/9] drm/xe/migrate: Do not hand-encode pte
  2023-09-25 22:10 ` [Intel-xe] [PATCH 4/9] drm/xe/migrate: Do not hand-encode pte Lucas De Marchi
@ 2023-09-25 22:55   ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-25 22:55 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:44PM -0700, Lucas De Marchi wrote:
> Instead of hand-encoding the pte, call a new vfunc from xe_vm to handle that.
> The encoding may not be the same on every platform, so keeping it in one
> place helps to better support them.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_migrate.c  | 14 ++++++++------
>  drivers/gpu/drm/xe/xe_pt_types.h |  2 ++
>  drivers/gpu/drm/xe/xe_vm.c       | 23 ++++++++++++++++++++++-
>  3 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index aa0396330903..f22356d69ae3 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -261,8 +261,7 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  
>  		level = 2;
>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> -		flags = XE_PAGE_RW | XE_PAGE_PRESENT | PPAT_CACHED |
> -			XE_PPGTT_PTE_DM | XE_PDPE_PS_1G;
> +		flags = vm->pt_ops->pte_encode_addr(0, XE_CACHE_WB, level, true, 0);
>  
>  		/*
>  		 * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -483,7 +482,8 @@ static void emit_pte(struct xe_migrate *m,
>  		ptes -= chunk;
>  
>  		while (chunk--) {
> -			u64 addr;
> +			u64 addr, flags = 0;
> +			bool devmem = false;
>  
>  			addr = xe_res_dma(cur) & PAGE_MASK;
>  			if (is_vram) {
> @@ -491,13 +491,15 @@ static void emit_pte(struct xe_migrate *m,
>  				if ((m->q->vm->flags & XE_VM_FLAG_64K) &&
>  				    !(cur_ofs & (16 * 8 - 1))) {
>  					xe_tile_assert(m->tile, IS_ALIGNED(addr, SZ_64K));
> -					addr |= XE_PTE_PS64;
> +					flags |= XE_PTE_PS64;
>  				}
>  
>  				addr += vram_region_gpu_offset(bo->ttm.resource);
> -				addr |= XE_PPGTT_PTE_DM;
> +				devmem = true;
>  			}
> -			addr |= PPAT_CACHED | XE_PAGE_PRESENT | XE_PAGE_RW;
> +
> +			addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
> +								  0, devmem, flags);

It seems like "addr =" might be more intuitive than "addr |=" here, even
though there's no functional difference.
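
i.e. something like this (untested); since pte_encode_addr() already ORs
@addr into the returned pte, a plain assignment gives the same result:

	addr = m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
						 0, devmem, flags);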

Either way,

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
>  
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index c58f6926fabf..64e3921a0f46 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -39,6 +39,8 @@ struct xe_pt_ops {
>  			     enum xe_cache_level cache, u32 pt_level);
>  	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
>  			      enum xe_cache_level cache, u32 pt_level);
> +	u64 (*pte_encode_addr)(u64 addr, enum xe_cache_level cache,
> +			       u32 pt_level, bool devmem, u64 flags);
>  	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>  			     const enum xe_cache_level cache);
>  };
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 2e1b4d46d9ea..23452b98d853 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1216,7 +1216,6 @@ static u64 pte_encode_cache(enum xe_cache_level cache)
>  
>  static u64 pte_encode_ps(u32 pt_level)
>  {
> -	/* XXX: Does hw support 1 GiB pages? */
>  	XE_WARN_ON(pt_level > 2);
>  
>  	if (pt_level == 1)
> @@ -1272,9 +1271,31 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  	return pte;
>  }
>  
> +static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
> +				u32 pt_level, bool devmem, u64 flags)
> +{
> +	u64 pte;
> +
> +	/* Avoid passing random bits directly as flags */
> +	XE_WARN_ON(flags & ~XE_PTE_PS64);
> +
> +	pte = addr;
> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> +	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_ps(pt_level);
> +
> +	if (devmem)
> +		pte |= XE_PPGTT_PTE_DM;
> +
> +	pte |= flags;
> +
> +	return pte;
> +}
> +
>  static const struct xe_pt_ops xelp_pt_ops = {
>  	.pte_encode_bo = xelp_pte_encode_bo,
>  	.pte_encode_vma = xelp_pte_encode_vma,
> +	.pte_encode_addr = xelp_pte_encode_addr,
>  	.pde_encode_bo = xelp_pde_encode_bo,
>  };
>  
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 5/9] drm/xe: Use vfunc to initialize PAT
  2023-09-25 22:10 ` [Intel-xe] [PATCH 5/9] drm/xe: Use vfunc to initialize PAT Lucas De Marchi
@ 2023-09-25 23:08   ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-25 23:08 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:45PM -0700, Lucas De Marchi wrote:
> Split the PAT initialization between SW-only and HW. The _early() only
> sets up the ops and data structures that are used later to program the
> tables. This allows the PAT to be easily extended to other platforms.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.c       |  3 ++
>  drivers/gpu/drm/xe/xe_device_types.h | 13 ++++++
>  drivers/gpu/drm/xe/xe_pat.c          | 59 +++++++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_pat.h          | 11 ++++++
>  4 files changed, 72 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index 687dc3d79a66..046b7a087451 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -26,6 +26,7 @@
>  #include "xe_irq.h"
>  #include "xe_mmio.h"
>  #include "xe_module.h"
> +#include "xe_pat.h"
>  #include "xe_pcode.h"
>  #include "xe_pm.h"
>  #include "xe_query.h"
> @@ -276,6 +277,8 @@ int xe_device_probe(struct xe_device *xe)
>  	int err;
>  	u8 id;
>  
> +	xe_pat_init_early(xe);
> +
>  	xe->info.mem_region_mask = 1;
>  	err = xe_display_init_nommio(xe);
>  	if (err)
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 32ab0fea04ee..98835ee058f5 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -25,6 +25,7 @@
>  #endif
>  
>  struct xe_ggtt;
> +struct xe_pat_ops;
>  
>  #define XE_BO_INVALID_OFFSET	LONG_MAX
>  
> @@ -331,6 +332,18 @@ struct xe_device {
>  		atomic_t ref;
>  	} mem_access;
>  
> +	/**
> +	 * @pat: Encapsulate PAT related stuff
> +	 */
> +	struct {
> +		/** Internal operations to abstract platforms */
> +		const struct xe_pat_ops *ops;

We can start with this for now, but down the road we may want to move
this into xe_gt instead of keeping it in the xe_device.  Moving it to
xe_gt would let us just have a single 'program' vfunc, but would also
make it easier to provide per-GT functionality for other things like
coherency decisions.  Even though the coherency is marked in the
device-wide table and is theoretically equivalent for each GT, we've
already seen workarounds like Wa_22016122933 that alter the true
coherency behavior and require that we select actual PAT indices in a
different manner on each GT.
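
Rough sketch of that direction, with hypothetical names (each GT would
carry its own pat state, mirroring what this patch puts in xe_device):

	struct xe_pat_ops {
		void (*program)(struct xe_gt *gt, const u32 table[],
				int n_entries);
	};

	void xe_pat_init(struct xe_gt *gt)
	{
		/* ops chosen per GT at init: program_pat vs program_pat_mcr */
		gt->pat.ops->program(gt, gt->pat.table, gt->pat.n_entries);
	}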

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>


Matt

> +		/** PAT table to program in the HW */
> +		const u32 *table;
> +		/** Number of PAT entries */
> +		int n_entries;
> +	} pat;
> +
>  	/** @d3cold: Encapsulate d3cold related stuff */
>  	struct {
>  		/** capable: Indicates if root port is d3cold capable */
> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> index 71e0e047fff3..86386633e206 100644
> --- a/drivers/gpu/drm/xe/xe_pat.c
> +++ b/drivers/gpu/drm/xe/xe_pat.c
> @@ -32,6 +32,11 @@
>  #define TGL_PAT_WC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 1)
>  #define TGL_PAT_UC				REG_FIELD_PREP(TGL_MEM_TYPE_MASK, 0)
>  
> +struct xe_pat_ops {
> +	void (*program_graphics)(struct xe_gt *gt, const u32 table[], int n_entries);
> +	void (*program_media)(struct xe_gt *gt, const u32 table[], int n_entries);
> +};
> +
>  static const u32 tgl_pat_table[] = {
>  	[0] = TGL_PAT_WB,
>  	[1] = TGL_PAT_WC,
> @@ -80,24 +85,37 @@ static void program_pat_mcr(struct xe_gt *gt, const u32 table[], int n_entries)
>  	}
>  }
>  
> -void xe_pat_init(struct xe_gt *gt)
> -{
> -	struct xe_device *xe = gt_to_xe(gt);
> +static const struct xe_pat_ops tgl_pat_ops = {
> +	.program_graphics = program_pat,
> +};
> +
> +static const struct xe_pat_ops pvc_pat_ops = {
> +	.program_graphics = program_pat_mcr,
> +};
> +
> +/*
> + * SAMedia register offsets are adjusted by the write methods and they target
> + * registers that are not MCR, while for normal GT they are MCR
> + */
> +static const struct xe_pat_ops mtl_pat_ops = {
> +	.program_graphics = program_pat_mcr,
> +	.program_media = program_pat,
> +};
>  
> +void xe_pat_init_early(struct xe_device *xe)
> +{
>  	if (xe->info.platform == XE_METEORLAKE) {
> -		/*
> -		 * SAMedia register offsets are adjusted by the write methods
> -		 * and they target registers that are not MCR, while for normal
> -		 * GT they are MCR
> -		 */
> -		if (xe_gt_is_media_type(gt))
> -			program_pat(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
> -		else
> -			program_pat_mcr(gt, mtl_pat_table, ARRAY_SIZE(mtl_pat_table));
> +		xe->pat.ops = &mtl_pat_ops;
> +		xe->pat.table = mtl_pat_table;
> +		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
>  	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
> -		program_pat_mcr(gt, pvc_pat_table, ARRAY_SIZE(pvc_pat_table));
> +		xe->pat.ops = &pvc_pat_ops;
> +		xe->pat.table = pvc_pat_table;
> +		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
>  	} else if (GRAPHICS_VERx100(xe) <= 1210) {
> -		program_pat(gt, tgl_pat_table, ARRAY_SIZE(tgl_pat_table));
> +		xe->pat.ops = &tgl_pat_ops;
> +		xe->pat.table = tgl_pat_table;
> +		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
>  	} else {
>  		/*
>  		 * Going forward we expect to need new PAT settings for most
> @@ -111,3 +129,16 @@ void xe_pat_init(struct xe_gt *gt)
>  			GRAPHICS_VER(xe), GRAPHICS_VERx100(xe) % 100);
>  	}
>  }
> +
> +void xe_pat_init(struct xe_gt *gt)
> +{
> +	struct xe_device *xe = gt_to_xe(gt);
> +
> +	if (!xe->pat.ops)
> +		return;
> +
> +	if (xe_gt_is_media_type(gt))
> +		xe->pat.ops->program_media(gt, xe->pat.table, xe->pat.n_entries);
> +	else
> +		xe->pat.ops->program_graphics(gt, xe->pat.table, xe->pat.n_entries);
> +}
> diff --git a/drivers/gpu/drm/xe/xe_pat.h b/drivers/gpu/drm/xe/xe_pat.h
> index 659de4008131..744318cab69b 100644
> --- a/drivers/gpu/drm/xe/xe_pat.h
> +++ b/drivers/gpu/drm/xe/xe_pat.h
> @@ -7,7 +7,18 @@
>  #define _XE_PAT_H_
>  
>  struct xe_gt;
> +struct xe_device;
>  
> +/**
> + * xe_pat_init_early - SW initialization, setting up data based on device
> + * @xe: xe device
> + */
> +void xe_pat_init_early(struct xe_device *xe);
> +
> +/**
> + * xe_pat_init - Program HW PAT table
> + * @gt: GT structure
> + */
>  void xe_pat_init(struct xe_gt *gt);
>  
>  #endif
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes
  2023-09-25 22:10 ` [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes Lucas De Marchi
@ 2023-09-25 23:14   ` Matt Roper
  2023-09-26 14:43     ` Lucas De Marchi
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Roper @ 2023-09-25 23:14 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:46PM -0700, Lucas De Marchi wrote:
> Some of the PAT entries are relevant for the driver's own use, which varies
> per platform. Let the PAT early initialization set what they should
> point to so the rest of the driver can use them where needed.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device_types.h | 2 ++
>  drivers/gpu/drm/xe/xe_pat.c          | 9 +++++++++
>  drivers/gpu/drm/xe/xe_pt_types.h     | 1 +
>  3 files changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 98835ee058f5..7d0f2109c23a 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -15,6 +15,7 @@
>  #include "xe_devcoredump_types.h"
>  #include "xe_gt_types.h"
>  #include "xe_platform_types.h"
> +#include "xe_pt_types.h"
>  #include "xe_pmu.h"
>  #include "xe_step_types.h"
>  
> @@ -342,6 +343,7 @@ struct xe_device {
>  		const u32 *table;
>  		/** Number of PAT entries */
>  		int n_entries;
> +		u32 idx[__XE_CACHE_LEVEL_COUNT];
>  	} pat;
>  
>  	/** @d3cold: Encapsulate d3cold related stuff */
> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> index 86386633e206..966b63252735 100644
> --- a/drivers/gpu/drm/xe/xe_pat.c
> +++ b/drivers/gpu/drm/xe/xe_pat.c
> @@ -108,14 +108,23 @@ void xe_pat_init_early(struct xe_device *xe)
>  		xe->pat.ops = &mtl_pat_ops;
>  		xe->pat.table = mtl_pat_table;
>  		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
> +		xe->pat.idx[XE_CACHE_NONE] = 2;
> +		xe->pat.idx[XE_CACHE_WT] = 1;
> +		xe->pat.idx[XE_CACHE_WB] = 3;
>  	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
>  		xe->pat.ops = &pvc_pat_ops;
>  		xe->pat.table = pvc_pat_table;
>  		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
> +		xe->pat.idx[XE_CACHE_NONE] = 0;
> +		xe->pat.idx[XE_CACHE_WT] = 2;
> +		xe->pat.idx[XE_CACHE_WB] = 3;

These are the PVC indices, but they're not correct for DG2.  IIRC, DG2
should be using the same ones as xe_lp below (although it looks like the
bspec tagging might be messed up right now...)
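
i.e. something like this split (untested), keeping pvc_pat_ops and
pvc_pat_table for DG2 as in the current code and only changing the
indices, assuming the xe_lp values below are indeed the right ones:

	} else if (xe->info.platform == XE_PVC) {
		xe->pat.ops = &pvc_pat_ops;
		xe->pat.table = pvc_pat_table;
		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
		xe->pat.idx[XE_CACHE_NONE] = 0;
		xe->pat.idx[XE_CACHE_WT] = 2;
		xe->pat.idx[XE_CACHE_WB] = 3;
	} else if (xe->info.platform == XE_DG2) {
		xe->pat.ops = &pvc_pat_ops;
		xe->pat.table = pvc_pat_table;
		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
		/* same indices as the xe_lp branch below */
		xe->pat.idx[XE_CACHE_NONE] = 3;
		xe->pat.idx[XE_CACHE_WT] = 2;
		xe->pat.idx[XE_CACHE_WB] = 0;
	}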


Matt

>  	} else if (GRAPHICS_VERx100(xe) <= 1210) {
>  		xe->pat.ops = &tgl_pat_ops;
>  		xe->pat.table = tgl_pat_table;
>  		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
> +		xe->pat.idx[XE_CACHE_NONE] = 3;
> +		xe->pat.idx[XE_CACHE_WT] = 2;
> +		xe->pat.idx[XE_CACHE_WB] = 0;
>  	} else {
>  		/*
>  		 * Going forward we expect to need new PAT settings for most
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index 64e3921a0f46..bf5000499251 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -17,6 +17,7 @@ enum xe_cache_level {
>  	XE_CACHE_NONE,
>  	XE_CACHE_WT,
>  	XE_CACHE_WB,
> +	__XE_CACHE_LEVEL_COUNT,
>  };
>  
>  #define XE_VM_MAX_LEVEL 4
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte
  2023-09-25 22:10 ` [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte Lucas De Marchi
@ 2023-09-25 23:27   ` Matt Roper
  2023-09-26 14:59     ` Lucas De Marchi
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Roper @ 2023-09-25 23:27 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:47PM -0700, Lucas De Marchi wrote:
> Change the xelp_pte_encode() function to use the platform-dependent
> pat_index.  The same function can be used for all platforms as they only
> need to encode the pat_index bits in the same pte layout. For platforms
> that don't have the most significant PAT bit, this is fine as long as
> the PAT table never returns a bogus index.
> 
> To maintain consistency, also add an xe argument to pde_encode_cache().
> This will be needed for Xe2 later.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_bo.h       | 11 ++++---
>  drivers/gpu/drm/xe/xe_migrate.c  |  6 ++--
>  drivers/gpu/drm/xe/xe_pt_types.h |  4 ++-
>  drivers/gpu/drm/xe/xe_vm.c       | 50 +++++++++++++++++++-------------
>  4 files changed, 44 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
> index e3c90d45e723..885412581158 100644
> --- a/drivers/gpu/drm/xe/xe_bo.h
> +++ b/drivers/gpu/drm/xe/xe_bo.h
> @@ -39,10 +39,13 @@
>  #define XE_BO_INTERNAL_TEST		BIT(30)
>  #define XE_BO_INTERNAL_64K		BIT(31)
>  
> -#define PPAT_UNCACHED                   GENMASK_ULL(4, 3)
> -#define PPAT_CACHED_PDE                 0
> -#define PPAT_CACHED                     BIT_ULL(7)
> -#define PPAT_DISPLAY_ELLC               BIT_ULL(4)
> +#define XE_PPGTT_PDE_UNCACHED		GENMASK_ULL(4, 3)
> +#define XE_PPGTT_PDE_CACHED		0

Shouldn't these two be going away?  Indices 0 and 3 are only correct for
certain platforms.  The PDE uses the same PAT encoding as the PTE, as
documented in the bspec's PTF_COMMON_BITS (55730) and referenced from
pages 62361, 55731, etc.
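
Rough sketch (untested) of what I'd expect instead, reusing the PTE PAT
bit definitions on the assumption that the PDE PAT bits sit in the same
positions, like pte_encode_cache() below:

	static u64 pde_encode_cache(struct xe_device *xe,
				    enum xe_cache_level cache)
	{
		u32 pat_index = xe->pat.idx[cache];
		u64 pde = 0;

		if (pat_index & BIT(0))
			pde |= XE_PPGTT_PTE_PAT0;

		if (pat_index & BIT(1))
			pde |= XE_PPGTT_PTE_PAT1;

		if (pat_index & BIT(2))
			pde |= XE_PPGTT_PTE_PAT2;

		return pde;
	}
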

> +
> +#define XELPG_PPGTT_PTE_PAT3		BIT_ULL(62)
> +#define XE_PPGTT_PTE_PAT2		BIT_ULL(7)
> +#define XE_PPGTT_PTE_PAT1		BIT_ULL(4)
> +#define XE_PPGTT_PTE_PAT0		BIT_ULL(3)
>  
>  #define XE_PTE_SHIFT			12
>  #define XE_PAGE_SIZE			(1 << XE_PTE_SHIFT)
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index f22356d69ae3..11dc3b0e0350 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -261,7 +261,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>  
>  		level = 2;
>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
> -		flags = vm->pt_ops->pte_encode_addr(0, XE_CACHE_WB, level, true, 0);
> +		flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
> +						    true, 0);
>  
>  		/*
>  		 * Use 1GB pages, it shouldn't matter the physical amount of
> @@ -498,7 +499,8 @@ static void emit_pte(struct xe_migrate *m,
>  				devmem = true;
>  			}
>  
> -			addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
> +			addr |= m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
> +								  addr, XE_CACHE_WB,
>  								  0, devmem, flags);
>  			bb->cs[bb->len++] = lower_32_bits(addr);
>  			bb->cs[bb->len++] = upper_32_bits(addr);
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index bf5000499251..bd6645295fe6 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -11,6 +11,7 @@
>  #include "xe_pt_walk.h"
>  
>  struct xe_bo;
> +struct xe_device;
>  struct xe_vma;
>  
>  enum xe_cache_level {
> @@ -40,7 +41,8 @@ struct xe_pt_ops {
>  			     enum xe_cache_level cache, u32 pt_level);
>  	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
>  			      enum xe_cache_level cache, u32 pt_level);
> -	u64 (*pte_encode_addr)(u64 addr, enum xe_cache_level cache,
> +	u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
> +			       enum xe_cache_level cache,
>  			       u32 pt_level, bool devmem, u64 flags);
>  	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>  			     const enum xe_cache_level cache);
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index 23452b98d853..ab9cf1de566a 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -1191,27 +1191,32 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
>  	.op_alloc = xe_vm_op_alloc,
>  };
>  
> -static u64 pde_encode_cache(enum xe_cache_level cache)
> +static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>  {
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> -
>  	if (cache != XE_CACHE_NONE)
> -		return PPAT_CACHED_PDE;
> +		return XE_PPGTT_PDE_CACHED;
>  
> -	return PPAT_UNCACHED;
> +	return XE_PPGTT_PDE_UNCACHED;

As noted above, we're getting different behavior from these on different
platforms.  We should be encoding the same way we do for the PTE.


Matt

>  }
>  
> -static u64 pte_encode_cache(enum xe_cache_level cache)
> +static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>  {
> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
> -	switch (cache) {
> -	case XE_CACHE_NONE:
> -		return PPAT_UNCACHED;
> -	case XE_CACHE_WT:
> -		return PPAT_DISPLAY_ELLC;
> -	default:
> -		return PPAT_CACHED;
> -	}
> +	u32 pat_index = xe->pat.idx[cache];
> +	u64 pte = 0;
> +
> +	if (pat_index & BIT(0))
> +		pte |= XE_PPGTT_PTE_PAT0;
> +
> +	if (pat_index & BIT(1))
> +		pte |= XE_PPGTT_PTE_PAT1;
> +
> +	if (pat_index & BIT(2))
> +		pte |= XE_PPGTT_PTE_PAT2;
> +
> +	if (pat_index & BIT(3))
> +		pte |= XELPG_PPGTT_PTE_PAT3;
> +
> +	return pte;
>  }
>  
>  static u64 pte_encode_ps(u32 pt_level)
> @@ -1229,11 +1234,12 @@ static u64 pte_encode_ps(u32 pt_level)
>  static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  			      const enum xe_cache_level cache)
>  {
> +	struct xe_device *xe = xe_bo_device(bo);
>  	u64 pde;
>  
>  	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>  	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -	pde |= pde_encode_cache(cache);
> +	pde |= pde_encode_cache(xe, cache);
>  
>  	return pde;
>  }
> @@ -1241,11 +1247,12 @@ static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  			      enum xe_cache_level cache, u32 pt_level)
>  {
> +	struct xe_device *xe = xe_bo_device(bo);
>  	u64 pte;
>  
>  	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>  	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_cache(xe, cache);
>  	pte |= pte_encode_ps(pt_level);
>  
>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> @@ -1257,12 +1264,14 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>  static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  			       enum xe_cache_level cache, u32 pt_level)
>  {
> +	struct xe_device *xe = xe_vma_vm(vma)->xe;
> +
>  	pte |= XE_PAGE_PRESENT;
>  
>  	if (likely(!xe_vma_read_only(vma)))
>  		pte |= XE_PAGE_RW;
>  
> -	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_cache(xe, cache);
>  	pte |= pte_encode_ps(pt_level);
>  
>  	if (unlikely(xe_vma_is_null(vma)))
> @@ -1271,7 +1280,8 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>  	return pte;
>  }
>  
> -static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
> +static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
> +				enum xe_cache_level cache,
>  				u32 pt_level, bool devmem, u64 flags)
>  {
>  	u64 pte;
> @@ -1281,7 +1291,7 @@ static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
>  
>  	pte = addr;
>  	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
> -	pte |= pte_encode_cache(cache);
> +	pte |= pte_encode_cache(xe, cache);
>  	pte |= pte_encode_ps(pt_level);
>  
>  	if (devmem)
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 8/9] drm/xe: Use vfunc for ggtt pte encoding
  2023-09-25 22:10 ` [Intel-xe] [PATCH 8/9] drm/xe: Use vfunc for ggtt pte encoding Lucas De Marchi
@ 2023-09-25 23:37   ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-25 23:37 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:48PM -0700, Lucas De Marchi wrote:
> Use two different functions for encoding the ggtt's pte, assigning them
> during initialization. The main difference is that before Xe-LPG, the
> pte didn't have the cache bits.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_ggtt.c       | 53 +++++++++++++++++++++++-------
>  drivers/gpu/drm/xe/xe_ggtt.h       |  1 -
>  drivers/gpu/drm/xe/xe_ggtt_types.h |  9 +++++
>  3 files changed, 50 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
> index 54baaffc7235..09c6bd46f097 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.c
> +++ b/drivers/gpu/drm/xe/xe_ggtt.c
> @@ -20,16 +20,31 @@
>  #include "xe_mmio.h"
>  #include "xe_wopcm.h"
>  
> -/* FIXME: Common file, preferably auto-gen */
> -#define MTL_GGTT_PTE_PAT0	BIT_ULL(52)
> -#define MTL_GGTT_PTE_PAT1	BIT_ULL(53)
> +#define XELPG_GGTT_PTE_PAT0	BIT_ULL(52)
> +#define XELPG_GGTT_PTE_PAT1	BIT_ULL(53)
>  
>  /* GuC addresses above GUC_GGTT_TOP also don't map through the GTT */
>  #define GUC_GGTT_TOP	0xFEE00000
>  
> -u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
> +static u64 xelp_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
> +				   enum xe_cache_level cache)
> +{
> +	u64 pte;
> +
> +	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> +	pte |= XE_PAGE_PRESENT;
> +
> +	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
> +		pte |= XE_GGTT_PTE_DM;
> +
> +	return pte;
> +}
> +
> +static u64 xelpg_ggtt_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
> +				    enum xe_cache_level cache)
>  {

Since the two functions are basically the same aside from PAT, should we
just implement this by calling the xelp version and then add the
PAT-specific stuff at the end?

        pte = xelp_ggtt_pte_encode_bo(bo, bo_offset, cache);

        xe_assert(xe, pat_index <= 3);
        if (pat_index & BIT(0))
                pte |= XELPG_GGTT_PTE_PAT0;
        if (pat_index & BIT(1))
                pte |= XELPG_GGTT_PTE_PAT1;

        return pte;

Either way,
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>


Matt

>  	struct xe_device *xe = xe_bo_device(bo);
> +	u32 pat_index = xe->pat.idx[cache];
>  	u64 pte;
>  
>  	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
> @@ -38,11 +53,11 @@ u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset)
>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>  		pte |= XE_GGTT_PTE_DM;
>  
> -	/* FIXME: vfunc + pass in caching rules */
> -	if (xe->info.platform == XE_METEORLAKE) {
> -		pte |= MTL_GGTT_PTE_PAT0;
> -		pte |= MTL_GGTT_PTE_PAT1;
> -	}
> +	if (pat_index & BIT(0))
> +		pte |= XELPG_GGTT_PTE_PAT0;
> +
> +	if (pat_index & BIT(1))
> +		pte |= XELPG_GGTT_PTE_PAT1;
>  
>  	return pte;
>  }
> @@ -72,7 +87,8 @@ static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
>  	xe_tile_assert(ggtt->tile, start < end);
>  
>  	if (ggtt->scratch)
> -		scratch_pte = xe_ggtt_pte_encode(ggtt->scratch, 0);
> +		scratch_pte = ggtt->pt_ops->pte_encode_bo(ggtt->scratch, 0,
> +							  XE_CACHE_WB);
>  	else
>  		scratch_pte = 0;
>  
> @@ -102,6 +118,14 @@ static void primelockdep(struct xe_ggtt *ggtt)
>  	fs_reclaim_release(GFP_KERNEL);
>  }
>  
> +static const struct xe_ggtt_pt_ops xelp_pt_ops = {
> +	.pte_encode_bo = xelp_ggtt_pte_encode_bo,
> +};
> +
> +static const struct xe_ggtt_pt_ops xelpg_pt_ops = {
> +	.pte_encode_bo = xelpg_ggtt_pte_encode_bo,
> +};
> +
>  int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt)
>  {
>  	struct xe_device *xe = tile_to_xe(ggtt->tile);
> @@ -146,6 +170,11 @@ int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt)
>  	if (ggtt->size > GUC_GGTT_TOP)
>  		ggtt->size = GUC_GGTT_TOP;
>  
> +	if (GRAPHICS_VERx100(xe) >= 1270)
> +		ggtt->pt_ops = &xelpg_pt_ops;
> +	else
> +		ggtt->pt_ops = &xelp_pt_ops;
> +
>  	drm_mm_init(&ggtt->mm, xe_wopcm_size(xe),
>  		    ggtt->size - xe_wopcm_size(xe));
>  	mutex_init(&ggtt->lock);
> @@ -260,7 +289,7 @@ void xe_ggtt_printk(struct xe_ggtt *ggtt, const char *prefix)
>  {
>  	u64 addr, scratch_pte;
>  
> -	scratch_pte = xe_ggtt_pte_encode(ggtt->scratch, 0);
> +	scratch_pte = ggtt->pt_ops->pte_encode_bo(ggtt->scratch, 0, XE_CACHE_WB);
>  
>  	printk("%sGlobal GTT:", prefix);
>  	for (addr = 0; addr < ggtt->size; addr += XE_PAGE_SIZE) {
> @@ -301,7 +330,7 @@ void xe_ggtt_map_bo(struct xe_ggtt *ggtt, struct xe_bo *bo)
>  	u64 offset, pte;
>  
>  	for (offset = 0; offset < bo->size; offset += XE_PAGE_SIZE) {
> -		pte = xe_ggtt_pte_encode(bo, offset);
> +		pte = ggtt->pt_ops->pte_encode_bo(bo, offset, XE_CACHE_WB);
>  		xe_ggtt_set_pte(ggtt, start + offset, pte);
>  	}
>  
> diff --git a/drivers/gpu/drm/xe/xe_ggtt.h b/drivers/gpu/drm/xe/xe_ggtt.h
> index 205a6d058bbd..3faa3c6d0375 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt.h
> @@ -10,7 +10,6 @@
>  
>  struct drm_printer;
>  
> -u64 xe_ggtt_pte_encode(struct xe_bo *bo, u64 bo_offset);
>  void xe_ggtt_set_pte(struct xe_ggtt *ggtt, u64 addr, u64 pte);
>  void xe_ggtt_invalidate(struct xe_ggtt *ggtt);
>  int xe_ggtt_init_noalloc(struct xe_ggtt *ggtt);
> diff --git a/drivers/gpu/drm/xe/xe_ggtt_types.h b/drivers/gpu/drm/xe/xe_ggtt_types.h
> index d34b3e733945..486016ea5b67 100644
> --- a/drivers/gpu/drm/xe/xe_ggtt_types.h
> +++ b/drivers/gpu/drm/xe/xe_ggtt_types.h
> @@ -8,9 +8,16 @@
>  
>  #include <drm/drm_mm.h>
>  
> +#include "xe_pt_types.h"
> +
>  struct xe_bo;
>  struct xe_gt;
>  
> +struct xe_ggtt_pt_ops {
> +	u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset,
> +			     enum xe_cache_level cache);
> +};
> +
>  struct xe_ggtt {
>  	struct xe_tile *tile;
>  
> @@ -25,6 +32,8 @@ struct xe_ggtt {
>  
>  	u64 __iomem *gsm;
>  
> +	const struct xe_ggtt_pt_ops *pt_ops;
> +
>  	struct drm_mm mm;
>  };
>  
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support
  2023-09-25 22:10 ` [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support Lucas De Marchi
@ 2023-09-25 23:42   ` Matt Roper
  2023-09-26 20:48     ` Lucas De Marchi
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Roper @ 2023-09-25 23:42 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:10:49PM -0700, Lucas De Marchi wrote:
> Fix build break due to previous commit. Squash on "Implement display
> support" to be moved above the previous commit.

I know Matt Auld's series is going to come through and replace all the
hardcoded cache settings with the proper PAT index for whatever object
we're operating on, but for the time being should we at least switch the
hardcoding in this patch to be CACHE_WT rather than CACHE_WB?  Since
display is never coherent (regardless of the platform's CPU<->GT
coherency behavior), WB would pretty much always be the wrong thing for
surfaces that we're mapping into a DPT.
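
For instance, against this patch as-is the DPT writes would become
something like this (just a sketch; Matt Auld's series will replace the
hardcoding with per-object PAT indexes anyway):

	u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
					      XE_CACHE_WT);

	iosys_map_wr(map, *dpt_ofs, u64, pte);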


Matt

> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
>  drivers/gpu/drm/xe/display/xe_fb_pin.c | 32 ++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> index 3422942a9951..b7a04fba3585 100644
> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> @@ -17,7 +17,10 @@ static void
>  write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_ofs,
>  		  u32 width, u32 height, u32 src_stride, u32 dst_stride)
>  {
> +	struct xe_device *xe = xe_bo_device(bo);
> +	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>  	u32 column, row;
> +
>  	/* TODO: Maybe rewrite so we can traverse the bo addresses sequentially,
>  	 * by writing dpt/ggtt in a different order?
>  	 */
> @@ -26,8 +29,10 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
>  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
>  
>  		for (row = 0; row < height; row++) {
> -			iosys_map_wr(map, *dpt_ofs, u64,
> -				     xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
> +							      XE_CACHE_WB);
> +
> +			iosys_map_wr(map, *dpt_ofs, u64, pte);
>  			*dpt_ofs += 8;
>  			src_idx -= src_stride;
>  		}
> @@ -46,6 +51,7 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
>  {
>  	struct xe_device *xe = to_xe_device(fb->base.dev);
>  	struct xe_tile *tile0 = xe_device_get_root_tile(xe);
> +	struct xe_ggtt *ggtt = tile0->mem.ggtt;
>  	struct xe_bo *bo = intel_fb_obj(&fb->base), *dpt;
>  	u32 dpt_size, size = bo->ttm.base.size;
>  
> @@ -76,9 +82,12 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
>  	if (view->type == I915_GTT_VIEW_NORMAL) {
>  		u32 x;
>  
> -		for (x = 0; x < size / XE_PAGE_SIZE; x++)
> -			iosys_map_wr(&dpt->vmap, x * 8, u64,
> -				     xe_ggtt_pte_encode(bo, x * XE_PAGE_SIZE));
> +		for (x = 0; x < size / XE_PAGE_SIZE; x++) {
> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x * XE_PAGE_SIZE,
> +							      XE_CACHE_WB);
> +
> +			iosys_map_wr(&dpt->vmap, x * 8, u64, pte);
> +		}
>  	} else {
>  		const struct intel_rotation_info *rot_info = &view->rotated;
>  		u32 i, dpt_ofs = 0;
> @@ -107,8 +116,10 @@ write_ggtt_rotated(struct xe_bo *bo, struct xe_ggtt *ggtt, u32 *ggtt_ofs, u32 bo
>  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
>  
>  		for (row = 0; row < height; row++) {
> -			xe_ggtt_set_pte(ggtt, *ggtt_ofs,
> -					xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
> +							      XE_CACHE_WB);
> +
> +			xe_ggtt_set_pte(ggtt, *ggtt_ofs, pte);
>  			*ggtt_ofs += XE_PAGE_SIZE;
>  			src_idx -= src_stride;
>  		}
> @@ -150,8 +161,11 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>  		if (ret)
>  			goto out_unlock;
>  
> -		for (x = 0; x < size; x += XE_PAGE_SIZE)
> -			xe_ggtt_set_pte(ggtt, vma->node.start + x, xe_ggtt_pte_encode(bo, x));
> +		for (x = 0; x < size; x += XE_PAGE_SIZE) {
> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x, XE_CACHE_WB);
> +
> +			xe_ggtt_set_pte(ggtt, vma->node.start + x, pte);
> +		}
>  	} else {
>  		u32 i, ggtt_ofs;
>  		const struct intel_rotation_info *rot_info = &view->rotated;
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 1/9] drm/xe: Normalize pte/pde encoding
  2023-09-25 22:30   ` Matt Roper
@ 2023-09-26 13:43     ` Lucas De Marchi
  0 siblings, 0 replies; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-26 13:43 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 03:30:06PM -0700, Matt Roper wrote:
>On Mon, Sep 25, 2023 at 03:10:41PM -0700, Lucas De Marchi wrote:
>> Split functions that do only part of the pde/pte encoding and that can
>> be called from the different places. This normalizes how pde/pte are
>> encoded so they can be moved elsewhere in a subsequent change.
>>
>> xe_pte_encode() was calling __pte_encode() with a NULL vma, which is the
>> opposite of what xe_pt_stage_bind_entry() does. Stop passing a NULL vma
>> and just split another function that deals with a vma rather than a bo.
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_pt.c | 119 +++++++++++++++++++++----------------
>>  1 file changed, 67 insertions(+), 52 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
>> index aec469842471..afadd399684c 100644
>> --- a/drivers/gpu/drm/xe/xe_pt.c
>> +++ b/drivers/gpu/drm/xe/xe_pt.c
>> @@ -47,91 +47,106 @@ static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
>>  	return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
>>  }
>>
>> -/**
>> - * xe_pde_encode() - Encode a page-table directory entry pointing to
>> - * another page-table.
>> - * @bo: The page-table bo of the page-table to point to.
>> - * @bo_offset: Offset in the page-table bo to point to.
>> - * @level: The cache level indicating the caching of @bo.
>> - *
>> - * TODO: Rename.
>> - *
>> - * Return: An encoded page directory entry. No errors.
>> - */
>> -u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
>> -		  const enum xe_cache_level level)
>> +static u64 pde_encode_cache(enum xe_cache_level cache)
>>  {
>> -	u64 pde;
>> -
>> -	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>> -	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -
>>  	/* FIXME: I don't think the PPAT handling is correct for MTL */
>>
>> -	if (level != XE_CACHE_NONE)
>> -		pde |= PPAT_CACHED_PDE;
>> -	else
>> -		pde |= PPAT_UNCACHED;
>> +	if (cache != XE_CACHE_NONE)
>> +		return PPAT_CACHED_PDE;
>>
>> -	return pde;
>> +	return PPAT_UNCACHED;
>>  }
>>
>> -static u64 __pte_encode(u64 pte, enum xe_cache_level cache,
>> -			struct xe_vma *vma, u32 pt_level)
>> +static u64 pte_encode_cache(enum xe_cache_level cache)
>>  {
>> -	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -
>> -	if (unlikely(vma && xe_vma_read_only(vma)))
>> -		pte &= ~XE_PAGE_RW;
>> -
>> -	if (unlikely(vma && xe_vma_is_null(vma)))
>> -		pte |= XE_PTE_NULL;
>> -
>>  	/* FIXME: I don't think the PPAT handling is correct for MTL */
>> -
>>  	switch (cache) {
>>  	case XE_CACHE_NONE:
>> -		pte |= PPAT_UNCACHED;
>> -		break;
>> +		return PPAT_UNCACHED;
>>  	case XE_CACHE_WT:
>> -		pte |= PPAT_DISPLAY_ELLC;
>> -		break;
>> +		return PPAT_DISPLAY_ELLC;
>>  	default:
>> -		pte |= PPAT_CACHED;
>> -		break;
>> +		return PPAT_CACHED;
>>  	}
>> +}
>> +
>> +static u64 pte_encode_ps(u32 pt_level)
>> +{
>> +	/* XXX: Does hw support 1 GiB pages? */
>> +	XE_WARN_ON(pt_level > 2);
>>
>>  	if (pt_level == 1)
>> -		pte |= XE_PDE_PS_2M;
>> +		return XE_PDE_PS_2M;
>>  	else if (pt_level == 2)
>> -		pte |= XE_PDPE_PS_1G;
>> +		return XE_PDPE_PS_1G;
>>
>> -	/* XXX: Does hw support 1 GiB pages? */
>> -	XE_WARN_ON(pt_level > 2);
>> +	return 0;
>> +}
>>
>> -	return pte;
>> +/**
>> + * xe_pde_encode() - Encode a page-table directory entry pointing to
>> + * another page-table.
>> + * @bo: The page-table bo of the page-table to point to.
>> + * @bo_offset: Offset in the page-table bo to point to.
>> + * @cache: The cache level indicating the caching of @bo.
>> + *
>> + * TODO: Rename.
>> + *
>> + * Return: An encoded page directory entry. No errors.
>> + */
>> +u64 xe_pde_encode(struct xe_bo *bo, u64 bo_offset,
>> +		  const enum xe_cache_level cache)
>> +{
>> +	u64 pde;
>> +
>> +	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>> +	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> +	pde |= pde_encode_cache(cache);
>> +
>> +	return pde;
>>  }
>>
>>  /**
>>   * xe_pte_encode() - Encode a page-table entry pointing to memory.
>>   * @bo: The BO representing the memory to point to.
>> - * @offset: The offset into @bo.
>> + * @bo_offset: The offset into @bo.
>>   * @cache: The cache level indicating
>>   * @pt_level: The page-table level of the page-table into which the entry
>>   * is to be inserted.
>>   *
>>   * Return: An encoded page-table entry. No errors.
>>   */
>> -u64 xe_pte_encode(struct xe_bo *bo, u64 offset, enum xe_cache_level cache,
>> +u64 xe_pte_encode(struct xe_bo *bo, u64 bo_offset, enum xe_cache_level cache,
>>  		  u32 pt_level)
>>  {
>>  	u64 pte;
>>
>> -	pte = xe_bo_addr(bo, offset, XE_PAGE_SIZE);
>> +	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> +	pte |= pte_encode_cache(cache);
>> +	pte |= pte_encode_ps(pt_level);
>> +
>>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>>  		pte |= XE_PPGTT_PTE_DM;
>>
>> -	return __pte_encode(pte, cache, NULL, pt_level);
>> +	return pte;
>> +}
>> +
>> +/* Like xe_pte_encode(), but with a vma and a partially-encoded pte */
>> +static u64 __vma_pte_encode(u64 pte, struct xe_vma *vma,
>> +			    enum xe_cache_level cache, u32 pt_level)
>> +{
>> +	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> +	pte |= pte_encode_cache(cache);
>> +	pte |= pte_encode_ps(pt_level);
>> +
>> +	if (unlikely(vma && xe_vma_read_only(vma)))
>
>Do we need the non-NULL check part of this condition (ditto for the one
>below)?  With your refactoring, the vma parameter passed to this
>function should already be non-NULL, right?  In fact we've already
>dereferenced it before the only callsite of this function...

yeah... I think the goal of this patch was accomplished: to make it
obvious we can't have a NULL vma at this point. I see you noticed it
later in patch 2 ;)
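
For reference, the net effect in patch 2 is just dropping the redundant
NULL checks, i.e. a sketch of the result:

	if (unlikely(xe_vma_read_only(vma)))
		pte &= ~XE_PAGE_RW;

	if (unlikely(xe_vma_is_null(vma)))
		pte |= XE_PTE_NULL;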


>
>Aside from that,
>
>Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

thanks
Lucas De Marchi

>
>> +		pte &= ~XE_PAGE_RW;
>> +
>> +	if (unlikely(vma && xe_vma_is_null(vma)))
>> +		pte |= XE_PTE_NULL;
>> +
>> +	return pte;
>>  }
>>
>>  static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
>> @@ -614,9 +629,9 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
>>
>>  		XE_WARN_ON(xe_walk->va_curs_start != addr);
>>
>> -		pte = __pte_encode(is_null ? 0 :
>> -				   xe_res_dma(curs) + xe_walk->dma_offset,
>> -				   xe_walk->cache, xe_walk->vma, level);
>> +		pte = __vma_pte_encode(is_null ? 0 :
>> +				       xe_res_dma(curs) + xe_walk->dma_offset,
>> +				       xe_walk->vma, xe_walk->cache, level);
>>  		pte |= xe_walk->default_pte;
>>
>>  		/*
>> --
>> 2.40.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes
  2023-09-25 23:14   ` Matt Roper
@ 2023-09-26 14:43     ` Lucas De Marchi
  2023-09-26 15:34       ` Matt Roper
  0 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-26 14:43 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 04:14:43PM -0700, Matt Roper wrote:
>On Mon, Sep 25, 2023 at 03:10:46PM -0700, Lucas De Marchi wrote:
>> Some of the PAT entries are relevant for the driver's own use, which varies
>> per platform. Let the PAT early initialization set what they should
>> point to so the rest of the driver can use them where needed.
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_device_types.h | 2 ++
>>  drivers/gpu/drm/xe/xe_pat.c          | 9 +++++++++
>>  drivers/gpu/drm/xe/xe_pt_types.h     | 1 +
>>  3 files changed, 12 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 98835ee058f5..7d0f2109c23a 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -15,6 +15,7 @@
>>  #include "xe_devcoredump_types.h"
>>  #include "xe_gt_types.h"
>>  #include "xe_platform_types.h"
>> +#include "xe_pt_types.h"
>>  #include "xe_pmu.h"
>>  #include "xe_step_types.h"
>>
>> @@ -342,6 +343,7 @@ struct xe_device {
>>  		const u32 *table;
>>  		/** Number of PAT entries */
>>  		int n_entries;
>> +		u32 idx[__XE_CACHE_LEVEL_COUNT];
>>  	} pat;
>>
>>  	/** @d3cold: Encapsulate d3cold related stuff */
>> diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
>> index 86386633e206..966b63252735 100644
>> --- a/drivers/gpu/drm/xe/xe_pat.c
>> +++ b/drivers/gpu/drm/xe/xe_pat.c
>> @@ -108,14 +108,23 @@ void xe_pat_init_early(struct xe_device *xe)
>>  		xe->pat.ops = &mtl_pat_ops;
>>  		xe->pat.table = mtl_pat_table;
>>  		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
>> +		xe->pat.idx[XE_CACHE_NONE] = 2;
>> +		xe->pat.idx[XE_CACHE_WT] = 1;
>> +		xe->pat.idx[XE_CACHE_WB] = 3;
>>  	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
>>  		xe->pat.ops = &pvc_pat_ops;
>>  		xe->pat.table = pvc_pat_table;
>>  		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
>> +		xe->pat.idx[XE_CACHE_NONE] = 0;
>> +		xe->pat.idx[XE_CACHE_WT] = 2;
>> +		xe->pat.idx[XE_CACHE_WB] = 3;
>
>These are the PVC indices, but they're not correct for DG2.  IIRC, DG2
>should be using the same ones as xe_lp below (although it looks like the
>bspec tagging might be messed up right now...)

But that should imply a different table, otherwise the index doesn't
really make much sense. Looking at i915 we have, for DG2:

drivers/gpu/drm/i915/i915_pci.c
	#define TGL_CACHELEVEL \
		.cachelevel_to_pat = { \
			[I915_CACHE_NONE]   = 0, \
			[I915_CACHE_LLC]    = 1, \
			[I915_CACHE_L3_LLC] = 2, \
			[I915_CACHE_WT]     = 3, \
		}

drivers/gpu/drm/i915/gt/intel_gtt.c
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(0), GEN8_PPAT_WB);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(1), GEN8_PPAT_WC);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(2), GEN8_PPAT_WT);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(3), GEN8_PPAT_UC);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(4), GEN8_PPAT_WB);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(5), GEN8_PPAT_WB);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(6), GEN8_PPAT_WB);
	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(7), GEN8_PPAT_WB);

So, it's effectively:
	I915_CACHE_NONE		-> GEN8_PPAT_WB
	I915_CACHE_LLC		-> GEN8_PPAT_WC
	I915_CACHE_L3_LLC	-> GEN8_PPAT_WT
	I915_CACHE_WT		-> GEN8_PPAT_UC

Does this even make sense?

For TGL it is:
	I915_CACHE_NONE		-> GEN8_PPAT_UC
	I915_CACHE_LLC		-> GEN8_PPAT_WB
	I915_CACHE_L3_LLC	-> GEN8_PPAT_WB
	I915_CACHE_WT		-> GEN8_PPAT_WT

.... which seems better. /me confused, maybe I'm reading something wrong
here.

Lucas De Marchi

>
>Matt
>
>>  	} else if (GRAPHICS_VERx100(xe) <= 1210) {
>>  		xe->pat.ops = &tgl_pat_ops;
>>  		xe->pat.table = tgl_pat_table;
>>  		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
>> +		xe->pat.idx[XE_CACHE_NONE] = 3;
>> +		xe->pat.idx[XE_CACHE_WT] = 2;
>> +		xe->pat.idx[XE_CACHE_WB] = 0;
>>  	} else {
>>  		/*
>>  		 * Going forward we expect to need new PAT settings for most
>> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
>> index 64e3921a0f46..bf5000499251 100644
>> --- a/drivers/gpu/drm/xe/xe_pt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
>> @@ -17,6 +17,7 @@ enum xe_cache_level {
>>  	XE_CACHE_NONE,
>>  	XE_CACHE_WT,
>>  	XE_CACHE_WB,
>> +	__XE_CACHE_LEVEL_COUNT,
>>  };
>>
>>  #define XE_VM_MAX_LEVEL 4
>> --
>> 2.40.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 7/9] drm/xe: Use pat_index to encode pte
  2023-09-25 23:27   ` Matt Roper
@ 2023-09-26 14:59     ` Lucas De Marchi
  0 siblings, 0 replies; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-26 14:59 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 04:27:20PM -0700, Matt Roper wrote:
>On Mon, Sep 25, 2023 at 03:10:47PM -0700, Lucas De Marchi wrote:
>> Change the xelp_pte_encode() function to use the platform-dependent
>> pat_index.  The same function can be used for all platforms as they only
>> need to encode the pat_index bits in the same pte layout. For platforms
>> that don't have the most significant bit, as long as they don't return a
>> bogus index they should be fine.
>>
>> To maintain consistency, also add xe argument to pde_encode_cache().
>> This will be needed for Xe2 later.
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_bo.h       | 11 ++++---
>>  drivers/gpu/drm/xe/xe_migrate.c  |  6 ++--
>>  drivers/gpu/drm/xe/xe_pt_types.h |  4 ++-
>>  drivers/gpu/drm/xe/xe_vm.c       | 50 +++++++++++++++++++-------------
>>  4 files changed, 44 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.h b/drivers/gpu/drm/xe/xe_bo.h
>> index e3c90d45e723..885412581158 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.h
>> +++ b/drivers/gpu/drm/xe/xe_bo.h
>> @@ -39,10 +39,13 @@
>>  #define XE_BO_INTERNAL_TEST		BIT(30)
>>  #define XE_BO_INTERNAL_64K		BIT(31)
>>
>> -#define PPAT_UNCACHED                   GENMASK_ULL(4, 3)
>> -#define PPAT_CACHED_PDE                 0
>> -#define PPAT_CACHED                     BIT_ULL(7)
>> -#define PPAT_DISPLAY_ELLC               BIT_ULL(4)
>> +#define XE_PPGTT_PDE_UNCACHED		GENMASK_ULL(4, 3)
>> +#define XE_PPGTT_PDE_CACHED		0
>
>Shouldn't these two be going away?  Indices 0 and 3 are only correct for
>certain platforms.  The PDE uses the same PAT encoding as the PTE, as
>documented in the bspec's PTF_COMMON_BITS (55730) and referenced from
>pages 62361, 55731, etc.

I was postponing this change until it's needed in xe2, but it seems
this is already not compatible with (at least) MTL and what we
currently have in xe is broken.

I will take a look to include this in v2. Thanks.
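
Roughly what I have in mind for v2 is to reuse the PTE-style PAT bits
for the PDE as well. Untested sketch, the helper name is made up and
how many PAT bits the PDE actually exposes per platform still needs to
be checked against the bspec:

	static u64 pde_encode_pat_index(struct xe_device *xe,
					enum xe_cache_level cache)
	{
		u32 pat_index = xe->pat.idx[cache];
		u64 pde = 0;

		/* PDE uses the same PAT bit layout as the PTE */
		if (pat_index & BIT(0))
			pde |= XE_PPGTT_PTE_PAT0;

		if (pat_index & BIT(1))
			pde |= XE_PPGTT_PTE_PAT1;

		if (pat_index & BIT(2))
			pde |= XE_PPGTT_PTE_PAT2;

		return pde;
	}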

Lucas De Marchi

>
>> +
>> +#define XELPG_PPGTT_PTE_PAT3		BIT_ULL(62)
>> +#define XE_PPGTT_PTE_PAT2		BIT_ULL(7)
>> +#define XE_PPGTT_PTE_PAT1		BIT_ULL(4)
>> +#define XE_PPGTT_PTE_PAT0		BIT_ULL(3)
>>
>>  #define XE_PTE_SHIFT			12
>>  #define XE_PAGE_SIZE			(1 << XE_PTE_SHIFT)
>> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>> index f22356d69ae3..11dc3b0e0350 100644
>> --- a/drivers/gpu/drm/xe/xe_migrate.c
>> +++ b/drivers/gpu/drm/xe/xe_migrate.c
>> @@ -261,7 +261,8 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
>>
>>  		level = 2;
>>  		ofs = map_ofs + XE_PAGE_SIZE * level + 256 * 8;
>> -		flags = vm->pt_ops->pte_encode_addr(0, XE_CACHE_WB, level, true, 0);
>> +		flags = vm->pt_ops->pte_encode_addr(xe, 0, XE_CACHE_WB, level,
>> +						    true, 0);
>>
>>  		/*
>>  		 * Use 1GB pages, it shouldn't matter the physical amount of
>> @@ -498,7 +499,8 @@ static void emit_pte(struct xe_migrate *m,
>>  				devmem = true;
>>  			}
>>
>> -			addr |= m->q->vm->pt_ops->pte_encode_addr(addr, XE_CACHE_WB,
>> +			addr |= m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
>> +								  addr, XE_CACHE_WB,
>>  								  0, devmem, flags);
>>  			bb->cs[bb->len++] = lower_32_bits(addr);
>>  			bb->cs[bb->len++] = upper_32_bits(addr);
>> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
>> index bf5000499251..bd6645295fe6 100644
>> --- a/drivers/gpu/drm/xe/xe_pt_types.h
>> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
>> @@ -11,6 +11,7 @@
>>  #include "xe_pt_walk.h"
>>
>>  struct xe_bo;
>> +struct xe_device;
>>  struct xe_vma;
>>
>>  enum xe_cache_level {
>> @@ -40,7 +41,8 @@ struct xe_pt_ops {
>>  			     enum xe_cache_level cache, u32 pt_level);
>>  	u64 (*pte_encode_vma)(u64 pte, struct xe_vma *vma,
>>  			      enum xe_cache_level cache, u32 pt_level);
>> -	u64 (*pte_encode_addr)(u64 addr, enum xe_cache_level cache,
>> +	u64 (*pte_encode_addr)(struct xe_device *xe, u64 addr,
>> +			       enum xe_cache_level cache,
>>  			       u32 pt_level, bool devmem, u64 flags);
>>  	u64 (*pde_encode_bo)(struct xe_bo *bo, u64 bo_offset,
>>  			     const enum xe_cache_level cache);
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index 23452b98d853..ab9cf1de566a 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -1191,27 +1191,32 @@ static struct drm_gpuva_fn_ops gpuva_ops = {
>>  	.op_alloc = xe_vm_op_alloc,
>>  };
>>
>> -static u64 pde_encode_cache(enum xe_cache_level cache)
>> +static u64 pde_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>>  {
>> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
>> -
>>  	if (cache != XE_CACHE_NONE)
>> -		return PPAT_CACHED_PDE;
>> +		return XE_PPGTT_PDE_CACHED;
>>
>> -	return PPAT_UNCACHED;
>> +	return XE_PPGTT_PDE_UNCACHED;
>
>As noted above, we're getting different behavior from these on different
>platforms.  We should be encoding the same way we do for the PTE.
>
>
>Matt
>
>>  }
>>
>> -static u64 pte_encode_cache(enum xe_cache_level cache)
>> +static u64 pte_encode_cache(struct xe_device *xe, enum xe_cache_level cache)
>>  {
>> -	/* FIXME: I don't think the PPAT handling is correct for MTL */
>> -	switch (cache) {
>> -	case XE_CACHE_NONE:
>> -		return PPAT_UNCACHED;
>> -	case XE_CACHE_WT:
>> -		return PPAT_DISPLAY_ELLC;
>> -	default:
>> -		return PPAT_CACHED;
>> -	}
>> +	u32 pat_index = xe->pat.idx[cache];
>> +	u64 pte = 0;
>> +
>> +	if (pat_index & BIT(0))
>> +		pte |= XE_PPGTT_PTE_PAT0;
>> +
>> +	if (pat_index & BIT(1))
>> +		pte |= XE_PPGTT_PTE_PAT1;
>> +
>> +	if (pat_index & BIT(2))
>> +		pte |= XE_PPGTT_PTE_PAT2;
>> +
>> +	if (pat_index & BIT(3))
>> +		pte |= XELPG_PPGTT_PTE_PAT3;
>> +
>> +	return pte;
>>  }
>>
>>  static u64 pte_encode_ps(u32 pt_level)
>> @@ -1229,11 +1234,12 @@ static u64 pte_encode_ps(u32 pt_level)
>>  static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>  			      const enum xe_cache_level cache)
>>  {
>> +	struct xe_device *xe = xe_bo_device(bo);
>>  	u64 pde;
>>
>>  	pde = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>>  	pde |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -	pde |= pde_encode_cache(cache);
>> +	pde |= pde_encode_cache(xe, cache);
>>
>>  	return pde;
>>  }
>> @@ -1241,11 +1247,12 @@ static u64 xelp_pde_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>  static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>  			      enum xe_cache_level cache, u32 pt_level)
>>  {
>> +	struct xe_device *xe = xe_bo_device(bo);
>>  	u64 pte;
>>
>>  	pte = xe_bo_addr(bo, bo_offset, XE_PAGE_SIZE);
>>  	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -	pte |= pte_encode_cache(cache);
>> +	pte |= pte_encode_cache(xe, cache);
>>  	pte |= pte_encode_ps(pt_level);
>>
>>  	if (xe_bo_is_vram(bo) || xe_bo_is_stolen_devmem(bo))
>> @@ -1257,12 +1264,14 @@ static u64 xelp_pte_encode_bo(struct xe_bo *bo, u64 bo_offset,
>>  static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>  			       enum xe_cache_level cache, u32 pt_level)
>>  {
>> +	struct xe_device *xe = xe_vma_vm(vma)->xe;
>> +
>>  	pte |= XE_PAGE_PRESENT;
>>
>>  	if (likely(!xe_vma_read_only(vma)))
>>  		pte |= XE_PAGE_RW;
>>
>> -	pte |= pte_encode_cache(cache);
>> +	pte |= pte_encode_cache(xe, cache);
>>  	pte |= pte_encode_ps(pt_level);
>>
>>  	if (unlikely(xe_vma_is_null(vma)))
>> @@ -1271,7 +1280,8 @@ static u64 xelp_pte_encode_vma(u64 pte, struct xe_vma *vma,
>>  	return pte;
>>  }
>>
>> -static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
>> +static u64 xelp_pte_encode_addr(struct xe_device *xe, u64 addr,
>> +				enum xe_cache_level cache,
>>  				u32 pt_level, bool devmem, u64 flags)
>>  {
>>  	u64 pte;
>> @@ -1281,7 +1291,7 @@ static u64 xelp_pte_encode_addr(u64 addr, enum xe_cache_level cache,
>>
>>  	pte = addr;
>>  	pte |= XE_PAGE_PRESENT | XE_PAGE_RW;
>> -	pte |= pte_encode_cache(cache);
>> +	pte |= pte_encode_cache(xe, cache);
>>  	pte |= pte_encode_ps(pt_level);
>>
>>  	if (devmem)
>> --
>> 2.40.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes
  2023-09-26 14:43     ` Lucas De Marchi
@ 2023-09-26 15:34       ` Matt Roper
  2023-09-26 16:55         ` Lucas De Marchi
  0 siblings, 1 reply; 29+ messages in thread
From: Matt Roper @ 2023-09-26 15:34 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Tue, Sep 26, 2023 at 09:43:51AM -0500, Lucas De Marchi wrote:
> On Mon, Sep 25, 2023 at 04:14:43PM -0700, Matt Roper wrote:
> > On Mon, Sep 25, 2023 at 03:10:46PM -0700, Lucas De Marchi wrote:
> > > Some of the PAT entries are relevant for the driver's own use, which varies
> > > per platform. Let the PAT early initialization set what they should
> > > point to so the rest of the driver can use them where needed.
> > > 
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_device_types.h | 2 ++
> > >  drivers/gpu/drm/xe/xe_pat.c          | 9 +++++++++
> > >  drivers/gpu/drm/xe/xe_pt_types.h     | 1 +
> > >  3 files changed, 12 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > index 98835ee058f5..7d0f2109c23a 100644
> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > @@ -15,6 +15,7 @@
> > >  #include "xe_devcoredump_types.h"
> > >  #include "xe_gt_types.h"
> > >  #include "xe_platform_types.h"
> > > +#include "xe_pt_types.h"
> > >  #include "xe_pmu.h"
> > >  #include "xe_step_types.h"
> > > 
> > > @@ -342,6 +343,7 @@ struct xe_device {
> > >  		const u32 *table;
> > >  		/** Number of PAT entries */
> > >  		int n_entries;
> > > +		u32 idx[__XE_CACHE_LEVEL_COUNT];
> > >  	} pat;
> > > 
> > >  	/** @d3cold: Encapsulate d3cold related stuff */
> > > diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
> > > index 86386633e206..966b63252735 100644
> > > --- a/drivers/gpu/drm/xe/xe_pat.c
> > > +++ b/drivers/gpu/drm/xe/xe_pat.c
> > > @@ -108,14 +108,23 @@ void xe_pat_init_early(struct xe_device *xe)
> > >  		xe->pat.ops = &mtl_pat_ops;
> > >  		xe->pat.table = mtl_pat_table;
> > >  		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
> > > +		xe->pat.idx[XE_CACHE_NONE] = 2;
> > > +		xe->pat.idx[XE_CACHE_WT] = 1;
> > > +		xe->pat.idx[XE_CACHE_WB] = 3;
> > >  	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
> > >  		xe->pat.ops = &pvc_pat_ops;
> > >  		xe->pat.table = pvc_pat_table;
> > >  		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
> > > +		xe->pat.idx[XE_CACHE_NONE] = 0;
> > > +		xe->pat.idx[XE_CACHE_WT] = 2;
> > > +		xe->pat.idx[XE_CACHE_WB] = 3;
> > 
> > These are the PVC indices, but they're not correct for DG2.  IIRC, DG2
> > should be using the same ones as xe_lp below (although it looks like the
> > bspec tagging might be messed up right now...)
> 
> But that should imply a different table, otherwise the index doesn't
> really make much sense. Looking at i915 we have, for DG2:
> 
> drivers/gpu/drm/i915/i915_pci.c
> 	#define TGL_CACHELEVEL \
> 		.cachelevel_to_pat = { \
> 			[I915_CACHE_NONE]   = 0, \
> 			[I915_CACHE_LLC]    = 1, \
> 			[I915_CACHE_L3_LLC] = 2, \
> 			[I915_CACHE_WT]     = 3, \
> 		}

Where (what branch) do you see that on?  drm-tip has

	#define TGL_CACHELEVEL \                                                                                       
		.cachelevel_to_pat = { \                                                                               
			[I915_CACHE_NONE]   = 3, \                                                                     
			[I915_CACHE_LLC]    = 0, \                                                                     
			[I915_CACHE_L3_LLC] = 0, \                                                                     
			[I915_CACHE_WT]     = 2, \                                                                     

which also matches the intel_gtt.c programming below.

The table you pasted should be the LEGACY_CACHELEVEL identity table,
used for platforms that don't really have/use PAT index.


Matt

> 
> drivers/gpu/drm/i915/gt/intel_gtt.c
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(0), GEN8_PPAT_WB);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(1), GEN8_PPAT_WC);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(2), GEN8_PPAT_WT);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(3), GEN8_PPAT_UC);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(4), GEN8_PPAT_WB);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(5), GEN8_PPAT_WB);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(6), GEN8_PPAT_WB);
> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(7), GEN8_PPAT_WB);
> 
> So, it's effectively:
> 	I915_CACHE_NONE		-> GEN8_PPAT_WB
> 	I915_CACHE_LLC		-> GEN8_PPAT_WC
> 	I915_CACHE_L3_LLC	-> GEN8_PPAT_WT
> 	I915_CACHE_WT		-> GEN8_PPAT_UC
> 
> Does this even make sense?
> 
> For TGL it is:
> 	I915_CACHE_NONE		-> GEN8_PPAT_UC
> 	I915_CACHE_LLC		-> GEN8_PPAT_WB
> 	I915_CACHE_L3_LLC	-> GEN8_PPAT_WB
> 	I915_CACHE_WT		-> GEN8_PPAT_WT
> 
> .... which seems better. /me confused, maybe I'm reading something wrong
> here.
> 
> Lucas De Marchi
> 
> > 
> > Matt
> > 
> > >  	} else if (GRAPHICS_VERx100(xe) <= 1210) {
> > >  		xe->pat.ops = &tgl_pat_ops;
> > >  		xe->pat.table = tgl_pat_table;
> > >  		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
> > > +		xe->pat.idx[XE_CACHE_NONE] = 3;
> > > +		xe->pat.idx[XE_CACHE_WT] = 2;
> > > +		xe->pat.idx[XE_CACHE_WB] = 0;
> > >  	} else {
> > >  		/*
> > >  		 * Going forward we expect to need new PAT settings for most
> > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> > > index 64e3921a0f46..bf5000499251 100644
> > > --- a/drivers/gpu/drm/xe/xe_pt_types.h
> > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> > > @@ -17,6 +17,7 @@ enum xe_cache_level {
> > >  	XE_CACHE_NONE,
> > >  	XE_CACHE_WT,
> > >  	XE_CACHE_WB,
> > > +	__XE_CACHE_LEVEL_COUNT,
> > >  };
> > > 
> > >  #define XE_VM_MAX_LEVEL 4
> > > --
> > > 2.40.1
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 6/9] drm/xe/pat: Keep track of relevant indexes
  2023-09-26 15:34       ` Matt Roper
@ 2023-09-26 16:55         ` Lucas De Marchi
  0 siblings, 0 replies; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-26 16:55 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe, Matthew Auld

On Tue, Sep 26, 2023 at 08:34:18AM -0700, Matt Roper wrote:
>On Tue, Sep 26, 2023 at 09:43:51AM -0500, Lucas De Marchi wrote:
>> On Mon, Sep 25, 2023 at 04:14:43PM -0700, Matt Roper wrote:
>> > On Mon, Sep 25, 2023 at 03:10:46PM -0700, Lucas De Marchi wrote:
>> > > Some of the PAT entries are relevant for the driver's own use, which varies
>> > > per platform. Let the PAT early initialization set what they should
>> > > point to so the rest of the driver can use them where needed.
>> > >
>> > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> > > ---
>> > >  drivers/gpu/drm/xe/xe_device_types.h | 2 ++
>> > >  drivers/gpu/drm/xe/xe_pat.c          | 9 +++++++++
>> > >  drivers/gpu/drm/xe/xe_pt_types.h     | 1 +
>> > >  3 files changed, 12 insertions(+)
>> > >
>> > > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> > > index 98835ee058f5..7d0f2109c23a 100644
>> > > --- a/drivers/gpu/drm/xe/xe_device_types.h
>> > > +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> > > @@ -15,6 +15,7 @@
>> > >  #include "xe_devcoredump_types.h"
>> > >  #include "xe_gt_types.h"
>> > >  #include "xe_platform_types.h"
>> > > +#include "xe_pt_types.h"
>> > >  #include "xe_pmu.h"
>> > >  #include "xe_step_types.h"
>> > >
>> > > @@ -342,6 +343,7 @@ struct xe_device {
>> > >  		const u32 *table;
>> > >  		/** Number of PAT entries */
>> > >  		int n_entries;
>> > > +		u32 idx[__XE_CACHE_LEVEL_COUNT];
>> > >  	} pat;
>> > >
>> > >  	/** @d3cold: Encapsulate d3cold related stuff */
>> > > diff --git a/drivers/gpu/drm/xe/xe_pat.c b/drivers/gpu/drm/xe/xe_pat.c
>> > > index 86386633e206..966b63252735 100644
>> > > --- a/drivers/gpu/drm/xe/xe_pat.c
>> > > +++ b/drivers/gpu/drm/xe/xe_pat.c
>> > > @@ -108,14 +108,23 @@ void xe_pat_init_early(struct xe_device *xe)
>> > >  		xe->pat.ops = &mtl_pat_ops;
>> > >  		xe->pat.table = mtl_pat_table;
>> > >  		xe->pat.n_entries = ARRAY_SIZE(mtl_pat_table);
>> > > +		xe->pat.idx[XE_CACHE_NONE] = 2;
>> > > +		xe->pat.idx[XE_CACHE_WT] = 1;
>> > > +		xe->pat.idx[XE_CACHE_WB] = 3;
>> > >  	} else if (xe->info.platform == XE_PVC || xe->info.platform == XE_DG2) {
>> > >  		xe->pat.ops = &pvc_pat_ops;
>> > >  		xe->pat.table = pvc_pat_table;
>> > >  		xe->pat.n_entries = ARRAY_SIZE(pvc_pat_table);
>> > > +		xe->pat.idx[XE_CACHE_NONE] = 0;
>> > > +		xe->pat.idx[XE_CACHE_WT] = 2;
>> > > +		xe->pat.idx[XE_CACHE_WB] = 3;
>> >
>> > These are the PVC indices, but they're not correct for DG2.  IIRC, DG2
>> > should be using the same ones as xe_lp below (although it looks like the
>> > bspec tagging might be messed up right now...)
>>
>> But that should imply a different table, otherwise the index doesn't
>> really make much sense. Looking at i915 we have, for DG2:
>>
>> drivers/gpu/drm/i915/i915_pci.c
>> 	#define TGL_CACHELEVEL \
>> 		.cachelevel_to_pat = { \
>> 			[I915_CACHE_NONE]   = 0, \
>> 			[I915_CACHE_LLC]    = 1, \
>> 			[I915_CACHE_L3_LLC] = 2, \
>> 			[I915_CACHE_WT]     = 3, \
>> 		}
>
>Where (what branch) do you see that on?  drm-tip has
>
>	#define TGL_CACHELEVEL \
>		.cachelevel_to_pat = { \
>			[I915_CACHE_NONE]   = 3, \
>			[I915_CACHE_LLC]    = 0, \
>			[I915_CACHE_L3_LLC] = 0, \
>			[I915_CACHE_WT]     = 2, \
>
>which also matches the intel_gtt.c programming below.
>
>The table you pasted should be the LEGACY_CACHELEVEL identity table,

yes, I was just reading the wrong table here. With the right table at
hand it makes much more sense.
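
So for v2 the DG2 branch should presumably pick up the same indexes as
the xe_lp branch below, i.e. something like this (sketch; the exact
reshuffling of the if/else ladder is TBD):

	xe->pat.idx[XE_CACHE_NONE] = 3;
	xe->pat.idx[XE_CACHE_WT] = 2;
	xe->pat.idx[XE_CACHE_WB] = 0;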

Lucas De Marchi

>used for platforms that don't really have/use PAT index.
>
>
>Matt
>
>>
>> drivers/gpu/drm/i915/gt/intel_gtt.c
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(0), GEN8_PPAT_WB);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(1), GEN8_PPAT_WC);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(2), GEN8_PPAT_WT);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(3), GEN8_PPAT_UC);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(4), GEN8_PPAT_WB);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(5), GEN8_PPAT_WB);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(6), GEN8_PPAT_WB);
>> 	intel_gt_mcr_multicast_write_fw(gt, XEHP_PAT_INDEX(7), GEN8_PPAT_WB);
>>
>> So, it's effectively:
>> 	I915_CACHE_NONE		-> GEN8_PPAT_WB
>> 	I915_CACHE_LLC		-> GEN8_PPAT_WC
>> 	I915_CACHE_L3_LLC	-> GEN8_PPAT_WT
>> 	I915_CACHE_WT		-> GEN8_PPAT_UC
>>
>> Does this even make sense?
>>
>> For TGL it is:
>> 	I915_CACHE_NONE		-> GEN8_PPAT_UC
>> 	I915_CACHE_LLC		-> GEN8_PPAT_WB
>> 	I915_CACHE_L3_LLC	-> GEN8_PPAT_WB
>> 	I915_CACHE_WT		-> GEN8_PPAT_WT
>>
>> .... which seems better. /me confused, maybe I'm reading something wrong
>> here.
>>
>> Lucas De Marchi
>>
>> >
>> > Matt
>> >
>> > >  	} else if (GRAPHICS_VERx100(xe) <= 1210) {
>> > >  		xe->pat.ops = &tgl_pat_ops;
>> > >  		xe->pat.table = tgl_pat_table;
>> > >  		xe->pat.n_entries = ARRAY_SIZE(tgl_pat_table);
>> > > +		xe->pat.idx[XE_CACHE_NONE] = 3;
>> > > +		xe->pat.idx[XE_CACHE_WT] = 2;
>> > > +		xe->pat.idx[XE_CACHE_WB] = 0;
>> > >  	} else {
>> > >  		/*
>> > >  		 * Going forward we expect to need new PAT settings for most
>> > > diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
>> > > index 64e3921a0f46..bf5000499251 100644
>> > > --- a/drivers/gpu/drm/xe/xe_pt_types.h
>> > > +++ b/drivers/gpu/drm/xe/xe_pt_types.h
>> > > @@ -17,6 +17,7 @@ enum xe_cache_level {
>> > >  	XE_CACHE_NONE,
>> > >  	XE_CACHE_WT,
>> > >  	XE_CACHE_WB,
>> > > +	__XE_CACHE_LEVEL_COUNT,
>> > >  };
>> > >
>> > >  #define XE_VM_MAX_LEVEL 4
>> > > --
>> > > 2.40.1
>> > >
>> >
>> > --
>> > Matt Roper
>> > Graphics Software Engineer
>> > Linux GPU Platform Enablement
>> > Intel Corporation
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support
  2023-09-25 23:42   ` Matt Roper
@ 2023-09-26 20:48     ` Lucas De Marchi
  2023-09-26 21:04       ` Matt Roper
  0 siblings, 1 reply; 29+ messages in thread
From: Lucas De Marchi @ 2023-09-26 20:48 UTC (permalink / raw)
  To: Matt Roper; +Cc: intel-xe, Matthew Auld

On Mon, Sep 25, 2023 at 04:42:04PM -0700, Matt Roper wrote:
>On Mon, Sep 25, 2023 at 03:10:49PM -0700, Lucas De Marchi wrote:
>> Fix build break due to the previous commit. Squash into "Implement display
>> support", which is to be moved above the previous commit.
>
>I know Matt Auld's series is going to come through and replace all the
>hardcoded cache settings with the proper PAT index for whatever object
>we're operating on, but for the time being should we at least switch the
>hardcoding in this patch to be CACHE_WT rather than CACHE_WB?  Since
>display is never coherent (regardless of the platform's CPU<->GT
>coherency behavior), WB would pretty much always be the wrong thing for
>surfaces that we're mapping into a DPT.

makes sense, but this means we are changing the behavior for all the
platforms with display. MTL was using a hack to use index 3 and all
the others were (implicitly) using index 0, which means WB for all the
platforms with display.

Lucas De Marchi

>
>
>Matt
>
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>  drivers/gpu/drm/xe/display/xe_fb_pin.c | 32 ++++++++++++++++++--------
>>  1 file changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> index 3422942a9951..b7a04fba3585 100644
>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>> @@ -17,7 +17,10 @@ static void
>>  write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_ofs,
>>  		  u32 width, u32 height, u32 src_stride, u32 dst_stride)
>>  {
>> +	struct xe_device *xe = xe_bo_device(bo);
>> +	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>  	u32 column, row;
>> +
>>  	/* TODO: Maybe rewrite so we can traverse the bo addresses sequentially,
>>  	 * by writing dpt/ggtt in a different order?
>>  	 */
>> @@ -26,8 +29,10 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
>>  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
>>
>>  		for (row = 0; row < height; row++) {
>> -			iosys_map_wr(map, *dpt_ofs, u64,
>> -				     xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
>> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
>> +							      XE_CACHE_WB);
>> +
>> +			iosys_map_wr(map, *dpt_ofs, u64, pte);
>>  			*dpt_ofs += 8;
>>  			src_idx -= src_stride;
>>  		}
>> @@ -46,6 +51,7 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
>>  {
>>  	struct xe_device *xe = to_xe_device(fb->base.dev);
>>  	struct xe_tile *tile0 = xe_device_get_root_tile(xe);
>> +	struct xe_ggtt *ggtt = tile0->mem.ggtt;
>>  	struct xe_bo *bo = intel_fb_obj(&fb->base), *dpt;
>>  	u32 dpt_size, size = bo->ttm.base.size;
>>
>> @@ -76,9 +82,12 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
>>  	if (view->type == I915_GTT_VIEW_NORMAL) {
>>  		u32 x;
>>
>> -		for (x = 0; x < size / XE_PAGE_SIZE; x++)
>> -			iosys_map_wr(&dpt->vmap, x * 8, u64,
>> -				     xe_ggtt_pte_encode(bo, x * XE_PAGE_SIZE));
>> +		for (x = 0; x < size / XE_PAGE_SIZE; x++) {
>> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x * XE_PAGE_SIZE,
>> +							      XE_CACHE_WB);
>> +
>> +			iosys_map_wr(&dpt->vmap, x * 8, u64, pte);
>> +		}
>>  	} else {
>>  		const struct intel_rotation_info *rot_info = &view->rotated;
>>  		u32 i, dpt_ofs = 0;
>> @@ -107,8 +116,10 @@ write_ggtt_rotated(struct xe_bo *bo, struct xe_ggtt *ggtt, u32 *ggtt_ofs, u32 bo
>>  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
>>
>>  		for (row = 0; row < height; row++) {
>> -			xe_ggtt_set_pte(ggtt, *ggtt_ofs,
>> -					xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
>> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
>> +							      XE_CACHE_WB);
>> +
>> +			xe_ggtt_set_pte(ggtt, *ggtt_ofs, pte);
>>  			*ggtt_ofs += XE_PAGE_SIZE;
>>  			src_idx -= src_stride;
>>  		}
>> @@ -150,8 +161,11 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
>>  		if (ret)
>>  			goto out_unlock;
>>
>> -		for (x = 0; x < size; x += XE_PAGE_SIZE)
>> -			xe_ggtt_set_pte(ggtt, vma->node.start + x, xe_ggtt_pte_encode(bo, x));
>> +		for (x = 0; x < size; x += XE_PAGE_SIZE) {
>> +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x, XE_CACHE_WB);
>> +
>> +			xe_ggtt_set_pte(ggtt, vma->node.start + x, pte);
>> +		}
>>  	} else {
>>  		u32 i, ggtt_ofs;
>>  		const struct intel_rotation_info *rot_info = &view->rotated;
>> --
>> 2.40.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Intel-xe] [PATCH 9/9] fixup! drm/xe/display: Implement display support
  2023-09-26 20:48     ` Lucas De Marchi
@ 2023-09-26 21:04       ` Matt Roper
  0 siblings, 0 replies; 29+ messages in thread
From: Matt Roper @ 2023-09-26 21:04 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: intel-xe, Matthew Auld

On Tue, Sep 26, 2023 at 03:48:56PM -0500, Lucas De Marchi wrote:
> On Mon, Sep 25, 2023 at 04:42:04PM -0700, Matt Roper wrote:
> > On Mon, Sep 25, 2023 at 03:10:49PM -0700, Lucas De Marchi wrote:
> > Fix build break due to the previous commit. Squash into "Implement display
> > support", which is to be moved above the previous commit.
> > 
> > I know Matt Auld's series is going to come through and replace all the
> > hardcoded cache settings with the proper PAT index for whatever object
> > we're operating on, but for the time being should we at least switch the
> > hardcoding in this patch to be CACHE_WT rather than CACHE_WB?  Since
> > display is never coherent (regardless of the platform's CPU<->GT
> > coherency behavior), WB would pretty much always be the wrong thing for
> > surfaces that we're mapping into a DPT.
> 
> makes sense, but this means we are changing the behavior for all the
> platforms with display. MTL was using a hack to use index 3 and all
> the others were (implicitly) using index 0, which means WB for all the
> platforms with display.

Yeah, good point.  And now that I look at this again with fresh eyes, I
see that I mixed up CPU:WT with GPU:WT.  It's the CPU-side mapping of
scanout buffers that needs to be WT (or WC/UC although those would give
lower performance), not the GPU-side mapping in the PTEs.  I think
GPU:WB should still be fine since we'll be issuing a PIPE_CONTROL /
MI_FLUSH_DW after generating the content, before it gets flipped to a
display plane.

So disregard my earlier comments.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>

This isn't the first time there have been mix-ups about whether the
WB/WT/WC/UC terms in the code refer to the CPU or the GPU side.  I know
Matt Auld's series has added some "cpu" terminology to some of the
fields/defines to help make the code more obvious; maybe these
XE_CACHE_* macros also need to pick up a "_GPU_" in their name at some
point.
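
E.g. something like this, with purely illustrative naming rather than a
concrete proposal:

	enum xe_cache_level {
		XE_CACHE_GPU_NONE,
		XE_CACHE_GPU_WT,
		XE_CACHE_GPU_WB,
		__XE_CACHE_LEVEL_COUNT,
	};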


Matt

> 
> Lucas De Marchi
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/display/xe_fb_pin.c | 32 ++++++++++++++++++--------
> > >  1 file changed, 23 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > index 3422942a9951..b7a04fba3585 100644
> > > --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
> > > @@ -17,7 +17,10 @@ static void
> > >  write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_ofs,
> > >  		  u32 width, u32 height, u32 src_stride, u32 dst_stride)
> > >  {
> > > +	struct xe_device *xe = xe_bo_device(bo);
> > > +	struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
> > >  	u32 column, row;
> > > +
> > >  	/* TODO: Maybe rewrite so we can traverse the bo addresses sequentially,
> > >  	 * by writing dpt/ggtt in a different order?
> > >  	 */
> > > @@ -26,8 +29,10 @@ write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32 *dpt_ofs, u32 bo_
> > >  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
> > > 
> > >  		for (row = 0; row < height; row++) {
> > > -			iosys_map_wr(map, *dpt_ofs, u64,
> > > -				     xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
> > > +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
> > > +							      XE_CACHE_WB);
> > > +
> > > +			iosys_map_wr(map, *dpt_ofs, u64, pte);
> > >  			*dpt_ofs += 8;
> > >  			src_idx -= src_stride;
> > >  		}
> > > @@ -46,6 +51,7 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
> > >  {
> > >  	struct xe_device *xe = to_xe_device(fb->base.dev);
> > >  	struct xe_tile *tile0 = xe_device_get_root_tile(xe);
> > > +	struct xe_ggtt *ggtt = tile0->mem.ggtt;
> > >  	struct xe_bo *bo = intel_fb_obj(&fb->base), *dpt;
> > >  	u32 dpt_size, size = bo->ttm.base.size;
> > > 
> > > @@ -76,9 +82,12 @@ static int __xe_pin_fb_vma_dpt(struct intel_framebuffer *fb,
> > >  	if (view->type == I915_GTT_VIEW_NORMAL) {
> > >  		u32 x;
> > > 
> > > -		for (x = 0; x < size / XE_PAGE_SIZE; x++)
> > > -			iosys_map_wr(&dpt->vmap, x * 8, u64,
> > > -				     xe_ggtt_pte_encode(bo, x * XE_PAGE_SIZE));
> > > +		for (x = 0; x < size / XE_PAGE_SIZE; x++) {
> > > +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x * XE_PAGE_SIZE,
> > > +							      XE_CACHE_WB);
> > > +
> > > +			iosys_map_wr(&dpt->vmap, x * 8, u64, pte);
> > > +		}
> > >  	} else {
> > >  		const struct intel_rotation_info *rot_info = &view->rotated;
> > >  		u32 i, dpt_ofs = 0;
> > > @@ -107,8 +116,10 @@ write_ggtt_rotated(struct xe_bo *bo, struct xe_ggtt *ggtt, u32 *ggtt_ofs, u32 bo
> > >  		u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
> > > 
> > >  		for (row = 0; row < height; row++) {
> > > -			xe_ggtt_set_pte(ggtt, *ggtt_ofs,
> > > -					xe_ggtt_pte_encode(bo, src_idx * XE_PAGE_SIZE));
> > > +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
> > > +							      XE_CACHE_WB);
> > > +
> > > +			xe_ggtt_set_pte(ggtt, *ggtt_ofs, pte);
> > >  			*ggtt_ofs += XE_PAGE_SIZE;
> > >  			src_idx -= src_stride;
> > >  		}
> > > @@ -150,8 +161,11 @@ static int __xe_pin_fb_vma_ggtt(struct intel_framebuffer *fb,
> > >  		if (ret)
> > >  			goto out_unlock;
> > > 
> > > -		for (x = 0; x < size; x += XE_PAGE_SIZE)
> > > -			xe_ggtt_set_pte(ggtt, vma->node.start + x, xe_ggtt_pte_encode(bo, x));
> > > +		for (x = 0; x < size; x += XE_PAGE_SIZE) {
> > > +			u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x, XE_CACHE_WB);
> > > +
> > > +			xe_ggtt_set_pte(ggtt, vma->node.start + x, pte);
> > > +		}
> > >  	} else {
> > >  		u32 i, ggtt_ofs;
> > >  		const struct intel_rotation_info *rot_info = &view->rotated;
> > > --
> > > 2.40.1
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 29+ messages in thread
